With the recent deluge of ransomware articles discussing risks, likelihood, payment options, and proposed solutions, it’s a good idea to take a step back to see where you stand with regard to preparedness, response, and recovery.  

If you’ve had a risk or vulnerability assessment in the last several years, your organization was likely advised to take steps to help prevent and prepare for large-scale malware/ransomware events.  Managing cyber risk, or being “prepared,” is much more than writing documents or installing technology; fundamentally, it is the result of operationalizing all activity that involves people, processes, or technology (PPT). This equates to effective risk management and reduced impact should an event occur.  For example:

Imagine it is Monday morning. As the plant begins to execute a start-up after a weekend shutdown, a flurry of tech support tickets and escalations begins to stack up.  According to your last audit, there is an ad hoc process to restore a backup and recover from a single system failure, but this seems out of the ordinary, and you intuitively begin to suspect the worst. What now?

You wonder: Are my backups any good? How do I stop the spread? How do I plug the holes and get control of my assets and their users?  What are my assets? Who do I need to call? Who will help me resolve the issue? Too many questions and too little time.

The awareness and training aspects of being prepared can be overwhelming, especially if you want the closest possible simulation. For an initial smoke test, however, I believe a rudimentary skit can be devised to illustrate gaps in your organization’s processes, resources, training, and even technology.  Tabletop exercises do not need to be “hacker” oriented, don’t require elaborate props or expensive third-party trainers and platforms, and needn’t be limited to just the security team. With a little time and effort, they can be made effective and accessible to a wider audience of stakeholders.

Executing low-cost ransomware or cyber-event tabletop (TTX) or paper-based training has several benefits:

  • It raises awareness within the organization about the current state of maturity and incident/event preparedness.
  • It satisfies a compliance or framework checkbox.
  • Such training offers a low-cost, high-reward way to illuminate gaps that could threaten the organization’s overall event response.
  • The exercises can be devised internally by individuals who understand how the facilities actually run.
  • It brings all parties to the table and settles disputes over who owns what.
  • Communications driven by the tabletops often facilitate organizational change and foster improved inter-domain trust.

Based on my experiences creating technical simulations from real data such as the S4 ICS Detection Challenges, the principal components in creating a skit can be simplified when a straightforward event needs to be explored:

  • Frame: Devising a relevant scenario that could affect the organization.
  • Composure: Describing the scenario playing out using the organization’s systems, processes, technology, personnel, and the attack itself.  This is the largest component, and multiple pathways for the attack/response should be considered.
  • Implement: Pulling all of the pieces together based on the “frame” and composure elements means you will need to have scripts, supporting material, roles/responsibilities, processes/playbooks, and everything aligned to represent the realities of the organization and event.  This can be simulated standalone without technologies or console access for example.
  • Execute: Running the event includes scheduling required resources, facilitating the event, distributing material, and recording results and observations.
  • Next steps: Summarizing all of the event’s learnings and acting on any identified gaps is a critical component of the tabletop exercise.

Using those phases, let’s start by creating and facilitating awareness events that include technical and non-technical participants.  

Framing the ransomware event

This step entails the creation of a summary scenario that outlines the whole exercise.  This can be crafted by an individual or by a team with relevant understanding of the critical functions of the organization and their overall technology and security posture.  The framing is an outline that describes an initial hypothesis and activities for the event within scope.  For example, a frame may be built using the following elements:

  1. Survey “environment” high-level data (just like you would assets that are in scope for red-teaming)
  2. Research the domain and the business/site itself (generate context)
    • Determine high-value targets and end-games that would define a “bad day” at your organization.
    • Describe the kind of noise (extraneous details) that would likely be present. 
    • Evaluate level of anonymization required (estimate sensitivity).
    • Deep-anonymizing and what that would look like/appear to be (optional to some extent).
    • Construct a compelling and realistic scenario (playwright scenario).
    • Describe a high-level attack in one sentence based on the data, company details, and attack vector.
  3. Define the objective of the exercise. 
  4. Consider judging, event direction and outputs in the context of stakeholders, internal audit requirements, and the inputs and desires of management.

If we can flesh out all of these areas, we’re on our way to creating an informative exercise.  I’ll illustrate framing with an example: A large-scale asset owner (MarineCo) operating a maritime port.

The Risk Manager (RM) from MarineCo has been watching the news and heard about the Maersk ransomware incident.  RM’s company is a $100M company with profits tightly correlated to the organization running smoothly.  Any disruption to product moving in or out of the facility has a strong impact on both the company’s bottom line and on the local economy.  RM knows that the team is aware of this risk from several audit findings, but he wants to know if the others in the organization are prepared for a massive outage that would likely occur in the wake of a ransomware attack.  RM is also aware that many MarineCo systems run on antiquated software, leverage end-of-life operating systems, and suffer from subpar network and user management.

RM frames the incident using these statements:

  • “A malicious party enters ABC site through common attack vectors in the business network. The attacker then moves toward XYZ critical system as the target for ransomware with the goal of disabling operations by disrupting an OT process hosted on IT infrastructure.”
  • “Without operations, cargo cannot be moved, transported, loaded or unloaded, and the organization will burn $123 per hour until the situation is resolved.”
  • “The systems affected would be ABC and XYZ.  These reside here, and they are supported by these particular groups who are reported to follow and have a variety of processes at hand.”
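
The hourly burn statement above lends itself to a quick back-of-the-envelope estimate that can be shared with participants. The following is a minimal sketch, not a financial model: the function name, the idea of a separate "degraded" period, and the 50% default loss ratio are assumptions made for illustration, and 123 is simply the placeholder figure from the framing statement.

```python
def estimate_outage_cost(hourly_burn, hours_down, hours_degraded=0.0,
                         degraded_loss_ratio=0.5):
    """Estimate losses for an outage followed by an optional degraded period.

    hourly_burn: loss per hour while operations are fully stopped.
    hours_down: hours with no operations at all.
    hours_degraded: hours running in a degraded state (assumption).
    degraded_loss_ratio: share of hourly_burn still lost while degraded
        (hypothetical default of 0.5).
    """
    if min(hourly_burn, hours_down, hours_degraded) < 0:
        raise ValueError("burn rate and durations must be non-negative")
    if not 0.0 <= degraded_loss_ratio <= 1.0:
        raise ValueError("degraded_loss_ratio must be between 0 and 1")
    full_loss = hourly_burn * hours_down
    degraded_loss = hourly_burn * degraded_loss_ratio * hours_degraded
    return full_loss + degraded_loss


# Placeholder burn rate from the frame: two days fully down,
# then one day at (assumed) half capacity.
total = estimate_outage_cost(123, hours_down=48, hours_degraded=24)
print(f"Estimated loss: ${total:,.2f}")
```

Replacing the placeholder with the organization’s real burn rate gives the room a concrete number to watch grow as the scenario unfolds.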

Then RM arrives at this scenario:

“If the organization faced an aggressive ransomware incident that was spreading quickly from a vulnerable and compromised system, could we manage it as I’ve been promised, communicate to our customers during a disruption, and recover efficiently – even to a degraded state?”

RM would then move to the next phase: the scenario.

Composing the scenario

Beginning with the framing materials, we need to scope out the scenario much the way an author or playwright defines their story.  First, we need to know:

  • Where would a plausible compromise originate from?  IT systems used for accounts receivables or order confirmations, likely through an email infrastructure or a compromised VPN account.  The facility or organization is not particularly special but would certainly face a similar risk exposure profile to other organizations that have been ransomed already.
  • How would an incident occur, so that we can establish likelihood and a common footing on which to base the event?  These systems are generally multi-purpose: users watch YouTube and open a variety of emails and corporate attachments.  Users of these systems have a fairly high phishing fail rate during anonymous awareness testing campaigns. These systems are certainly under corporate control, but their policies and controls are not concrete due to union or personnel policy complications.  Isolation and eradication would be difficult, and an attack here would be likely.  Given that the plant network is often improperly segmented due to its age and design, malware can spread into OT quite easily.  Once in OT, if the logistics servers or even the AD servers were targeted, operations would likely grind to a halt.
  • Who would be the participants, and what roles would they play in the detection, identification, response, and remediation of the event?  E.g., analyst, local operator, local administrator, facility manager, corporate security manager, technical, director, executive, legal, etc.
  • What supporting systems, infrastructure, and evidence would be used in the scenario?   At a minimum, something that resembles screenshots and data from an email client, Windows workstations, SIEM/SOC services, corporate servers for IT or OT, VPN/local user accounts, AD servers, networking infrastructure and related logs, backup servers, asset inventory information, etc. Company processes, procedures, incident/response playbooks, best practices should also be on hand.  
  • And ultimately, why are we doing this?  If our goal is to test processes, knowledge and OTSM maturity, then the story needs to align.

We need to draft the scenario in a play-by-play manner.  It can be linear or multi-pathed – like a choose-your-own-adventure book.  The simplest of the two options is a linear storyline, but often reality likes to add its own dose of surprises so it’s best to have multiple paths considered, and a few predefined complications to add at times.  

For example:

  • Analyst A in MarineCo’s SOC has had a busy day. She sees the alert and decides to close the ticket because it looks like business as usual. The malware continues to spread.  

or

  • Analyst A just started her shift and has “fresh” eyes. She recognizes the alert as one that requires investigation and decides to escalate once a convergence of alarms is observed.  This limits infection to just a handful of systems.
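
A multi-path storyline like the one above can be captured as a small graph in which each step lists the decisions that lead to the next step. The sketch below is one possible way to do that; the `Step` class, the step identifiers, and the `walk` helper are all hypothetical names introduced for illustration, and only the two analyst branches come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    text: str
    # Maps a decision label to the id of the next step (empty = end of path).
    choices: dict = field(default_factory=dict)


# Hypothetical step ids; the narrative mirrors the two Analyst A branches.
SCENARIO = {
    "alert": Step(
        "Analyst A sees the alert in the SOC queue.",
        {"close": "spread", "escalate": "contained"},
    ),
    "spread": Step(
        "The ticket is closed as business as usual; the malware keeps spreading."
    ),
    "contained": Step(
        "The alert is escalated; infection is limited to a handful of systems."
    ),
}


def walk(scenario, start, decisions):
    """Follow the participants' decisions and return the narrated steps."""
    path = []
    current = start
    pending = list(decisions)
    while True:
        step = scenario[current]
        path.append(step.text)
        if not step.choices or not pending:
            return path  # reached an ending, or no more decisions recorded
        choice = pending.pop(0)
        if choice not in step.choices:
            raise ValueError(f"'{choice}' is not an option at step '{current}'")
        current = step.choices[choice]


for line in walk(SCENARIO, "alert", ["escalate"]):
    print(line)
```

Keeping the branches in one structure makes it easier for the facilitator to see which paths exist and where a predefined complication could be attached.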

The overall event of course needs to be eventually scripted (this is in the implementation phase), but during the composure phase, an approach that looks similar to the table of contents for a technical manual might suffice:

  1. Chapter 1 – Prepping the event
    1.  Outline the background, roles, responsibilities and scenario.
  2. Chapter 2 – Begin the event by starting with detection and identification.
    1. Walk through the initial event with “evidence” 
    2. Begin the infection story starting with the ticket, assign AnalystA role.
    3. … continue script
  3. Chapter 3 – Response
    1. Plant manager is made aware and coordinates activities to ensure safe operation under a watch condition
    2. Response teams review evidence that points to a few predefined entry points and malware based on predefined evidence/props (e.g., screenshots and logs)
    3. Response teams quickly attempt to isolate those systems, but one of the wrenches is thrown into the mix.
      1. The response team is not affected
      2. The response team is negatively affected, and the incident continues to escalate to C-suite roles
      3. Insurance company does not pay the ransom
    4. Response team escalates to a high severity event:
      1. C-suite is notified
      2. Legal comes into play
      3. Media/communications need to be drafted
      4. Entire network goes down, email and messaging included
      5. IT shuts down systems critical to OT or makes changes
      6. Consequences of those changes or loss of connectivity without proper OT support increase delays
      7. Losses grow by the minute
    5. Script continues
  4. Chapter 4 – Recovery
    1. Assuming the response teams were able to manage the response and isolation
    2. Suddenly a wrench is thrown into the mix: not enough bandwidth, a failed backup/RAID, missing credentials, malware that re-emerges and spreads, etc.
    3. Teams watch their watches, and the media needs to be informed
    4. Script continues
  5. Chapter 5 – Remediation
    1. ….

The idea is to ensure that processes that are in scope will be enacted at some point, all roles are impacted, escalation paths are noted, and even fringe activities are covered during the envisioned scenario. For a first attempt, it’s best to keep things simple and have a manageable group of perhaps seven or eight individuals.
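
One simple way to check that every in-scope role is touched by the outline is to tag each chapter with the roles it involves and look for leftovers. This is a minimal sketch under stated assumptions: the role names and the chapter-to-role tagging below are illustrative guesses based on the outline, not a prescribed mapping.

```python
# Roles the exercise is meant to involve (illustrative list).
IN_SCOPE_ROLES = {
    "analyst", "plant manager", "response team",
    "c-suite", "legal", "communications",
}

# Each outline chapter tagged with the roles it exercises (assumed tagging).
OUTLINE = [
    ("Detection and identification", {"analyst"}),
    ("Response", {"plant manager", "response team"}),
    ("High severity escalation", {"c-suite", "legal", "communications"}),
    ("Recovery", {"response team", "communications"}),
]


def uncovered_roles(in_scope, outline):
    """Return the in-scope roles that no chapter of the outline touches."""
    touched = set().union(*(roles for _, roles in outline))
    return in_scope - touched


missing = uncovered_roles(IN_SCOPE_ROLES, OUTLINE)
print("Roles not yet covered:", sorted(missing) or "none")
```

The same check works for processes or playbooks: swap the role names for process names and tag the chapters accordingly.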

Implementing the ransomware scenario

With the scenario outlined, it’s time to implement the exercise in its entirety.  In addition to being the playwright, you’ll also be serving as producer, director, prop master and observer.  The script may cover all of the roles, but it is the responsibility of the producer to set up the scenario with sufficient context to capture the audience’s attention or deliver a message. Whatever the effect, a good scenario needs believable data.

Implementing a scenario has a time factor that cannot be overstated.  It’s one thing to find enough time to get everyone in the room, another to cross all scenarios or gaps, and yet another to raise awareness for gaps early on while everyone involved is intently focused on the task at hand.

It’s great to test the processes end to end and have varying amounts of realistic data, but if you cannot implement a plausible scenario within a reasonable time frame, the impact of such a training exercise will be limited.

As an example, during the S4x19 S4 ICS Detection Challenge, we created a gigantic set of data under strict NDA to mimic a large faux mining facility located in Eastern Canada.  The attack had plenty of noise, some real attacks, and an endgame.  The data set was over 130GB in network traffic and the participants were well versed in OT cybersecurity. But despite the lead time for the participants, the main objective of the attackers was missed.

My goal was to see where the participants and their tools would fall short.  I wanted to challenge their confidence in their tools and chase rabbits through a labyrinth because during most incidents, defenders have too much data or not enough time or are generally dealing with the consequences after the fact.  To be fair, all parties fared reasonably well, but my point was to explain that even with the greatest tools and minds, the limited detection surface would have only been a single piece of the puzzle.

In the MarineCo scenario, you need to keep in mind all of the pieces.  There will likely be conversations, processes to be found, responsibilities assigned, challenges to be added. But the event needs to be flexible enough for re-use, if possible, and completable within a single sitting.

The final script needs to contain all the elements with believable roles, relevant screenshots or simulated tooling, props and organization artifacts handy, OT facts such as shared passwords or other common behaviors, and clear start and endpoints.

Executing the cyber incident tabletop exercise

Now that we have the frame, composed elements, and the implemented attack all in one, we need to execute.  I believe that being in the same room helps establish trust and bring light to groups that often do not interact.  I even sometimes ask them to switch sides.

Regardless, the execution phase is primarily about:

  • Getting the right people in the room and assigned to their appropriate roles
  • Distributing the props, evidence and supporting materials at the correct times.  This should include referencing your OT asset inventories for the specific sites in question.
  • Mediating and facilitating the scenario sufficiently to keep the execution of the script smooth.
  • Observing and recording times, responses and questions from participants, especially when the complications are introduced.

Beyond the mechanics of execution, two very important pieces to keep in mind are the role of the mediator and the handling of any recordings. Your organization may have policies and concerns about recording, whether for privacy or for other situational factors such as sensitive data.

Regardless of any of the actions, frustrations, or even the observations, it is important to consider exercises such as simulating a wide-spread ransomware attack as a training tool.  It is not a “finger-pointing” exercise or criteria for someone to be disciplined or removed. Rather, it should be viewed as guidance to help individuals, groups and the company at large fare better during an incident.

Summarizing and acting upon any observations from the exercise

The last piece of an exercise involves the collection of insights into how the organization is performing in terms of preparedness.  Generally, most organizations rate themselves on cybersecurity maturity via a matrix of controls, but rarely are those controls truly tested end to end.  The summation of the exercise often results in a number of surprises. At this final stage, it is important to:

  • Observe all of the participants and their roles for clarity, understanding, demeanor, and competency. 
  • Monitor the time it takes for important events to be understood and acted upon.
  • Look for gaps in training and process. If the plant operator has to go find the manual, how quickly can they find the playbook for isolating the network or to recover systems at scale?  Alternatively, if there is an analyst, can they walk through a series of alerts and identify the correct one in a timely manner?
  • Extract the learnings the organization and participants just witnessed.  For example, we may have just recovered all the Windows systems and can function, but we lost XYZ transport data and must now manually inspect 123 containers until they are verified because of the time lapse. This also results in ABC external repercussions, such as an inability to deliver products as specified in the contract.
  • Begin initiatives that result in acquiring technology or resources where clear gaps were observed.
  • Act on all of the findings with follow-up test runs of the ransomware event as part of your overall cybersecurity OT program.
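
The timing observations mentioned above can be summarized with very little tooling. The sketch below assumes the observer noted, for each important event, when it was introduced and when participants acted on it; the event names and clock times are made-up sample values, and `response_minutes` is a hypothetical helper name.

```python
from datetime import datetime


def response_minutes(observations, fmt="%H:%M"):
    """Return minutes between each event being introduced and acted upon.

    observations: iterable of (event_name, introduced_at, acted_at) tuples,
    with times written as strings in the given format on the same day.
    """
    result = {}
    for name, introduced_at, acted_at in observations:
        delta = datetime.strptime(acted_at, fmt) - datetime.strptime(introduced_at, fmt)
        result[name] = delta.total_seconds() / 60
    return result


# Made-up observer notes from a single sitting.
LOG = [
    ("initial alert escalated", "09:05", "09:40"),
    ("network isolation playbook located", "09:45", "10:30"),
    ("backup restore started", "10:35", "10:50"),
]

times = response_minutes(LOG)
slowest = max(times, key=times.get)
for name, minutes in times.items():
    print(f"{name}: {minutes:.0f} min")
print("Slowest response:", slowest)
```

Comparing these numbers across follow-up runs gives a rough indication of whether the gaps identified in earlier exercises are actually closing.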

Real-world incidents in IT or OT require all hands to be present.  An event with limited scope, however, can be simulated with minimal investment and can quickly highlight the gaps in cybersecurity capability, particularly if the organization does not consistently apply cybersecurity basics.  By simulating a ransomware event across people, processes, and technology, organizations can improve their chances of defending against a ransomware attack, limit the impact, find value in their technology investments, and create organizational change.

Staying ahead of today’s threats

Simulated attacks and tabletop exercises represent significant tools for reducing risk and bolstering defenses in modern ICS environments. But they’re not the only arrows in the defender’s quiver. Maximizing OT security maturity requires planning and practice coupled with exhaustive asset inventories, well-crafted policies, robust controls and a unified platform that offers 360-degree visibility into all aspects of ICS security assessment, defense, response and recovery.
