Between May 6 and May 12, 2021, Colonial Pipeline, owner of 5,500 miles of pipeline carrying gasoline, diesel, and jet fuel from Texas to New Jersey, shut down its operations in response to what it said was a ransomware attack targeting its IT network. In a media statement, Colonial officials indicated that the damage was limited to their IT systems, but that the company “proactively took certain systems offline to contain the threat.”

That response, which included disabling select operational technology (OT) and industrial control systems (ICS), “temporarily halted all pipeline operations … which we are actively in the process of restoring.” The company added that its OT systems were unaffected and that the shutdown was a measured response to enable a quick recovery. Without such an abundance of caution, the IT malware might have proven much more disruptive because of the interconnectedness of pipeline infrastructure and of the participants upstream and downstream (e.g., custody transfers, shared remote metering, and available storage and capacity).

Since the Colonial incident, several other major ransomware attacks on operating entities have been reported, including the Martha’s Vineyard Ferry Service, FUJIFILM, and JBS, the meat company that accounts for 40% of the US meat supply. These come on the heels of several other large, public ransomware events earlier this year at WestRock (the second-largest paper company), Molson Coors, and others.

The reality is that industrial organizations are now in the crosshairs of ransomware gangs: the cost of lost availability runs into the millions of dollars, so ransom demands can be quite high. A recent report by Digital Shadows found that industrial goods and services was the most targeted industry in 2020, accounting for 29% of attacks, more than the next three industries (retail, construction, and technology) combined.

What is Ransomware?

Ransomware is a form of malicious software, or malware, sometimes loosely called a virus. Essentially, the bad guys first find a way into the target network (phishing, social engineering, etc.). Their software then moves around the network (traversing network shares, local drives, etc.), encrypting everything it finds with a key that only the bad guys know. If you want to unlock your files, you have to pay the bad guys to give you the key. The cost to get the key and decrypt files can range from hundreds to thousands or even millions of dollars, depending on the specifics of the attacker and victim.
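Because ransomware encrypts what it finds, the affected files end up looking statistically close to random data. The Python sketch below is our own illustration of that effect, not a technique described in this article: it measures Shannon entropy in bits per byte. The `looks_encrypted` helper and its 7.5 bits/byte threshold are hypothetical, and legitimately compressed files (ZIP archives, JPEGs) would also trip it.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Hypothetical heuristic: flag content whose entropy is close to random.

    The 7.5 bits/byte threshold is an illustrative assumption. Compressed
    formats (ZIP, JPEG, etc.) also score high, so expect false positives.
    """
    return shannon_entropy(data) >= threshold


if __name__ == "__main__":
    plaintext = b"PLC tag export: TANK_01_LEVEL=42.0\n" * 100
    random_like = os.urandom(4096)  # stands in for ciphertext
    for label, sample in (("plaintext", plaintext), ("random", random_like)):
        print(f"{label}: {shannon_entropy(sample):.2f} bits/byte, "
              f"looks encrypted: {looks_encrypted(sample)}")
```

On its own, entropy is only a hint; any practical use would need to combine it with other signals.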

Why is Ransomware used and what are the potential impacts?

Ransomware has its roots in the world of criminal scams and extortion, but by its nature it can also be used to target larger asset owners and organizations, or to mask other, more devious activities.

Let’s first look at why ransomware is becoming such a challenge for industrial organizations today:

  • Ransomware takes advantage of “availability” risks and is highly profitable when aimed at industrial organizations. Cyber theft of personal information used to be quite profitable, but prices for that information have fallen dramatically as supply has increased, so cybercriminals have found new business models. They have shifted from the “C” in the Confidentiality-Integrity-Availability (CIA) triad to the “A”. Industrial organizations require availability to operate, so payment is usually quick and large.
  • In most cases, insurance covers a significant portion of the cost of the ransom and recovery, so under current policies the payment process is greased by the presence of insurance. This is changing, however, as insurers revise their policies going forward, as seen in AXA’s recent announcement that it would stop covering ransomware payments in new policies written in France.
  • Even IT attacks can shut down OT operations. Why is this so? First, OT systems are usually highly susceptible to ransomware if it reaches them, so the first step in any incident response plan is to stop the spread by disconnecting OT systems. While IT systems are costly to restore, OT systems may be three to four times as costly and may take much longer. Hence the “abundance of caution” we always read about. Second, in many cases operations do not rely solely on OT systems; IT systems such as billing or supply chain software are now necessary to operate effectively. Thus, shutting down key IT systems can effectively require an OT shutdown as well.
  • Why is OT so susceptible?
    • Most ransomware takes advantage of older vulnerabilities that have been left unpatched, and in OT there are a huge number of both known exploits and unpatched systems.
    • Ransomware often exploits network-based weaknesses to gain access (e.g., through RDP) but spreads from endpoint to endpoint. Compensating controls, system hardening, vulnerability management, and other techniques such as network isolation all play a critical role in reducing the impact and spread of a ransomware attack.
  • Ransomware is often very effective because many organizations are not equipped to recognize, and therefore avoid, potential incidents such as phishing. Large numbers of legacy, unpatched assets, often poorly monitored and supervised by a handful of staff without cybersecurity expertise, are a recipe for disaster.

To put the cycle into perspective, the diagram below illustrates the typical path ransomware takes into a facility:

Figure: Standard ransomware scenario, from IT to OT

What happened to Colonial specifically?

According to published reports, part of Colonial’s immediate reaction to the attack late Friday was to enlist the services of incident response specialist FireEye. Those investigators have since attributed the attack to a prolific Russian criminal ransomware group known as DarkSide, a crew credited with around 40 similar attacks with ransom demands ranging from $200,000 to more than $2 million.

DarkSide has claimed that its attacks feature a professional “experience,” focusing on providing “quality products” to its consumers. The hacker crew claims it will only attack those who have the means to pay or who are known to have cybersecurity insurance. The group has also been known to employ a double extortion methodology: getting victims to pay to decrypt their data or, failing that, blackmailing them with the threat of publicly releasing data exfiltrated as part of the crime.

By Monday, the DarkSide attackers expressed contrition for the Colonial Pipeline attack. Perhaps in response to the international publicity and the focused governmental and law enforcement efforts spun up in the wake of the incident, the hackers took to their dark website to say they never intended to disrupt public utilities.

“We are apolitical,” the hackers wrote. “We do not participate in geopolitics, do not need to tie us with a defined government and look for other our motives. Our goal is to make money, and not creating problems for society.”

Too bad they had already collected close to $5 million worth of Bitcoin!

As mentioned above, the Colonial attack specifically targeted the company’s IT systems, which handle functions such as billing and inventory. In fact, the ransomware never crossed over to infect the company’s OT systems. However, operations were halted anyway because of the risk of further spread into OT.

What does this mean for OT? Are OT systems immune because they are less connected to the internet? Or are they just “later in the spread,” so that rather than being patients 1 through 100, they are patients 101 onward, and it is only a matter of time? Should we focus all our resources on stopping ransomware from impacting IT, on the theory that if we can do that, OT is safe? Or is the solution more about incident response, protecting operations from IT ransomware by creating redundancy for critical IT systems or barriers that let OT run without relying on them? The questions are numerous, and they should prompt strategic thinking for all industrial operators.

How can industrial organizations protect against ransomware?

Any security program is intended to ensure ‘safe, reliable, expected’ operations. Note that avoiding 100% of downtime or incidents is not possible. Rather, the true measure of security is resiliency: how quickly can we detect, respond to, and recover from a threat or activity?

And while an overall security program (such as the NIST CSF, IEC 62443, or CSC18) is the proper ‘end game’ for operational security, there are a few specific security controls you can emphasize that relate directly to ransomware. They are listed below, with ‘OT Challenge’ notes where applying these practices is particularly difficult because of the nature of OT.

How to Protect Against Ransomware in your OT Environment

Know how an IT attack can impact OT, build clear incident response playbooks, and prioritize risks to minimize the impact on operations in an emergency.
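Knowing how an IT attack can impact OT starts with a map of which systems depend on which. A minimal sketch, assuming a hand-maintained dependency map; every system name below is hypothetical. Given the systems you expect to lose or deliberately disconnect, it walks the map breadth-first and lists everything affected, directly or transitively.

```python
from collections import deque

# Hypothetical dependency map: each system lists the systems it relies on.
# Names are illustrative; a real map would come from DR plans and interviews.
DEPENDS_ON: dict[str, list[str]] = {
    "pipeline_scheduling": ["billing", "inventory_db"],
    "custody_transfer": ["metering_historian", "billing"],
    "metering_historian": ["active_directory"],
    "billing": ["active_directory"],
    "hmi_station_1": ["scada_server"],
    "scada_server": [],
    "inventory_db": [],
    "active_directory": [],
}


def impacted_by(lost: set[str], depends_on: dict[str, list[str]]) -> set[str]:
    """Systems that directly or transitively depend on any system in `lost`."""
    # Invert the map so we can ask "who relies on X?"
    dependents: dict[str, list[str]] = {}
    for system, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(system)

    # Breadth-first walk outward from the lost systems.
    impacted: set[str] = set()
    queue = deque(lost)
    while queue:
        current = queue.popleft()
        for user in dependents.get(current, []):
            if user not in impacted:
                impacted.add(user)
                queue.append(user)
    return impacted - lost  # report only the knock-on effects


if __name__ == "__main__":
    print(sorted(impacted_by({"billing"}, DEPENDS_ON)))
    # ['custody_transfer', 'pipeline_scheduling']
```

With this toy map, losing `billing` affects scheduling and custody transfer, while the SCADA server and HMI keep running. That is the kind of answer that tells you what you can disconnect and what you can keep operating.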

  • Well-defined maps of potential threats and impacts: One of the biggest questions concerns the risk levels and priorities of assets and systems. What systems are tied to what systems, not just technically but operationally? The good news is that many industrial organizations already have disaster recovery plans. We need to extend those to cyber events so we understand what we can disconnect, what we can keep operating, etc. This is key because attacks can spread from IT to OT so easily.
  • Risk prioritization: These exercises can then identify the true crown jewels: the systems that are the lynchpins of operations, all the way down to individual servers. This allows the organization to prioritize risk management on those systems and add extra layers of security to protect those key assets.
    • OT Challenge: OT-specific policies and procedures. Most IT tools and behaviors MUST be modified to provide similar effects without disrupting OT. Striking this balance requires significant knowledge of both security practices and operations.
  • Robust backup and recovery (expanded backup coverage and frequent snapshots of more hosts): The more hosts that are frequently and SECURELY backed up, and the more capacity you have to restore systems from those backups (e.g., enough network bandwidth), the faster you can recover from a ransomware attack. However, you must ensure the vulnerability is mitigated or the host is isolated when the backup is restored, or the host may become re-infected.
    • OT Challenge: Legacy systems, lack of bandwidth, and the need to track multiple backup solutions and products in most OT environments make management difficult.
  • Have offline backups of critical assets: Offline backups, as a resilience or disaster recovery strategy, are critical to ensure your most important OT assets are protected and can be readily restored if your infrastructure is down. This includes PLC logic code, configurations, documentation, and system images and files. It may sound expensive, but it is often accomplished with securely encrypted USB drives that are periodically rotated so that file integrity is maintained.
    • OT Challenge: The complexity of OT environments and the number and variety of source code types, locations, etc. require a holistic backup and recovery program.
  • Run regular “cyber fire drills” to test backups and their recovery: I cannot stress this enough: a frequent training regime should absolutely be applied to OT and cyber-related events. Forensics, failed hardware, shutdowns, etc. should get at least an initial cyber check to confirm the event was not cyber-related and, if it was, to ensure chain of custody and due diligence. Second, it is important that your people know what to do when there is an issue, so drills are also a way to double-check processes while improving the likelihood of a quick recovery.
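A cyber fire drill is a good moment to confirm that offline backups are intact. A minimal sketch using SHA-256 digests from Python’s standard library: build a manifest when the backup is written, keep the manifest somewhere other than the backup media, and compare against it during a drill or before a restore. The function names and manifest format are our own assumptions, not part of any particular backup product.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large images do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to backup_dir) to its SHA-256 digest."""
    return {
        p.relative_to(backup_dir).as_posix(): sha256_file(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Compare a backup against its manifest; an empty list means it matches."""
    current = build_manifest(backup_dir)
    problems = [f"missing: {name}" for name in manifest if name not in current]
    problems += [
        f"modified: {name}"
        for name, digest in manifest.items()
        if name in current and current[name] != digest
    ]
    problems += [f"unexpected: {name}" for name in current if name not in manifest]
    return problems
```

Any `missing`, `modified`, or `unexpected` entry is worth investigating before that backup is trusted for a restore.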

Endpoint Management

As stated above, one of the reasons organizations exercise an “abundance of caution” and shut down their OT processes is the fundamental risk posed by the endpoints themselves. While we might like to avoid this hard topic, the reality is that resilience requires more secure OT endpoints.

The first question in this effort (as well as in starting to monitor for potential threats) is ‘what do we have, and how is it configured?’ In other words, you need to know the endpoints in question. That takes many things, but for starters, the following are fundamentally required:

  • Asset inventory: Effective endpoint management begins with a robust asset inventory. As the age-old saying goes, if you don’t know what you have, you can’t manage the risks. A rich, 360-degree view of each endpoint enables proper endpoint management.
    • OT challenge: Building an automated asset inventory that covers all asset types, from OS-based systems to networking and embedded devices, with deep asset profiles that include assigned criticality, users and accounts, the presence of compensating controls, etc.
  • OT systems management: An OT asset inventory is only the beginning of a robust endpoint management program. A robust OT systems management program includes configuration hardening, user and account management, software management, etc. In many cases, OT systems are insecurely designed and unpatched, making them ripe for ransomware.
  • Patch management: Most threats enter through commodity systems such as Windows machines. You cannot patch everything in OT, but an end-to-end patch management program (i.e., automated and intelligent application of patches) is very important because of factors such as compliance, legislation, and risk management (e.g., patches for Internet-connected hosts running RDP, or for Internet-facing firewalls, should be prioritized over a PLC protected by several layers of defense). Where patching is unfeasible, application whitelisting and policy enforcement make an attacker’s life very difficult and improve your chances of defending against or denying a ransomware attack on your OT organization.
    • OT challenge: The need for a prioritized patching process that moves to compensating controls when and where necessary.
  • Removable media: USBs, removable media, and transient devices are other forms of low-hanging fruit, especially if your network is “air-gapped” or heavily controlled. Users WILL bypass your controls by way of removable media. As a best practice, deploy system policies, use whitelisting software and registered secure drives, and apply technologies such as 802.1X to ensure that only authorized systems are allowed on network segments.
    • OT challenge: Enumerating, applying, monitoring, and enforcing removable media policies, and extending them to transient cyber assets.
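The patch prioritization described above can be expressed as a simple scoring rule over asset inventory records. A sketch, assuming a hypothetical `Asset` record; the fields and weights are illustrative assumptions, not an established scoring standard such as CVSS. Internet exposure and RDP push an asset up the queue, while compensating controls push it down, which is why a PLC behind several layers lands last.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    # Hypothetical inventory record; fields and weights below are assumptions.
    name: str
    criticality: int            # 1 (low) .. 5 (crown jewel)
    internet_exposed: bool      # reachable from the Internet
    rdp_enabled: bool
    missing_patches: int
    compensating_controls: int  # e.g., allowlisting, segmentation, hardening


def patch_priority(a: Asset) -> int:
    """Higher score means patch sooner; fully patched assets score 0."""
    if a.missing_patches == 0:
        return 0
    score = 2 * a.criticality
    score += 10 if a.internet_exposed else 0
    score += 5 if a.rdp_enabled else 0
    score += min(a.missing_patches, 10)   # cap so patch count alone cannot dominate
    score -= 3 * a.compensating_controls  # layered defenses buy time
    return max(score, 1)                  # still in the queue, just last


def patch_queue(assets: list[Asset]) -> list[str]:
    """Names of assets that need patches, highest priority first."""
    needing = [a for a in assets if a.missing_patches > 0]
    return [a.name for a in sorted(needing, key=patch_priority, reverse=True)]


if __name__ == "__main__":
    inventory = [
        Asset("jump_host", 3, True, True, 4, 0),
        Asset("plc_line_1", 5, False, False, 2, 3),
        Asset("eng_workstation", 4, False, True, 6, 1),
        Asset("historian", 3, False, False, 0, 1),
    ]
    print(patch_queue(inventory))  # ['jump_host', 'eng_workstation', 'plc_line_1']
```

A low score for a critical asset is a signal to lean on compensating controls rather than an emergency patch window.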

Monitor network, system and application logs for anomalies

An attack often has precursory elements that indicate an infection. Detecting them can also reveal a vulnerable system that is in the midst of an attack or about to be compromised, giving your defensive team an advantage in preventing a wide-scale infection. One way of doing this is with a “canary”: a system placed in the network that acts as the “canary in the coal mine” and raises an alert as the ransomware begins impacting it, allowing you to react more quickly.
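A simplified, file-level take on the canary idea: place decoy files where ransomware traversing shares or local drives would reach them, record their digests, and alert if any change or disappear. This is a sketch under assumptions (the decoy files and the polling approach are hypothetical) and is narrower than the canary system described above. Because no legitimate user or process should touch the decoys, any change is suspicious.

```python
import hashlib
from pathlib import Path
from typing import Optional

# Digest per decoy file, or None if the file no longer exists.
Fingerprint = dict[Path, Optional[str]]


def fingerprint(paths: list[Path]) -> Fingerprint:
    """Return the SHA-256 digest of each canary file, or None if it is missing."""
    result: Fingerprint = {}
    for p in paths:
        try:
            result[p] = hashlib.sha256(p.read_bytes()).hexdigest()
        except FileNotFoundError:
            result[p] = None
    return result


def tripped(baseline: Fingerprint, current: Fingerprint) -> list[Path]:
    """Canaries whose content changed, or that were deleted or renamed away."""
    return [p for p, digest in baseline.items() if current.get(p) != digest]
```

In use, `fingerprint` runs once to capture a baseline and then periodically (for example, from a scheduled task); a non-empty result from `tripped` is the alert to hand to your SIEM or on-call team.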

  • OT challenge: Providing ‘OT context’ to traditional SIEM and alerting tools.
  • Monitor external attack surfaces: Many attacks succeed because of a misconfiguration or an inadvertent hole caused by a gap in change management. It is a best practice to monitor for exposed services (e.g., with Shodan).
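Monitoring external exposure largely comes down to comparing what is actually reachable from the Internet against what change management approved. A sketch, assuming you already have the observed services as (IP, port) pairs, for example exported from your own external scans or a service such as Shodan; collecting them is out of scope here. The addresses use the 203.0.113.0/24 documentation range, and the port labels cover only a handful of well-known services.

```python
# Observed and approved external services as (ip, port) pairs.
Service = tuple[str, int]

# Labels for a few well-known IT and ICS ports.
PORT_LABELS: dict[int, str] = {
    102: "Siemens S7 (ISO-TSAP)",
    445: "SMB",
    502: "Modbus/TCP",
    3389: "RDP",
    20000: "DNP3",
    44818: "EtherNet/IP",
}


def unexpected_exposure(observed: set[Service], approved: set[Service]) -> list[Service]:
    """Services reachable from the Internet that change management never approved."""
    return sorted(observed - approved)


def stale_approvals(observed: set[Service], approved: set[Service]) -> list[Service]:
    """Approved exposures no longer seen; worth reviewing and possibly closing."""
    return sorted(approved - observed)


def report(observed: set[Service], approved: set[Service]) -> list[str]:
    """Human-readable lines for each unapproved exposure."""
    return [
        f"UNAPPROVED {ip}:{port} ({PORT_LABELS.get(port, 'unlabeled port')})"
        for ip, port in unexpected_exposure(observed, approved)
    ]
```

For example, if only `("203.0.113.10", 443)` is approved and a scan also finds RDP on `203.0.113.25`, the report contains `UNAPPROVED 203.0.113.25:3389 (RDP)`: exactly the kind of inadvertent hole a change-management gap leaves behind.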

Access Control and network segmentation

Stopping the spread of ransomware often comes down to placing firebreaks in its path. These can be in the form of network protections such as firewalls or other forms of segmentation or strict access control.

  • Implement network separation or segmentation: One key way to slow the spread of ransomware is to place network barriers between IT and OT networks (or even between segments within IT and/or OT). This approach is foundational but, because of its technical challenges, often underutilized.
    • OT Challenge: Segmentation is not easy in IT or OT, but in OT particular challenges arise from legacy equipment, the need for physical cabling, the downtime required to move systems behind new firewalls, etc. OT segmentation requires a team with deep knowledge of both networking and the OT systems themselves.
  • Isolate systems based on software, user role, and function: To protect against systems being compromised through remote access, local Windows networking flaws (e.g., Print Spooler or SMB/NetBIOS), or Office/Acrobat, isolate them based on function, ensure unnecessary software is NOT included in standardized golden images, and make sure the same Active Directory (AD) server is not serving policy for both IT and OT. This also applies to user accounts: if an HMI is an HMI, treat its operator as an operator, not as an administrator.
    • OT Challenge: Finding, profiling, and securing these types of controls, and being able to correct and enforce baselines.
  • Technical diversity between zones or systems: Consistency across systems has scaling advantages, but when a single vulnerability affects multiple products, exploiting it can ground your entire operation. Layered barriers such as a VPN with 2FA, a remote access terminal server, and firewalls from multiple vendors greatly increase the effort required for an external attack to succeed.
Figure: Example of a zone and conduit network, showing acceptable vs. unacceptable connections
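The zone and conduit model can be checked mechanically: assign each asset to a zone, list the approved conduits, and flag any observed connection that does not follow one. A minimal sketch with hypothetical asset names, zones, and conduits. Conduits here are directional (initiating zone first), so the control zone may pull from the DMZ but the enterprise network cannot reach the control zone directly. Real firewall policy is far more granular (protocols, ports, individual hosts).

```python
# Hypothetical zone assignments; asset names are illustrative only.
ZONE_OF: dict[str, str] = {
    "erp_server": "enterprise",
    "patch_server": "dmz",
    "historian_mirror": "dmz",
    "scada_server": "control",
    "hmi_station_1": "control",
    "plc_line_1": "field",
}

# Approved conduits as (initiating zone, destination zone) pairs.
APPROVED_CONDUITS: set[tuple[str, str]] = {
    ("enterprise", "dmz"),
    ("control", "dmz"),
    ("control", "field"),
}


def check_flow(src: str, dst: str) -> str:
    """Classify a connection initiated by `src` toward `dst`."""
    src_zone = ZONE_OF.get(src, "unassigned")
    dst_zone = ZONE_OF.get(dst, "unassigned")
    if "unassigned" in (src_zone, dst_zone):
        return "BLOCKED (asset not assigned to a zone)"
    if src_zone == dst_zone:
        return "allowed (same zone)"
    if (src_zone, dst_zone) in APPROVED_CONDUITS:
        return "allowed (approved conduit)"
    return "BLOCKED (no conduit between zones)"


def violations(flows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Observed (src, dst) connections that break the zone and conduit policy."""
    return [(s, d) for s, d in flows if check_flow(s, d).startswith("BLOCKED")]


if __name__ == "__main__":
    observed = [
        ("erp_server", "patch_server"),
        ("erp_server", "scada_server"),   # IT reaching straight into control
        ("hmi_station_1", "plc_line_1"),
        ("scada_server", "historian_mirror"),
    ]
    print(violations(observed))  # [('erp_server', 'scada_server')]
```

Feeding this kind of check with flow data from the network shows whether the firebreaks on paper exist in practice.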

Conclusion and success stories

Improving these five categories reduces the risk and impact of a ransomware attack, leverages existing technology investments, and improves recovery in the event of a compromise. Each adds successive protections and safeguards against a possible ransomware attack.

OT-specific challenges are identified in this document not to show that a robust OT security program is unattainable, but to help the reader identify the key decision points that allow a program to achieve maximum protection with minimal challenges.

‘IT-like’ security controls are increasingly being applied in OT across numerous industries, companies, and countries around the world. But the true measure of success is how well organizations maintain and monitor those initial efforts. The companies that are significantly improving their security posture are acknowledging the unique challenges of an OT environment and making decisions such as:

  • Building robust, 360-degree asset views
  • Incorporating multiple functions into a single platform
  • Tying together IT and OT skill sets at an enterprise level to review, monitor, plan and execute systemic security controls
  • Automating data collection and remediation tasks
  • Partnering with proven, OT-safe software and services vendors and consultants

To learn more about options for your operational environment, look no further than your local CS2AI chapter for access to a significant body of OT professionals, best practices, research, resources, and insight.

Ransomware Webinar

Download the on-demand webinar to hear Ron Brash and John Livingston discuss best practices and use cases to protect your OT organization against ransomware.
