Industrial cyber security leaders – including the C-suite, CISOs, security teams, and operational leaders – are increasingly realizing the potential financial, operational, and safety impact of cyber events. To get their arms around securing this challenging part of their networks, many leaders have kicked off efforts by separating their IT and OT networks, gaining visibility into the arcane world of OT assets, or feeding data from the OT networks into incident detection processes to identify potential threats.
Some are required to take specific security actions by regulation. Activity is bustling: meetings, planning, network architecture discussions, proofs of concept (POCs), and more are keeping teams incredibly busy as they also try to keep their plants operational in a world of declining resources and COVID-19 limitations.
We must stop and ask fundamental questions: Are we making progress? If we aren’t a victim this week, month, or year, are we successful? Are we wasting money or spending too little? We must start managing OT cyber security with the same kind of objectives, metrics, targets, and performance management that we apply to operating a plant, railroad, or power grid.
Let us offer a point of view, to which we welcome additions or alternative perspectives. We believe there are two primary objectives of OT security that deliver the ultimate objective of reducing potential impact to OT operations:
- Reduce risk
- Respond to threats
We know this is simplistic. This is just restating the obvious. But, in fact, we would argue this foundation begins answering questions from the top of the organization: How do we know if we are making progress or being successful? Are we actually improving our risk posture? Are we equipped to respond to a real threat or just detect anomalous behavior?
Many industrial organizations want “visibility” or “detection” but aren’t clear on the ultimate objective or how to measure it. If we get a lot of detections is that good…or bad? If we have visibility, have we increased our security? These two core fundamentals and the key components of each help determine the best path.
3 Steps to Reduce Risk in Industrial Environments
Create a real-time view of the risk status of the OT environment
The first step to reducing risk is risk awareness. Most organizations start this journey with a vulnerability assessment of their OT environment, then estimate the likelihood and impact of each potential risk. This is a necessary but insufficient step.
A one-time or infrequent assessment is outdated immediately and makes it very difficult to track progress in risk reduction over time. Success in risk reduction requires a constantly updated view of the risk.
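To make the idea of a constantly updated view concrete, here is a minimal sketch of scoring findings by likelihood and impact and tracking the total across repeated snapshots. The 1-to-5 scales, the asset names, and the dates are illustrative assumptions, not a prescribed scoring method.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Finding:
    asset: str
    issue: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (safety or plant-wide)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def total_risk(findings: list[Finding]) -> int:
    """Sum of likelihood x impact over all open findings."""
    return sum(f.score for f in findings)


def risk_trend(snapshots: dict[date, list[Finding]]) -> list[tuple[date, int]]:
    """Return (date, total risk) pairs in chronological order."""
    return [(day, total_risk(snapshots[day])) for day in sorted(snapshots)]


# Hypothetical snapshots: the HMI finding was remediated between the two dates.
snapshots = {
    date(2021, 6, 1): [
        Finding("HMI-01", "unpatched OS", 4, 5),
        Finding("PLC-07", "default password", 3, 5),
    ],
    date(2021, 7, 1): [
        Finding("PLC-07", "default password", 3, 5),
    ],
}

trend = risk_trend(snapshots)
for day, score in trend:
    print(day.isoformat(), score)
```

In practice the snapshots would come from whatever asset inventory or assessment tooling is in place. The point is only that the same calculation is repeated often enough to show a trend.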
Take remediating actions to reduce risk
Risk reduction requires executing specific actions to reduce those specific risks. If the assessment identifies risks from unpatched systems, insecure configurations, dormant or insecure accounts and users, poor access controls, etc., the next step must be to reduce those risks.
Actionability requires the organization to manage its OT endpoints. It must take back control from vendors and ensure configurations are hardened, network devices are updated and properly configured, users and accounts are cleaned up, etc. These endpoint actions are an example of why risk detection is not enough, and why organizations must close the loop by remediating the risks they find.
Track and report on operational excellence
The great thing about securing operational environments is that the leadership and staff are comfortable with rigorous operations management. Security requires the same kind of operational excellence as manufacturing or supply chain. Foundational to operations management is tracking and reporting performance on critical metrics. Whether it be “red to green” dashboards or percent-complete figures, a strong risk reduction program establishes clear metrics and monitors them over time.
This reporting should also include who is responsible for each metric. In security, this requires having uncomfortable conversations with operational leaders about their personal responsibility for maintaining and improving the overall risk profile of the OT systems.
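As a rough illustration of a “red to green” report with named owners, the sketch below assigns a status to each metric from its target and actual values. The metric names, owners, targets, and the 10-point amber margin are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    owner: str          # person accountable for the metric
    target_pct: float
    actual_pct: float


def status(metric: Metric, amber_margin: float = 10.0) -> str:
    """Green at or above target, amber within the margin, red otherwise."""
    if metric.actual_pct >= metric.target_pct:
        return "green"
    if metric.actual_pct >= metric.target_pct - amber_margin:
        return "amber"
    return "red"


# Hypothetical metrics and owners for illustration only.
metrics = [
    Metric("Critical patches applied", "Plant manager, Site A", 95.0, 97.0),
    Metric("Dormant accounts removed", "Controls engineering lead", 100.0, 92.0),
    Metric("Backups verified", "Operations lead, Site B", 90.0, 60.0),
]

report = [(m.name, m.owner, status(m)) for m in metrics]
for name, owner, colour in report:
    print(f"{colour:<6} {name} (owner: {owner})")
```

Putting the owner next to the status is a deliberate choice here: it keeps the accountability conversation attached to the number rather than to the security team.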
3 Elements for Effective Threat Response
Defined response process and plan
Incident response plans appear in almost every cyber security standard because they are critical to stopping a potential attack in real time. But many incident response processes stop at a set of high-level procedures or policies, such as whom to call when you see an issue, how to communicate with authorities, and whom to use as an incident response vendor.
In the past several months, the industry has seen first-hand that incident response plans need to be much more detailed and specific to the individual IT-OT environment. The Colonial Pipeline event highlighted the risks of limited response planning in the OT environment. The response to the ransomware involved shutting down operations. This may have been a necessary step, but the key to a strong incident response plan is to identify the Least Disruptive Response (LDR) for each threat. The LDR is built by understanding the specifics of the OT risk posture (part of step one of Reducing Risk). To define the least disruptive response, the organization needs visibility into the risk posture of each asset and knowledge of how to reduce the impact of different types of threats. This goes beyond the paper-based “who to call”.
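One way to picture LDR planning is as a lookup over pre-agreed response options, each rated for how disruptive it is to operations. The sketch below picks the lowest-disruption option that contains a given threat. The threat names, actions, and disruption ratings are hypothetical and would need to be defined per environment by security and process experts together.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseOption:
    action: str
    contains: frozenset[str]  # threat types this action is judged to contain
    disruption: int           # assumed scale: 1 (no process effect) to 5 (full shutdown)


def least_disruptive(threat: str, options: list[ResponseOption]) -> ResponseOption | None:
    """Return the lowest-disruption option that contains the threat, if any."""
    viable = [o for o in options if threat in o.contains]
    if not viable:
        return None
    return min(viable, key=lambda o: o.disruption)


# Hypothetical playbook entries for illustration only.
ALL_THREATS = frozenset({"stolen credentials", "ransomware spread"})
options = [
    ResponseOption("shut down operations", ALL_THREATS, 5),
    ResponseOption("isolate affected network segment", frozenset({"ransomware spread"}), 3),
    ResponseOption("block remote access port at firewall", ALL_THREATS, 2),
    ResponseOption("disable compromised account", frozenset({"stolen credentials"}), 1),
]

for threat in ("stolen credentials", "ransomware spread"):
    choice = least_disruptive(threat, options)
    print(threat, "->", choice.action)
```

Shutting down remains in the list as a last resort. The plan simply makes sure it is not the only option on the table.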
“XDR” is a growing security industry buzzword for the broad telemetry required to contain modern threats. In OT, “XDR” is often dismissed because of the risks from automated response actions. But we should not throw out the concept of “X-dimensional” detection. This refers to gathering a wide set of data from the OT systems – endpoint logs, user behavior, network flows, firewall logs, even physical process alarms – and using integrated analysis to identify potential threats. In the IT world, no security leader would accept a single form of telemetry, such as packet inspection, as the answer to detection. We shouldn’t in OT either.
Integrating these various forms of telemetry also reduces the false positives that cripple the SOC teams and keep them from responding to the most critical alerts.
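The sketch below shows one simple way integration can cut noise: an asset is escalated only when at least two distinct telemetry sources alert on it within a short window. The source names, the 15-minute window, and the two-source threshold are assumptions for illustration, and real correlation logic would be considerably richer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    asset: str
    source: str   # e.g. "network", "endpoint", "process" (assumed labels)
    minute: int   # minutes since the start of the observation period
    detail: str


def correlate(alerts, window=15, min_sources=2):
    """Map each escalated asset to the distinct sources that alerted together."""
    by_asset = defaultdict(list)
    for alert in alerts:
        by_asset[alert.asset].append(alert)

    escalated = {}
    for asset, items in by_asset.items():
        items.sort(key=lambda a: a.minute)
        for i, first in enumerate(items):
            # Alerts on this asset that fall within the window of `first`.
            group = [a for a in items[i:] if a.minute - first.minute <= window]
            sources = {a.source for a in group}
            if len(sources) >= min_sources:
                escalated[asset] = sorted(sources)
                break
    return escalated


# Hypothetical alerts for illustration only.
alerts = [
    Alert("HMI-01", "network", 0, "new outbound connection"),
    Alert("HMI-01", "endpoint", 5, "new local admin account"),
    Alert("HMI-01", "process", 9, "setpoint change outside range"),
    Alert("PLC-07", "network", 3, "port scan"),
    Alert("PLC-07", "network", 40, "port scan"),
]

escalated = correlate(alerts)
print(escalated)
```

Here the repeated network-only alerts on PLC-07 are not escalated, while HMI-01 is, because three independent sources agree within minutes of each other.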
OT-safe, rapid, least disruptive response
As mentioned, organizations need plans for the LDR – least disruptive response. But they also need to implement response actions rapidly and in an operationally safe fashion. A response plan is only as good as an organization’s ability to execute it in the heat of the moment. The plan should be backed with the people, processes, and technology that allow the security team (including both security experts and industrial process experts) to take the security actions necessary to stop the threat. These might include removing a specific user, changing passwords, disabling certain ports and services, patching a system, etc. Too often in the world of OT, these steps are manual or require vendor involvement in the middle of an event. For rapid response, the industry needs the ability to take targeted response actions when necessary.
These response actions should be managed by a team of security and operational personnel. Unlike in IT, where automated response is becoming the norm, OT practitioners generally hold that response requires a human to review the potential threat, as well as the potential negative operational impacts, before executing the action. We call this the “Think Global: Act Local” approach.
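A human-in-the-loop gate of this kind can be sketched as a proposed action that will not run until both a security reviewer and an operations reviewer have signed off. The role names, the example action, and the `run` callback are hypothetical. A real system would also need authentication, logging, and a rollback path.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed rule: one approval from each of these roles is required.
REQUIRED_ROLES = frozenset({"security", "operations"})


@dataclass
class ProposedAction:
    asset: str
    action: str
    operational_impact: str          # what the reviewers are asked to weigh
    approvals: set = field(default_factory=set)


def approve(proposal: ProposedAction, role: str) -> None:
    """Record an approval from a recognised role."""
    if role not in REQUIRED_ROLES:
        raise ValueError(f"unknown role: {role}")
    proposal.approvals.add(role)


def execute(proposal: ProposedAction, run: Callable[[ProposedAction], str]) -> str:
    """Run the action only once every required role has approved it."""
    missing = REQUIRED_ROLES - proposal.approvals
    if missing:
        raise PermissionError("missing approval from: " + ", ".join(sorted(missing)))
    return run(proposal)


# Hypothetical example: disabling a vendor's remote account on one HMI.
proposal = ProposedAction(
    asset="HMI-01",
    action="disable account 'vendor_remote'",
    operational_impact="vendor loses remote access until re-enabled",
)
approve(proposal, "security")
approve(proposal, "operations")
result = execute(proposal, lambda p: f"{p.action} on {p.asset}: done")
print(result)
```

The action itself can still be fast and targeted once approved. Only the decision to run it stays with people who understand both the threat and the process.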
Organizations are reacting to the emerging threats to OT security and beginning to take action. This is great news. However, we all need to step back to determine what the overall objectives are and how to ensure we are actually making progress against the two key elements – risk reduction and threat response – before taking actions that may not lead to true security improvement.