4 Components to Rapidly Improve & Measure OT Security
Here's how to demonstrate progress and improvement on key security metrics over time in OT cyber security environments.
Over the past thirty years, Verve has worked with industrial organizations to increase the security and reliability of their operational technology. Through our security, automation, and network design services and our software platform, the Verve Security Center, we have helped dozens of organizations improve their performance.
But one of the biggest challenges organizations face is developing a robust business plan that creates the momentum, focus and budget required to make measurable progress against cyber threats. The following whitepaper explains an effective, cross-industry approach to building that commitment across an organization. It is structured as a worked example so that you can tune it to your situation.
This section is not company-specific. It lays out the overall industry trends in threats to the environment, regardless of the specific risks borne at the company level. These trends all point in the wrong direction from a security point of view and provide a strong foundation for awareness building.
In this section, we bridge from the industry threat landscape to the specific risks the company faces, based on detailed information gathered through what we call a “technology-enabled vulnerability assessment” (TEVA).
Once the risks are clear, the business case lays out a prioritized portfolio of initiatives, sequenced by impact and by which initiatives form the foundation for what is to come. It takes certain steps first and builds foundational elements that can be added to over time to drive efficiency and the fastest overall impact.
Build a “Think Global: Act Local” organization model. The biggest challenge identified in ICS security is the lack of skilled resources. The organization design is a key element of any business case and must address the cost and resource constraints of distributed, sensitive OT environments. Verve has developed an approach called “Think Global: Act Local” that delivers 70% lower cost while ensuring operational resilience.
Lastly, we need to demonstrate ROI and maintain it over time. Often this is done through assessments conducted annually or even less often. These usually arrive late and are often very expensive. The key to an efficient business case is a technology-enabled assessment platform that is updated in real time, allowing the organization to monitor ROI.
Industrial organizations are under increasing challenges from cyber threats and the resulting reporting and regulatory requirements. We refer to this as industrial organizations being under an AIR-RAID of cybersecurity challenges:
And just at the end of 2021, the Log4j vulnerability dramatically increased the challenges for OT cybersecurity personnel. This is an insidious vulnerability as it exists within libraries used by a very large number of software vendors. Finding that library on sensitive OT systems is incredibly challenging. Unfortunately, this is only the latest in a series of challenging risks and vulnerabilities which have been arising ever more frequently.
All of these point towards the urgent need to make demonstrable improvements in OT cyber security posture. Improvements are necessary to ward off the growing threats. But they also need to be demonstrable (that is, measurable) to satisfy insurers, regulators and directors that the organization is making continuous progress towards its goal.
This is now coming to a head. The idea that it will take six months of assessment and planning followed by 18-24 months to deploy hardware (taps, span ports, firewalls, etc.), harden endpoints, deploy robust backup solutions, and create a robust vulnerability management program for OT is not acceptable anymore.
While industrial organizations broadly are under attack, the company has specific risks due to its systems, architectures, operations, etc. To understand the specific risks this company faces, the next part of the business case is a “Technology-Enabled Vulnerability Assessment” (TEVA) which identifies detailed, fact-based risks to company operations, safety, etc. from potential cyber threats.
The TEVA begins by establishing a standard against which to measure the company’s current OT cybersecurity risk. In this case, we’ve combined two standards: the NIST Cybersecurity Framework (CSF) and IEC 62443. Integrating the two provides the right type of guidelines, giving an overall view through the CSF while also getting very specific on OT-specific controls.
The TEVA uses technology to gather detailed information on each asset in the environment to develop a comprehensive risk view across the full range of NIST requirements – Identify-Protect-Detect-Respond-Recover. In this case, the company’s risks are very high – essentially equating to a 1 or 1.5 on a NIST scale of 1-5.
It is critical that we break down the particular risks by facility and endpoint to allow for the later development of a practical, prioritized roadmap. The foundational risks are that the company has no visibility into the assets, software, network connections, etc. across its environments. Manual spreadsheets are out of date, and the technology assessment captured 40% more assets than were accounted for on the spreadsheets. Furthermore, a lack of proactive vulnerability management and patching in OT means that the only real line of defense for protection is the network firewall protection or user & account management.
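The gap between a manual spreadsheet and an automated inventory can be illustrated with a simple set comparison. The following is a minimal sketch, not the assessment platform's actual method: the function name `reconcile_inventory` and all asset names are hypothetical, and the sample data is chosen only so that the result matches the 40% figure above.

```python
def reconcile_inventory(spreadsheet: set[str], discovered: set[str]) -> dict:
    """Compare a manual asset list with assets found by automated discovery."""
    unrecorded = discovered - spreadsheet   # seen on the network, missing from the spreadsheet
    stale = spreadsheet - discovered        # listed on the spreadsheet, not seen on the network
    # Unrecorded assets as a percentage of the spreadsheet count (None if the sheet is empty).
    pct = round(100 * len(unrecorded) / len(spreadsheet), 1) if spreadsheet else None
    return {
        "unrecorded": sorted(unrecorded),
        "stale": sorted(stale),
        "pct_unrecorded": pct,
    }


# Hypothetical asset names, for illustration only.
spreadsheet = {"hmi-01", "plc-01", "plc-02", "hist-01", "eng-ws-01"}
discovered = spreadsheet | {"plc-03", "switch-07"}

report = reconcile_inventory(spreadsheet, discovered)
print(report["unrecorded"])      # ['plc-03', 'switch-07']
print(report["pct_unrecorded"])  # 40.0
```

Tracking the "stale" set alongside the "unrecorded" set matters in practice, since spreadsheet entries for decommissioned devices distort the picture just as much as missing ones.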
Unfortunately, in our case, analysis of the firewall and network rules during the vulnerability assessment also revealed that many of the intended protections did not exist in practice. New rules have been added to allow vendor or remote access, dual NICs have been deployed to allow certain communications, and so on. So the “air gap” we depend on no longer exists, and the entire inside of the network is widely accessible.
User & account management is also weak. The TEVA found hundreds of dormant and misconfigured accounts across the environment, many with administrative access to various devices. The lack of consistent configuration and user management means that the OT endpoints are at significant risk of lateral movement using weak security settings.
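As an illustration of how dormant accounts might be flagged once account data has been collected, here is a minimal sketch. The `Account` record, the 90-day idle threshold, and the sample accounts are all assumptions made for this example; the source does not say how the TEVA defines "dormant".

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Account:
    name: str
    host: str
    is_admin: bool
    last_login: Optional[date]  # None means the account has never logged in


def find_dormant(accounts: list[Account], today: date, max_idle_days: int = 90) -> list[Account]:
    """Return accounts idle longer than max_idle_days, administrative accounts first."""
    cutoff = today - timedelta(days=max_idle_days)
    dormant = [a for a in accounts if a.last_login is None or a.last_login < cutoff]
    # Sort so that dormant admin accounts (the highest risk) appear at the top.
    return sorted(dormant, key=lambda a: (not a.is_admin, a.host, a.name))


# Hypothetical accounts, for illustration only.
accounts = [
    Account("svc_backup", "hist-01", is_admin=True, last_login=date(2023, 1, 1)),
    Account("operator1", "hmi-01", is_admin=False, last_login=date(2023, 12, 15)),
    Account("vendor_tmp", "eng-ws-01", is_admin=False, last_login=None),
]

for acct in find_dormant(accounts, today=date(2024, 1, 1)):
    print(acct.name, acct.host, "admin" if acct.is_admin else "user")
```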
The TEVA has also identified significant gaps in detection capabilities, with no consistent log collection or analysis, and weak backup and restore procedures. Many devices either lack backups entirely or are backed up in central, enterprise repositories, meaning that restoration would take days or weeks in the event of a significant ransomware attack. Finally, policies and procedures around items such as incident response, change management, etc. are not well-defined or shared.
In short, the tech-enabled vulnerability assessment defines a set of significant risks across the environment.
We use this information to identify the assets that are at the greatest risk and pose the greatest threat, so that the roadmap can address them first. We looked at specific threat vectors and the financial impact of those vectors. This analysis demonstrated that we have multiple likely scenarios where the operational impact could be over $1 million. We also identified over ten potential significant life safety issues due to risk in the environment.
These scenarios looked at specific actions an attacker might take. For example, an attacker could lock up a server, leaving the company without control or monitoring. The company would then likely shut down the plant because of the loss of safety monitoring.
For events like ransomware, we analyzed how long a disruption would last and what we might do in response. Do we have backups that we could restore from? How long would it take us to do that? What this does is tie these threats back to the operational index.
We also look at potential spread across facilities. 36 of the sites are networked together into a single flat network, meaning if ransomware gets into one of them, the chances it can spread are very high. Therefore, this $200,000 per plant per day potential operational impact multiplies very quickly.
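The multiplication is simple but worth making explicit. The sketch below uses the $200,000 per plant per day figure and the 36 networked sites from the text; the outage durations are hypothetical, and the assumption that every networked site is affected equally is a worst-case simplification rather than a finding of the assessment.

```python
def fleet_outage_cost(sites_affected: int, days_down: int,
                      cost_per_plant_per_day: int = 200_000) -> int:
    """Worst-case operational impact if every affected site is down for the same period."""
    return sites_affected * days_down * cost_per_plant_per_day


# 36 sites on one flat network, all hit at once (assumed worst case).
print(f"${fleet_outage_cost(36, 1):,}")  # $7,200,000 for a single day
print(f"${fleet_outage_cost(36, 5):,}")  # $36,000,000 for a five-day outage
```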
The assessment identifies the key risks. The business case then transforms those into a set of prioritized initiatives with underlying human resources and technology required to achieve success.
The initiative plan prioritizes actions by the value of the site initially. Different facilities obviously produce different values of goods. However, beyond that, some facilities may be key suppliers to other facilities or customers. These may rank higher in importance than their direct financial output. Similarly, others may pose greater life safety risks. The business case combines the operational impact with the cyber security risks to define the right sequence of locations to focus on – essentially a bronze, silver, and gold type scheme, where different sites receive greater focus depending on their risk.
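One way to express this kind of tiering is a simple score that multiplies operational impact by cyber risk. This is only a sketch under stated assumptions: the 1-to-5 scales, the multiplication, the tier thresholds, and the reading of "gold" as the highest-priority tier are all hypothetical choices, not the scheme used in the business case.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    operational_impact: int  # 1 (low) to 5 (high): output value, supply role, life safety
    cyber_risk: int          # 1 (low) to 5 (high): taken from the assessment results


def score(site: Site) -> int:
    """Combined priority score, ranging from 1 to 25."""
    return site.operational_impact * site.cyber_risk


def tier(site: Site) -> str:
    """Map the combined score to a tier (thresholds are illustrative)."""
    s = score(site)
    if s >= 16:
        return "gold"
    if s >= 8:
        return "silver"
    return "bronze"


def sequence(sites: list[Site]) -> list[Site]:
    """Order sites so the highest combined score is addressed first."""
    return sorted(sites, key=score, reverse=True)


# Hypothetical sites, for illustration only.
sites = [Site("Plant C", 2, 2), Site("Plant A", 5, 4), Site("Plant B", 3, 3)]
for s in sequence(sites):
    print(s.name, score(s), tier(s))
```

A site that supplies other facilities would be given a higher `operational_impact` value than its direct financial output alone suggests, which is how the dependency described above enters the ranking.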
One of the keys to the business case is investing in core foundational security elements first and building on them over time. The initial investments are in the first components of the NIST framework: identification and visibility across the fleet. The good news is that by using the technology-enabled assessment, the company is already ahead in many respects.
Leveraging a technology-based platform to gather asset inventory brings it full circle and becomes the foundational layer of the entire program. This allows us to do vulnerability management, patch management, configuration management, user management, malware maintaining or whitelisting, etc. from a single solution. It brings in telemetry data for intrusion detection, backups, and incident response. Investing in a multifunction platform saves the cost of multiple people at every site managing security on their own. And that platform is built for a 360-degree assessment for visibility into all risks.
In the remediation phase, the platform enables action so there is no gap between assessment and remediation. As soon as we’ve assessed the situation, we can remediate those risks, which accelerates time to security.
This creates the foundation of a broader portfolio of initiatives. The below chart is an example of a high-level portfolio.
The initiatives are defined over time with different types of plants receiving different levels of security due to their risk.
The roadmap is enabled through a vendor-agnostic OT systems management platform. One of the biggest challenges in industrial security is the various toolkits required. In many organizations, not only are plants or facilities each pursuing their own security initiatives, but each OEM vendor has its own suite. As a result, the costs of remediation grow and the management of security maintenance becomes unwieldy.
The business case is based on an integrated platform that provides OT systems management across the whole fleet. The team spent a lot of time analyzing different approaches. It looked at using multiple independent tools: one for vulnerability, one for network, one for backups. It then analyzed an approach that brings all of that together with the same deployment in a single platform. A single OT systems management platform saves approximately 30% in upfront licensing and deployment. In addition, it acts as the foundation for a much lower cost maintenance model as described below (Think Global: Act Local) that can save up to 70% of the labor costs of maintaining security over time. And perhaps more importantly, the integrated platform improves security by enabling visibility into how all portfolio initiatives work together and avoiding the dreaded gaps between tools and initiatives.
This business case isn’t solely reliant on technology or the security team. We must align both our security resources and our very sparse OT resources to enable the kind of protections we need in this environment. This is enabled in the technology model described above.
According to KPMG, the number one challenge (57%) of OT cyber security is a lack of resources. This is driven both by the uniqueness of the OT systems and by their distributed nature. To achieve the business case, the company needs to scale work without taking inappropriate actions on the process equipment that might cause operational disruptions.
The model in this business case is “Think Global: Act Local”. The TG:AL approach enables the company to centralize the analysis portions of security (“think”) while ensuring expert process-level personnel approve any actions taken to manage the assets prior to their deployment. For instance, for vulnerability and patch management, the central team has information on all missing patches and known vulnerabilities across plants. It can prioritize the greatest risks based on the criticality of the asset and the vulnerability, as well as make trade-offs based on compensating controls. The small, central team designs playbooks and action plans to distribute to each facility.
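The central prioritization step can be sketched as a ranking over findings. This is an illustrative sketch only: the `Finding` fields, the multiplication of severity by asset criticality, and the 50% discount for a compensating control are assumptions for the example, not the platform's scoring model.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    asset: str
    vuln_id: str               # placeholder identifier, not a real CVE
    severity: float            # e.g. a 0.0-10.0 severity score
    asset_criticality: int     # 1 (low) to 5 (high), hypothetical scale
    compensating_control: bool # e.g. the asset sits behind a restrictive firewall rule


def priority(f: Finding, control_discount: float = 0.5) -> float:
    """Higher means fix sooner; a compensating control lowers, but does not remove, priority."""
    base = f.severity * f.asset_criticality
    return base * control_discount if f.compensating_control else base


def build_playbook(findings: list[Finding], top_n: int = 10) -> list[Finding]:
    """Return the top_n findings for the central team to turn into site action plans."""
    return sorted(findings, key=priority, reverse=True)[:top_n]


# Hypothetical findings, for illustration only.
findings = [
    Finding("eng-ws-03", "vuln-C", severity=5.0, asset_criticality=2, compensating_control=False),
    Finding("hist-02", "vuln-B", severity=9.8, asset_criticality=5, compensating_control=True),
    Finding("hmi-01", "vuln-A", severity=9.8, asset_criticality=5, compensating_control=False),
]

for f in build_playbook(findings):
    print(f.asset, f.vuln_id, round(priority(f), 1))
```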
At this point, the “Act Local” part comes into play. Before the actions are run, the local process engineers confirm the action and timing. Then with an automated set of actions, they execute those actions. The resulting reports are automatically seen centrally to confirm the action is taken. This ensures no patches are deployed from 2,000 miles away potentially disrupting operations. By the same token, it reduces the necessary labor by ~70%.
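The approval gate described here can be modeled as a small state machine in which an action cannot run until a local engineer has signed off. The following is a minimal sketch; the class names, statuses, and report fields are hypothetical and do not represent any product's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PROPOSED = "proposed"   # designed centrally ("Think Global")
    APPROVED = "approved"   # confirmed by a local process engineer
    EXECUTED = "executed"   # run locally ("Act Local")


@dataclass
class Action:
    site: str
    description: str
    status: Status = Status.PROPOSED
    approved_by: Optional[str] = None
    window: Optional[str] = None


def approve(action: Action, engineer: str, window: str) -> None:
    """Local engineer confirms the action and the time window in which it may run."""
    action.approved_by = engineer
    action.window = window
    action.status = Status.APPROVED


def execute(action: Action, run: Callable[[Action], None]) -> dict:
    """Run an approved action and return a report for the central team."""
    if action.status is not Status.APPROVED:
        raise PermissionError(f"{action.description!r} has no local approval")
    run(action)
    action.status = Status.EXECUTED
    return {"site": action.site, "status": action.status.value,
            "approved_by": action.approved_by, "window": action.window}


# Hypothetical usage: the lambda stands in for the automated patch job.
action = Action("plant-07", "deploy approved patch set")
approve(action, engineer="local.engineer", window="Sunday 02:00")
print(execute(action, run=lambda a: None))
```

Raising an error on unapproved actions, rather than silently skipping them, keeps the central team from assuming a change was made when it was not.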
This is how to balance the OT need for sensitivity to the process with our security need for asset management. So the business case is designed for centralization, providing a 70% cost savings in the overall approach.
This chart shows how the plant executes actions and software admin activities. The engineering teams at the sites provide input whether it be adjusting network communications and rules in our firewalls or patch deployment.
That group is involved in training and works closely with the site-level people to execute the central team recommendations. And then finally, that smaller team centrally analyzes endpoint network data to identify vulnerabilities. They’ll build those playbooks and the automation scripts that will be distributed for incident response activities.
So what does that mean in terms of hours and dollars? Our organization model for a 50-plant fleet is roughly as laid out here. At the plant level, we’ve broken this into the assess, remediate and maintain components, and we’ve broken down the number of hours in each one of those. The remediate step is where we improve on the base level of security, which we then maintain as time goes on.
The key to this approach is the vendor-agnostic OT systems management platform described above. To accomplish the Think Global: Act Local model, a vendor-agnostic platform with visibility across all facilities and types of assets combined with the ability to control actions locally is critical.
The key to the company’s OT cybersecurity program is the ability to track the performance of our security on a regular basis. The technology-enabled assessment platform provides real-time updating on the status of risks across all endpoints and network elements. Our business case is founded on the premise of efficient labor through the TG:AL approach and the tracking of metrics moving from “red to green” over time.
For instance, as new patches are deployed, vulnerabilities are released and remediated, users and accounts are cleaned up, and firewall changes are monitored, the team can immediately report on its current status to demonstrate the improvement expected.
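A "red to green" view of a metric can be sketched as a threshold mapping applied to periodic snapshots. The thresholds, the intermediate "amber" state, the metric name, and the sample values below are assumptions for illustration, not reported results.

```python
def rag_status(value: float, green_at: float, amber_at: float) -> str:
    """Classify a higher-is-better metric against two thresholds."""
    if value >= green_at:
        return "green"
    if value >= amber_at:
        return "amber"
    return "red"


# Hypothetical snapshots of one metric: percent of endpoints with critical patches applied.
history = [("Q1", 35.0), ("Q2", 62.0), ("Q3", 91.0)]

statuses = [(label, rag_status(pct, green_at=90, amber_at=60)) for label, pct in history]
for label, status in statuses:
    print(label, status)
```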
We recognize that OT cyber security is an investment. This approach to monitoring and tracking results on a real-time basis enables management to see how that investment is improving security metrics. These metrics are critical both for internal security as well as for insurers, regulators and boards of directors.
This business case identifies the key risks across the industry and specifically within the company environment. The team has developed a prioritized set of initiatives as well as an organizational model to achieve a low total cost of ownership, while monitoring and demonstrating ROI results over time.