Industries such as manufacturing, energy production, power distribution, water treatment and supply, transportation, and healthcare all rely on a highly specialized collection of technologies, referred to as “operational technology,” or OT, to produce, move, heal, clean, and otherwise support the critical processes that are the pulse of their endeavors. These industrial systems are increasingly under attack, targeted by malicious actors with varying levels of skill and a diversity of motives. Threats to OT range from ransomware and IP theft to vandalism and full-blown acts of cyberterrorism that can disrupt critical infrastructure, damage systems and facilities, and injure people.
This guide provides a holistic perspective on the processes, policies, and technologies that together protect operational technology assets and processes from cyber threats and attacks. It includes foundational elements of OT security for those just learning about the space, descriptions of the standards available to system defenders, and deeper dives into specific elements of OT security. It also links to more detailed coverage of topics beyond the scope of this guide. We hope it provides a jumping-off point for those who need to build their OT security foundation.
Understanding The Many Facets of OT
As the name suggests, operational technology describes the combination of technologies that support and enable industrial operations. OT encompasses a variety of systems across a wide array of industries, ranging from transport (rail, maritime, etc.) to logistics (ports, warehouses, etc.) and many more.
OT also covers so-called “cyber-physical systems,” the set of technologies responsible for monitoring and controlling real-world physical processes.
Basic systems in modern OT:
ICS: Industrial Control Systems
ICS encompasses a wide range of systems, sometimes referred to as “factory automation” or “distributed control systems,” and typically includes DCS, SCADA, and IIoT. Industrial control systems act as the interfaces to manufacturing, process management, rail or maritime transport controls, and other similar functionality.
DCS: Distributed Control Systems
A subset of ICS, DCS describes complex systems in discrete or continuous manufacturing environments that help control and manage production facilities. Functions such as power generation, manufacturing, and refining often concentrate significant OT assets in a single geographic site.
SCADA: Supervisory Control and Data Acquisition Systems
SCADA systems operate as an overarching data network that captures inputs and outputs of an industrial process and facilitates system monitoring, analysis, and control. SCADA systems collect data from widely distributed I/O devices across a large geographic footprint. Processes such as electric transmission, pipelines, and rail all typically deploy SCADA technology.
Buildings and Physical Access Controls
OT also includes systems that control physical facilities. This includes elevators, HVAC systems, lighting, and other physical elements. Building and access controls also include security cameras, swipe cards, electronic door locks, and similar systems. Such building controls use proprietary protocols and employ approaches very different from the industrial systems mentioned above.
IIoT: Industrial Internet of Things
Sometimes considered a subset of ICS or SCADA, IIoT warrants its own category because such devices often are not connected to the controls network, instead operating over public or private wireless networks, a distinction that raises unique security challenges.
Medical Devices
These come in two varieties: on-site medical devices that provide services to patients in hospitals, homes, or doctors’ offices (MRI scanners, IV pumps, and the like); and consumer health devices such as pacemakers, insulin pumps, and prenatal monitors.
Four types of OT devices
Securing OT systems is challenging in part due to the wide variety of device types deployed on OT networks. Servers, workstations, firewalls, diodes, remote terminal units (RTU), relays, I/O devices, IIoT sensors, cameras, and backup power supplies are just a few of the thousands of device types that comprise modern OT environments.
From a security perspective, it’s helpful to organize the plethora of OT constituents into four broad categories:
Servers, workstations, HMIs, and more
These typically run traditional commodity operating systems such as Windows or Linux and are used for a variety of control and reporting tasks, from domain control to operating critical process application software. They may also act as historian servers, which gather process data and forward it to enterprise data collection systems.
Network devices
In addition to traditional IT-style switches and firewalls, OT systems include specialized networking equipment such as industrial firewalls that control traffic using industrial protocols. These purpose-built devices run proprietary embedded operating systems from their manufacturers.
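To make “control traffic using industrial protocols” concrete, here is a minimal sketch of the kind of protocol-aware check an industrial firewall might apply. It assumes Modbus/TCP as the industrial protocol (the text does not name one): it parses the 7-byte MBAP header, validates it, and permits only read function codes. The function names and the read-only policy are hypothetical illustrations; real products implement far richer, vendor-specific rule sets.

```python
import struct

# Modbus function codes for read-only operations (from the public Modbus specification).
READ_ONLY_CODES = {
    0x01,  # Read Coils
    0x02,  # Read Discrete Inputs
    0x03,  # Read Holding Registers
    0x04,  # Read Input Registers
}

MBAP_LEN = 7  # transaction id (2) + protocol id (2) + length (2) + unit id (1)


def allow_modbus_request(frame: bytes) -> bool:
    """Hypothetical filter: permit a Modbus/TCP request only if well-formed and read-only."""
    if len(frame) < MBAP_LEN + 1:
        return False  # too short to hold the MBAP header plus a function code
    _txn_id, protocol_id, length, _unit_id = struct.unpack(">HHHB", frame[:MBAP_LEN])
    if protocol_id != 0:
        return False  # Modbus/TCP always uses protocol identifier 0
    # The length field counts the unit id byte plus the PDU that follows it.
    if length != len(frame) - 6:
        return False
    function_code = frame[MBAP_LEN]
    return function_code in READ_ONLY_CODES


def build_request(function_code: int, address: int, value: int,
                  txn_id: int = 1, unit_id: int = 1) -> bytes:
    """Build a simple request whose PDU is: function code, 2-byte address, 2-byte quantity/value."""
    pdu = struct.pack(">BHH", function_code, address, value)
    mbap = struct.pack(">HHHB", txn_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


read_req = build_request(0x03, address=0, value=10)     # read 10 holding registers
write_req = build_request(0x06, address=0, value=1234)  # write a single register
print(allow_modbus_request(read_req))   # True
print(allow_modbus_request(write_req))  # False
```

A read-only allow-list like this would block an attacker (or a mistaken engineer) from writing setpoints through that path, at the cost of forcing legitimate writes through a separate, more tightly controlled route.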
Embedded control devices
Here the list grows significantly with the enormous diversity of control devices. The collection includes PLCs (Programmable Logic Controllers), distributed control system controllers, remote terminal units, protective relays, machine controls of manufacturing devices, physical access controls such as swipe cards, and a wide range of medical devices that control inputs and outputs for medicinal dosing or bio-regulation signals, for example. These devices run proprietary, embedded operating systems developed by manufacturers, often built on commodity components with custom elements.
I/O: Input/output devices
While the list of control devices is long, the roster of I/O devices is practically limitless. I/O is sometimes integrated with the control, but here we separate pure I/O devices, which provide inputs to or outputs from processes. These can be cards in a PLC rack, cameras, pressure or temperature sensors, and thousands of other types. Like embedded control devices, these devices also run on proprietary manufacturer operating systems often built on commodity components with custom elements.
Defining industry-specific OT
Process controls (continuous manufacturing rather than discrete)
Examples: Power generation, chemical refining, pulp and paper, many consumer-packaged goods, water/wastewater
These environments depend on integrated systems that control a wide range of inputs and outputs for end-to-end process management. Often managed via distributed control systems, these processes require precise inputs and outputs throughout. The control systems must adjust in real time in response to readings from I/O devices while also guarding against unintended changes that could cause catastrophic physical damage or harm the product itself.
In many cases, these processes integrate different types of controls: the core process itself plus supporting systems, sometimes referred to as “balance of plant,” which can include environmental controls, safety controls, water treatment, and measurement of outputs such as vibration or temperature. While these systems frequently include equipment from different OEMs specific to their functions, the entire system needs to work together.
Risks in process control environments can be extreme. Many of the best-known OT cyber security incidents targeted such systems. Examples include Stuxnet, which targeted Iranian centrifuges; multiple attacks on water treatment plants, such as the 2021 attack in Oldsmar, Florida; and the attack on a Saudi Arabian petrochemical facility, often referred to as Trisis. All these incidents either threatened or achieved significant physical damage to the facility or to its output.
Discrete manufacturing
Examples: Automotive, electronics, and many other manufacturing industries
Here OT systems manage specific steps in a manufacturing process. Often they are controlled by PLCs programmed with “ladder logic,” a set of commands that perform functions such as turning a cutting tool, then picking up the part and changing its angle for another cut. These programs can be simple sets of commands or complex routines with thousands of commands built over years. In many cases, such controllers are strung together to operate as a cohesive series of stages in the manufacturing process.
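Ladder logic itself is written in vendor programming tools, not Python, but the sequencing idea in the cutting example can be sketched as a PLC-style scan cycle: read inputs, evaluate logic, write outputs, repeat. The step names, the `<step>_done` sensor signals, and the `Controller` class are hypothetical, and a real controller runs this on a fixed cycle time in firmware rather than in a Python loop.

```python
from dataclasses import dataclass, field

# Hypothetical machining sequence from the text: cut, pick up the part, re-angle it, cut again.
STEPS = ["CUT_1", "PICK", "ROTATE", "CUT_2", "DONE"]


@dataclass
class Controller:
    """Hypothetical sequencer holding the active step and the last outputs it wrote."""
    step: int = 0
    outputs: dict = field(default_factory=dict)

    def scan(self, inputs: dict) -> dict:
        """One scan: read inputs, advance if the active step reports done, then write outputs."""
        current = STEPS[self.step]
        # Each step waits for its own (hypothetical) "<step>_done" input before advancing;
        # completion signals for other steps are ignored.
        if current != "DONE" and inputs.get(f"{current}_done", False):
            self.step += 1
            current = STEPS[self.step]
        # Energize exactly one actuator output: the one for the active step (none once DONE).
        self.outputs = {name: name == current for name in STEPS if name != "DONE"}
        return self.outputs


plc = Controller()
for signals in [{}, {"CUT_1_done": True}, {"PICK_done": True},
                {"ROTATE_done": True}, {"CUT_2_done": True}]:
    plc.scan(signals)
    print(STEPS[plc.step], plc.outputs)
```

The security relevance is that the controller acts on whatever its inputs and program say: altering one rung or spoofing one completion signal can skip or reorder physical steps.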
These systems often have many I/O devices feeding data back into the controller and then acting on the commands from the controller to adjust the process as needed. These devices might be networked together or they may operate as stand-alone cells in a process.
Risks associated with these systems involve potential damage to products or disruption of production, resulting in financial loss. In addition, if an attack targets a particular part of the process or affects robots or other mechanical devices, it could cause physical damage to the plant or harm people as well. Finally, in sensitive manufacturing operations, discrete systems often contain classified or sensitive information, such as intellectual property, that could be compromised in an attack.
Distributed operations
Examples: Pipelines, electricity or water distribution, transportation
These systems are characterized by physically distributed controls that require wide-area networking (WAN) capabilities to maintain visibility and control. Distributed controls rely on a range of networking devices and network types to gain that visibility, and they are typically used to control valves, protective relays, meters, and the like.
Cyber risk related to these systems ranges from disrupted operations to damage to physical equipment. Impacts can be significant, from shutting critical valves on a gas pipeline that supplies fuel to power plants to disabling protective relays, which could stop power distribution across the grid.
Medical devices
Perhaps the most personal of devices are those used to capture medical images, manage IV drug delivery, and regulate vital health metrics such as heart rate. These systems are often integrated into larger medical information networks that hold sensitive patient data. In most cases, however, the devices themselves operate independently rather than as part of an integrated process like those used for chemical manufacturing or the power grid.
The risks from attacks include physical harm to patients from inappropriate changes to the inputs or outputs of systems physically connected to them. Also at risk are sensitive medical records and other PII, should an attacker be able to pivot from the device to the organization’s data stores.
What is cyber security?
An ultimate guide to OT security must include some collection of general cyber security components that can be translated into the specifics of OT. We won’t try to cover this comprehensively here, however; there are hundreds of articles that outline many detailed security programs by function, section, industry, regulation, or standards body.
OT cyber security is a constant challenge with ever-changing threats, perennially increasing vulnerabilities, and evolving attacker business models.
Despite that, the core elements of cyber security remain foundational. While there are dozens of frameworks available, perhaps the simplest is the National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF). NIST is the U.S. government agency that develops standards for a wide range of technologies; its CSF adroitly takes multiple established standards and maps specific components to a set of five common functions.
The five functions of the NIST CSF:
The Identify function includes requirements for inventorying and categorizing a company’s technology assets, networks, and risks into a comprehensive assessment. A major priority of this function is identifying all assets, software, and users on the network. This is fundamental to cyber security; as often stated, “you can’t protect what you can’t see.” In OT, this is among the most basic challenges, given the range of devices and networks involved.
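As a minimal sketch of what Identify looks like in practice, the snippet below tags each inventoried asset with one of the four broad OT device categories this guide uses and counts them, flagging anything it cannot classify. The asset records, field names, category labels, and type-to-category mapping are hypothetical examples; a real inventory would be populated from passive network monitoring, active queries, or site surveys.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping from device type to the four broad OT device categories.
CATEGORY_BY_TYPE = {
    "server": "Servers and workstations",
    "workstation": "Servers and workstations",
    "hmi": "Servers and workstations",
    "switch": "Network devices",
    "firewall": "Network devices",
    "plc": "Embedded control devices",
    "rtu": "Embedded control devices",
    "relay": "Embedded control devices",
    "sensor": "I/O devices",
    "camera": "I/O devices",
}


@dataclass(frozen=True)
class Asset:
    name: str
    device_type: str
    os_or_firmware: str


def categorize(assets):
    """Count assets per category; unknown types are flagged rather than silently dropped."""
    counts = Counter()
    for asset in assets:
        category = CATEGORY_BY_TYPE.get(asset.device_type.lower(), "Unclassified")
        counts[category] += 1
    return counts


inventory = [
    Asset("HIST-01", "server", "Windows Server 2016"),
    Asset("PLC-LINE3", "PLC", "v2.1"),
    Asset("TT-104", "sensor", "1.0.4"),
    Asset("UNKNOWN-7", "gateway", "unknown"),
]
print(categorize(inventory))
```

The "Unclassified" bucket is the useful part: in OT, devices nobody can name are often exactly the ones that need attention.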
NIST’s Protect function covers the defense of assets, networks, and information. This protection takes different forms, from endpoint to networking to information to access controls and more. Core to the mission of the protect function is the ability to stop potential threats before they are able to gain unauthorized access, exfiltrate information, disrupt devices and processes, or install malware.
In the Detect function, the NIST CSF gives guidance for spotting potential threats, actions, or events in order to give defenders time to respond before attackers gain a foothold and inflict serious damage. Proper detection requires the ability to recognize threats at multiple stages of an attack, including as malicious actors hit the network, after they are already present in the system, and as they attempt to pivot across the network. Detection requires not only the ability to monitor information coming from a range of sources but also the ability to analyze patterns of behavior to spot potential threats in a sea of data. While detection is well established in IT, collecting and analyzing the appropriate data in OT is an entirely different challenge.
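One common way to analyze patterns of behavior in OT is to exploit how repetitive industrial traffic tends to be: learn which devices normally talk to each other, over which protocol, and flag anything new. The sketch below assumes flows have already been extracted from network traffic as (source, destination, protocol) tuples; the addresses, protocol labels, and function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Flow = Tuple[str, str, str]  # (source, destination, protocol)


def learn_baseline(training_flows: Iterable[Flow]) -> Set[Flow]:
    """Record every communication pattern seen during a known-good observation window."""
    return set(training_flows)


def detect_anomalies(baseline: Set[Flow], live_flows: Iterable[Flow]) -> List[Flow]:
    """Return flows never seen during baselining, in arrival order, without duplicates."""
    alerts, seen = [], set()
    for flow in live_flows:
        if flow not in baseline and flow not in seen:
            alerts.append(flow)
            seen.add(flow)
    return alerts


baseline = learn_baseline([
    ("10.0.1.10", "10.0.2.20", "modbus"),     # HMI polling a PLC
    ("10.0.2.20", "10.0.3.5", "historian"),   # PLC data to the historian
])
live = [
    ("10.0.1.10", "10.0.2.20", "modbus"),
    ("10.0.9.99", "10.0.2.20", "modbus"),     # a new host talking to the PLC
    ("10.0.9.99", "10.0.2.20", "modbus"),
]
print(detect_anomalies(baseline, live))  # [('10.0.9.99', '10.0.2.20', 'modbus')]
```

This approach only works if the training window is actually clean, and legitimate changes such as new equipment or a maintenance laptop will raise alerts until the baseline is updated.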
Detection is useless if the organization does not have the ability to respond. The Respond function comprises a set of actions defenders can use to react to the information that emerges from a detected threat or anomaly. Response includes both the further analysis required to understand whether something that was detected is a true threat and the ability to act to stop the threat or, at least, minimize the damage. Response should also include the ability to communicate with stakeholders when an attack is successful, alerting them and telling them what they should do.
It is often said that cyber-attacks are a matter of “when,” not “if.” The reality is that attackers are well funded and innovative: they only need to succeed once, while defenders need to be right every time. As a result, recovery is a critical part of a robust cyber security program. NIST’s Recover function covers rapid restoration of operations and data in the wake of a compromise. Even when an attacker fails to cause an outage or steal data, all devices, networks, and information need to be restored to a known-good state from a point in time prior to any successful intrusion.
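Restoring to a known-good state presupposes that someone recorded what known-good looked like. A minimal sketch, assuming device configurations or controller programs can be exported as files: hash each export into a baseline during normal operations, then compare after restoration. The device names, file contents, and helper functions are hypothetical.

```python
import hashlib
from typing import Dict


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of an exported configuration or controller program."""
    return hashlib.sha256(data).hexdigest()


def verify_restore(baseline: Dict[str, str], restored: Dict[str, bytes]) -> Dict[str, str]:
    """Compare restored exports against known-good digests and report each device's status."""
    report = {}
    for device, expected in baseline.items():
        if device not in restored:
            report[device] = "missing"
        elif fingerprint(restored[device]) == expected:
            report[device] = "ok"
        else:
            report[device] = "mismatch"
    return report


# Baseline captured during normal operations, before any intrusion.
known_good = {
    "PLC-LINE3": fingerprint(b"ladder program rev 42"),
    "FW-CELL1": fingerprint(b"allow hmi->plc modbus"),
}
after_restore = {
    "PLC-LINE3": b"ladder program rev 42",
    "FW-CELL1": b"allow any->plc any",  # restored from a tainted backup
}
print(verify_restore(known_good, after_restore))
# {'PLC-LINE3': 'ok', 'FW-CELL1': 'mismatch'}
```

A mismatch does not say what changed, only that the restored state is not the recorded one, which is exactly the signal needed before returning a system to production.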
Differences between IT and OT cyber security
OT systems differ significantly from IT systems. First, the devices themselves create challenges for traditional IT security processes and technology. A typical OT environment may include old versions of Windows such as Windows XP or Windows 7; a wide range of embedded devices such as PLCs, controllers, relays, and sensors; industrial (and traditional IT) networking equipment; and more.
These devices require a different approach to security from the modern, updated, OS-based, or cloud-based devices in today’s IT stack.
Second, the protection priorities differ greatly between IT and OT. IT cyber security efforts are guided by the well-known C-I-A triad. In order, the priorities are:
- Confidentiality: Systems and data are protected from unwanted or unauthorized access
- Integrity: Systems and data are accurate, appropriately tuned, and verified
- Availability: All systems and data are stable, online, and ready to function
In OT cyber security, however, the greatest risks are to the safety of people and property, which depend on OT safety and process control systems remaining available, followed by integrity. Information confidentiality, while important, pales relative to the others. As a result, OT risk management must adjust accordingly.
Unlike IT systems, which value confidentiality and integrity first, OT systems are better served by a risk management approach known as Safety-Reliability-Productivity, or SRP. Its priorities include:
- Safety: Covers activities designed to ensure the safe operation of a facility. This may include the physical safety of employees or citizens close to the industrial operation. Safety is a top priority because many industrial processes have the potential for catastrophic harm to life and property: chemicals can explode, heavy machinery can fall or change position quickly, robots can injure employees, and trains can derail.
- Reliability: The recent rise in ransomware attacks on OT operators demonstrates the importance of system reliability. Such attacks cause significant plant disruption, resulting in deep financial losses.
- Productivity: Covers the risk of slowed or degraded operations as a result of a cyberattack. Attackers can manipulate PLC programming to slow production runs or disrupt cold-storage chains, forcing certain “lots” of product to be discarded, or worse.
While manufacturing may not seem an obvious target, eight of nine attacks on manufacturing organizations last year caused shutdowns across multiple plants. Attackers leverage these expensive, well-publicized disruptions to demand large sums of cash, up to $10 million in some cases, especially in industries where cyber security insurance policies are common.
Specific safeguards and responses: the real OT difference makers
Differences in core devices and risk profiles aside, the most meaningful delta between IT and OT from the security practitioner’s perspective resides in the specific knowledge of control systems and security required to manage an OT security program and respond to attacks in the industrial environment as they happen.
Incident detection and response in OT demands a specialized understanding of the unique systems affected. IT systems are commodities whose functions can be grouped and analyzed with a wide range of readily available detection rules. Incident responders in IT have the benefit of safe, effective, well-documented actions they can take uniformly and automatically when trouble strikes. With industrial control systems, however, system behavior is often unique to a specific process.
In OT, response must be measured and handled in a way that does not cause additional harm by inappropriately stopping expected operational processes. Remediation tasks that address vulnerabilities and insecure configurations also require careful approaches on OT systems. Patching, for instance, may require upgrading multiple other elements of the control system, which may be financially infeasible.
Finally, securing OT safely and with operational resilience requires specific knowledge of both control systems and security, a combination in even shorter supply than the much-publicized IT security skills gap. OT systems were often designed years or decades ago, and there is a shortage of skilled personnel who understand them. To secure OT, the industry needs to bring traditional IT security capabilities to the cadre of professionals with deep knowledge of the arcana of OT systems. This emerging discipline, known as OT Systems Management, includes the ability to conduct remediation tasks such as patching, vulnerability management, configuration management, and user management.
OT security threat landscape
The growing threat to OT systems is driven by several factors:
Increased connectivity between OT and IT
Historically, many OT systems had limited connectivity to corporate IT systems. Most operated on OT protocols, were not dependent on corporate applications, leveraged proprietary operating systems and devices, and remained isolated from corporate networks.
Over the past two decades, this separation between IT and OT has evaporated. Even before the modern push for IIoT or “Industry 4.0,” industrial organizations and the OEMs that provide control systems “modernized” those systems, increasing their connections with traditional IT networks and devices. Modern OT environments now feature commodity hardware and software such as Windows operating systems, virtual environments, and IT networking equipment. With the growth of IIoT initiatives, such connectivity is expanding as analytics and productivity efforts require direct links to enterprise cloud and data center applications.
Increased research and focus on OT vulnerabilities
For many years, OT benefitted from so-called “security by obscurity.” While well-known, widely distributed IT systems made attractive targets for attackers, the lesser-known, bespoke wares in OT remained mostly off the hacker radar. That’s changed with the increased use of commodity IT devices in OT as well as the common practice of leveraging traditional IT embedded components to build OT firmware and applications.
Verve’s analysis of ICS-CERT advisories over the past two years shows a nearly 50% rise in published vulnerabilities. This is likely just a fraction of the vulnerabilities that actually exist. In addition, many embedded software vulnerabilities are never linked back to a corresponding OT device, meaning that unknown risks abound.
Increased targeted attacks
Thanks to shifting motives and new ways to profit from cybercrime, attackers now have industrial organizations in their crosshairs. For much of the past two decades, criminals focused on stealing high-value data such as credit card or private medical information. That is changing as attackers discover newfound ROI in industrial targets. Industrial organizations have shown a willingness to pay ransomware actors millions of dollars to avoid costly shutdowns. Nation-states, meanwhile, are increasing their threats to critical infrastructure, as noted in several U.S. government reports in recent years. In 2020, manufacturing moved from the eighth most targeted industry to the second.
The OT cyber security threat landscape is covered well by a number of organizations; the following immediately come to mind:
- SANS ICS focus area for threat reports, blogs, podcasts, conferences, and training
- FireEye’s summary reports
- IBM’s annual X-Force Threat Indexes
Three classes of OT security threats:
Untargeted threats
In 2020, cyber security detection and response firm Mandiant analyzed OT cyberattacks and found that more than 95% began with an intrusion into IT systems that led attackers to pivot into the OT environment. This type of collateral damage is very common in OT incidents. In one widely publicized example, the WannaCry and NotPetya ransomware and wiper attacks of 2017, OT systems were not the initial target but were compromised and disrupted due to poor network segmentation and a lack of patching. The incidents cost companies including Merck, Mondelez, Maersk, and others billions of dollars in lost productivity and recovery expenses. When it comes to protecting OT environments, such untargeted, highly damaging threats must be considered.
Insider threats
According to KPMG’s 2020 OT security survey, nearly 60% of industrial organizations that had suffered at least one cyber security incident in the past 12 months said insiders were the cause. Forty-five percent pointed to “negligent insiders” who made an error that caused or enabled a data breach; another 11% said the compromise was the result of a malicious insider. While nation-states get a lot of attention and publicity, fewer than 13% encountered a nation-state attacker in the past year, according to the report.
The fact remains that insiders have extensive access to industrial facilities, and that access matters more as ease of operation and reliability become increasingly vital metrics. This means mistakes, or more intentional malicious activity, carry significant security risks.
Targeted third-party threats
This broad category includes nation-states, “hacktivists,” and financially motivated actors, to name but a few. Ransomware targeted at OT systems creates significant financial impact and, consequently, urgent pressure to pay unless robust incident response and recovery processes are in place.
Nation-states in particular now increasingly target critical infrastructure like power grids, pipelines, pharmaceutical supply chains, and more. What this means for asset owners:
- Third parties providing services or products used in the OT environment face attack as a way to target industrial organizations indirectly, either intentionally or via collateral damage.
- Products deployed in OT environments often contain source code or components that did not come from the OEM and may be the result of integrating systems of systems. Such products could contain embedded backdoors that sophisticated attackers can leverage to their strategic advantage.
- Enterprising malicious actors may target industrial organizations in response to economic forces or political conflict. Organizations portrayed as unscrupulous or immoral because of the products or services they provide may be targeted simply to further the attackers’ own agendas rather than for monetary gain.
Regardless of the threat vector, today’s attackers enjoy access to copious technical and organizational reconnaissance data on the internet (also known as OSINT or open-source intelligence). As a result, large, strategic organizations are not the only targets; even small-to-medium businesses can be selected, surveilled, and surgically attacked.
OT cyber security frameworks
The multiplicity of security frameworks available to OT security practitioners only adds to the complexity of developing effective programs and robust OT defenses. The roster of popular regulatory and self-managed control standards includes both industry-specific and general OT guidance. Some of the available frameworks are audited by regulatory bodies, while others are strictly voluntary. Some are directional and others are prescriptive. The best known among the applicable OT security frameworks include:
NIST Cybersecurity Framework
NIST publishes very detailed cyber security control recommendations, including some specifically tuned for industrial control systems, along with emerging guidance for IoT environments. As noted above, the CSF is a more general set of guidelines, with roughly 100 sub-controls (108 subcategories in CSF version 1.1) across five primary functions.
The five NIST functions span both technical and procedural controls, providing a foundation for cyber security assessments. In each of its functional areas, the NIST CSF describes sub-controls with detailed guidelines to achieve specific maturity levels.
Maturity in NIST CSF is defined by establishing a set of “profiles.” These profiles are not prescriptive, though NIST does offer some suggested models. Organizations must determine their own maturity targets and profiles, which adds to the framework’s flexibility.
NIST CSF is the most-used standard in ICS security according to SANS. In their 2017 and 2019 ICS security surveys, more respondents were using NIST CSF than any other framework, followed by CIS Top 20, ISO 27000 Series and IEC 62443/ISA 99. The NIST CSF remains an attractive alternative as it provides directional and foundational guidance without prescriptive policies or controls that organizations may find too restrictive for their unique OT environments.
Center for Internet Security (CIS) Top 18 Security Controls
The Center for Internet Security is a non-profit, non-governmental organization that seeks to improve overall individual, corporate, and government cyber security. Approximately ten years ago, a group of organizations, including DHS/CISA, SANS, and several international cyber security bodies came together to establish a common set of controls designed to improve the security maturity of any organization. Originally known as the CSC 20, the framework was pared down in May 2021 to a simplified list of 18 high-level controls. The updated roster of top-level controls comes with 153 sub-controls or “safeguards” that provide more prescriptive guidance than the NIST CSF.
The CIS Controls v8 represents a complete revamp of the framework’s approach. Previous versions of the organization’s popular guidance, in which many organizations are already heavily invested, broke top-level controls down into three groups: Basic, Foundational, and Organizational.
The first six controls of the former CSC 20 were referred to as “Basic.” This group included hardware and software inventories, endpoint vulnerability and configuration management, user and access management, and event logging. The second set of ten controls, the Foundational group, covered a wider range, including network configuration management, network segmentation, incident detection and response, and data controls. Finally, the Organizational grouping included training, penetration testing, and DFIR.
One advantage of the CIS framework is its prescriptive levels of maturity for each sub-control. This avoids much debate as to what the organization’s profiles and target levels should be. The framework’s five levels of maturity for each sub-control are based on the number of devices, accounts, or assets covered by that sub-control. Organizations can establish a specific maturity level requirement then measure with specific, quantifiable metrics how they are doing against that maturity objective. This offers significant advantages to organizations struggling to make progress in aligning stakeholders.
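The coverage-based measurement described above can be sketched as a simple calculation: divide the number of assets a safeguard actually covers by the number of assets in scope, then map that percentage to one of five levels. The thresholds below are hypothetical placeholders chosen for illustration, not values published by CIS; substitute the figures from the version of the controls you adopt.

```python
# Hypothetical lower bounds (percent coverage) for maturity levels 1 through 5.
LEVEL_THRESHOLDS = [0, 25, 50, 75, 95]


def coverage_percent(covered: int, in_scope: int) -> float:
    """Share of in-scope devices, accounts, or assets that a safeguard covers."""
    if in_scope <= 0:
        raise ValueError("in_scope must be positive")
    return 100.0 * covered / in_scope


def maturity_level(covered: int, in_scope: int) -> int:
    """Return the highest level whose threshold the measured coverage meets."""
    pct = coverage_percent(covered, in_scope)
    level = 1
    for i, threshold in enumerate(LEVEL_THRESHOLDS, start=1):
        if pct >= threshold:
            level = i
    return level


# Example: 312 of 400 in-scope OT assets appear in the hardware inventory.
print(coverage_percent(312, 400), maturity_level(312, 400))  # 78.0 4
```

Because the input is a count rather than a judgment call, two stakeholders measuring the same environment should arrive at the same level, which is the alignment benefit the framework is aiming for.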
CIS designed its framework with IT in mind, and many CIS controls do not translate directly to sensitive and embedded devices in OT. CIS has since developed an “OT” version that attempts to address these limitations, but even the standard CIS guidance can be adopted on OT systems with appropriate compensating controls and technical feasibility exceptions. Many organizations have achieved robust maturity across IT and OT by leveraging the CIS framework with appropriate adaptation to OT. Read the CIS Top 18 Case Study to learn how a U.S.-based energy company improved its cyber security readiness and maturity.
NIST 800-53 and other sub-standards
The NIST CSF is essentially a high-level summary of a much more detailed set of controls defined in NIST’s Special Publication (SP) 800 series. Clocking in at nearly 700 pages, this guidance lays out a comprehensive set of controls covering virtually all computing systems. SP 800-53 is the core control catalog, and its ICS overlay, published in SP 800-82, tailors those controls to industrial control systems, a sizable subset of OT security. As such, 800-53 is a helpful, if rather hefty, guide to NIST’s suggestions on OT security.
Most organizations will use 800-53 as an enhancement to their CSF-based program rather than trying to achieve comprehensive control for all elements of the 800-53 standard. There are more than 200 pages in NIST 800-53 covering everything from traditional IT-like controls to details specific to ICS security.
The ISO 27000 series
ISO 27000 was produced by the respected International Organization for Standardization (ISO), which publishes a wide range of standards and processes for ensuring quality, security, and productivity. ISO 27000 was developed in coordination with the International Electrotechnical Commission (IEC) and focuses on Information Security Management Systems (ISMS). As with other ISO standards, organizations can be certified by accredited auditors, though certification is not a requirement for using it. The standards are considered best practices that organizations can adopt to improve their overall maturity without going through a certification process.
ISO 27000 is a general IT security standard; it is not specific to OT or ICS systems. As with NIST CSF and CIS, however, many components of the ISO 27000 standard are highly relevant to OT environments.
The ISO 27000 series is procedural in nature and is often used in tandem with NIST CSF or IEC 62443, which are more technical. ISO 27001 defines the requirements for an ISMS, and it is the standard against which organizations are certified. ISO 27002 is the list of recommended best practices organizations can choose to pursue to achieve maturity, but there is no certification for ISO 27002.
IEC 62443 and ISA 99 standard
IEC 62443/ISA 99 is an OT-specific standard. Jointly developed by the International Electrotechnical Commission (IEC) and the International Society of Automation (ISA), the framework defines four security levels intended to protect against attacks of increasing sophistication. Organizations can determine which security level is most appropriate based on their own, unique compliance or supply-chain requirements.
Each IEC 62443/ISA 99 security level establishes a set of requirements to achieve that level. For example, there are 37 requirements to achieve SL1 and an additional 23 to reach SL2. These components look very much like those included in the CIS or NIST frameworks – which makes sense considering these standards aren’t trying to reinvent security, only focus efforts.
Because IEC 62443/ISA 99 is specifically tailored to OT environments, the controls do provide more context for items relevant to operational technology. For instance, the term of art known as “zones and conduits” is a hallmark of this standard. “Zones” can be thought of as the different parts of a network where devices may talk to one another, a characteristic particularly relevant in segmented OT environments. The “conduits” then are the paths of communication either within or between zones. IEC 62443/ISA 99 includes recommended architectures to ensure secure communications within zones and across conduits.
The holistic IEC 62443/ISA 99 doesn’t compete with the NIST CSF, but can be used as an additional framework to help guide OT-specific implementations in a NIST-based program.
>> See our guide: Protecting OT Systems with IEC 62443
Other OT framework contenders
A more comprehensive list of security standards might include the UK NIS Regulations, CFATS, RIIO-2, NERC CIP, and many others around the world. Over time, some of these standards may become more widespread as organizations find them more helpful than the alternatives.
These cyber security standards and frameworks risk becoming an “alphabet soup” of acronyms, numbers, rules, and requirements. The most important detail to remember, however, is that the foundational elements are similar across all of them, and nearly all can be mapped to one another. The basic functions of Identify, Protect, Detect, Respond, and Recover laid out in the NIST CSF appear again and again as core elements across all the frameworks. For organizations that face regulatory requirements, the choice of framework is more straightforward. But even for them, security is usually more than compliance, so the choice of which framework to overlay on top of the regulatory requirements remains important.
Components of OT security through the NIST CSF lens
Summarizing the components of OT cyber security from these standards is a challenge given the breadth of requirements, and no summary will satisfy everyone as it is sure to leave something out. Leveraging the functions of the NIST CSF as a structure for OT cyber security requirements, we offer here a simplified version of the DHS’s Defense-in-Depth model. Each section includes elements that apply to the broader categories of policy and procedure, networks, access controls, endpoints and information.
Identify
Fundamental to the OT cyber security journey is a complete view of the organization’s assets, accounts, software, and connections. This is necessary to create a robust risk assessment that can be used to develop a risk remediation roadmap and prioritize potential impact from a cyber event.
Asset inventory forms this foundation, and here is where challenges in OT security often begin. Legacy systems running proprietary embedded operating systems or older Windows operating systems are difficult to inventory. Traditional IT approaches such as active network scans can disrupt or even damage legacy devices. Even when they don’t cause havoc, scans and traffic analysis frequently fail to identify the asset itself or capture basic information about it. An effective asset inventory needs to include:
- Make, model, and operating system or firmware version of the asset
- All application software on the asset
- All users and accounts and their administrative rights
- Critical configuration information such as communication, password, administration, ports and services, and other settings
- Presence of security functionality such as antivirus or anti-malware, application whitelisting, or backup status
- Connectivity of the asset such as dual network interface cards (NICs)
- Patch status and vulnerabilities
- Asset criticality to the ultimate process
- Presence of sensitive information
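To illustrate how the checklist above might be captured as structured data rather than free-form spreadsheet rows, here is a minimal sketch. The `AssetRecord` type, its field names, and the `inventory_gaps` rules are all hypothetical, chosen only to mirror the list; a real inventory schema would be richer and site-specific.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """One inventory entry; fields mirror the checklist above (names are illustrative)."""
    name: str
    make: str
    model: str
    os_or_firmware: str
    software: list[str] = field(default_factory=list)        # installed applications
    admin_accounts: list[str] = field(default_factory=list)  # users with administrative rights
    open_ports: list[int] = field(default_factory=list)      # ports and services
    antivirus: bool = False
    whitelisting: bool = False
    backup_current: bool = False
    nic_count: int = 1                                       # more than one NIC may mean dual-homing
    missing_patches: list[str] = field(default_factory=list)
    criticality: int = 1                                     # 1 (low) to 5 (process-critical)
    sensitive_data: bool = False


def inventory_gaps(asset: AssetRecord) -> list[str]:
    """Return a few simple, illustrative findings for one asset."""
    gaps = []
    if asset.nic_count > 1:
        gaps.append("dual-homed")
    if not (asset.antivirus or asset.whitelisting):
        gaps.append("no endpoint protection")
    if not asset.backup_current:
        gaps.append("backup not current")
    if asset.missing_patches:
        gaps.append(f"missing patches: {len(asset.missing_patches)}")
    return gaps
```

Structuring the data this way lets the same records feed the risk assessment, patching, and access reviews discussed later.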
There are both technical and procedural methods available to gather such an inventory. In many organizations, OT staff record this information in spreadsheets. Others combine manual and technical means. Still others capture traffic through networking equipment and use packet inspection to infer whatever asset information they can, or use technology that communicates directly with assets in their native protocols to gather a deeper and more accurate inventory.
The “Identify” component also includes the ability to produce a comprehensive risk picture for the environment. This includes vulnerability analysis of endpoints using data such as CVEs from the National Vulnerability Database or advisories from ICS-CERT, the U.S. government function (now part of CISA) that issues alerts on industrial control system vulnerabilities and threats.
>> Read more about OT Vulnerability Management and overcoming common challenges.
Risks in OT don’t stop at identified software vulnerabilities. OT systems are often “insecure by design,” meaning networks and systems were never designed with security in mind; they are insecure even if the associated software has no documented vulnerabilities. For instance, in an environment where the goal is operational efficiency, every operator may be able to make administrative changes to programmable logic controllers so that issues can be addressed quickly and the process continually improved. Similarly, remote access is often widely available so that OEMs and other support providers can make changes, with few or no security safeguards. And the prevalence of open ports and services gives attackers easy access without any need to exploit a vulnerability.
Finally, the Identify component should include the prioritization of risks and a remediation roadmap. Identify should not stop at an assessment listing every area where security is lacking. In almost every case, the first vulnerability assessment in OT will find a wide range of risks, many of them critical. The key to a successful Identify component is prioritizing risks based on likelihood and potential impact. Equally important is developing a roadmap of initiatives to close the most significant gaps over time.
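A minimal sketch of likelihood-and-impact prioritization follows. It assumes a 1-to-5 scale for each factor and a simple multiplicative risk matrix; both are assumptions, and many organizations use weighted or qualitative matrices instead. The `Risk` type and `roadmap` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    likelihood: int  # 1 (rare) to 5 (almost certain), assessed by the team
    impact: int      # 1 (negligible) to 5 (safety incident or major outage)

    @property
    def score(self) -> int:
        # Simple likelihood x impact matrix.
        return self.likelihood * self.impact


def roadmap(risks: list[Risk], per_phase: int = 2) -> list[list[str]]:
    """Rank risks by score (ties broken by impact) and split them into remediation phases."""
    ranked = sorted(risks, key=lambda r: (r.score, r.impact), reverse=True)
    return [[r.description for r in ranked[i:i + per_phase]]
            for i in range(0, len(ranked), per_phase)]
```

Breaking ties on impact reflects the OT bias toward consequences such as safety and downtime. The phase size would in practice follow available budget and outage windows.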
Protect
A robust Protect program includes elements that encompass policies and procedures, network protections, systems and data access controls, and endpoint and application protection. At each level, organizations can set up defensive layers that make it more difficult for an attacker to breach systems, cause damage, or steal information.
Policies and procedures
Configuration and change management
These policies describe the minimum secure configuration standards for different types of systems, along with how changes to those configurations are requested, reviewed, and approved.
Vulnerability and patch management
These policies define when and how vulnerability assessments are to be completed, the standards for remediation of vulnerabilities in terms of timing, criticality, and compensating controls, as well as the review process for deploying patches to OT systems.
Access control policies
These policies define who can gain access to systems and information, and how. They include “least privilege” policies, which ensure that only those who absolutely require privileges have them. They also include procedures to ensure that least-privilege policies are in place and monitored, and that any remediation happens in a timely manner. They further cover the level of access different assets have to different zones within a network, a safeguard fundamental to OT security.
Information protection policies define how information is stored and transmitted over the network. This can include sensitive information such as the programming of control devices or performance and sensor data on the process itself. In some environments, it may also include intellectual property that needs to be safeguarded from potential threats.
IT security stakeholders must also adjust these policies and procedures. The devices, systems, information, and users in OT require policies adopted by IT to evolve for OT. In some areas they must become stricter; in others they must allow more flexibility.
In IT, it may be normal to include software such as TeamViewer, Webex, or other telecommunication software on a standard configuration. In OT, however, where assets may be less protected and processes more sensitive, this software should not be installed by default. On the other hand, patching of OT systems requires testing and may only be possible when systems are offline. This likely means patching cannot occur at the same pace as in IT. As a result, other compensating controls are required to ensure the protection of these systems until they can be patched.
Common in many OT environments, network protections provide an initial layer of defense for sensitive systems that control critical processes. Network protections exist at the perimeter of the corporate network connecting to the internet or cloud. They may also be found at the OT network perimeter where it connects to the corporate network. Finally, they occur within the OT network itself, providing segmentation and separation to various systems in the process.
There is no magic formula for the perfect network protection design. Approaches differ based on factors such as access risks, the age of endpoints, organizational defensive capabilities, and required connectivity to corporate systems. This guide outlines the types of protection an organization might pursue as well as some common industry approaches, such as the Purdue Model.
Types of network protection include:
Perhaps the best-known network protections in cyber security are hardware-based, and data diodes are a prominent example in OT. These devices enforce one-way communication out of OT environments. Used heavily in the power generation and oil and gas industries, data diodes allow traffic to flow securely from the control system to the enterprise IT system with no reciprocal connectivity.
This ensures that attackers cannot access these systems through inbound connections while still allowing for monitoring of internal systems by corporate analytics tools. When incorporating data diodes, network design is critical. Improper configurations can make the devices less effective or allow traffic to circumvent the diode structure altogether.
Data diodes are a relatively restrictive form of network protection and can have significant operational drawbacks if implemented as originally designed. Many of the advances envisioned by Industry 4.0 and IIoT are difficult to achieve with conventional diodes in place. In response, many diode manufacturers now offer so-called reversible and “two-way” diodes that permit inbound access based on customizable criteria such as time-and-date schedules or traffic types.
Firewalls, well known in IT security, serve a similar purpose in OT environments. OT firewalls, however, are built to inspect OT traffic and are configured to allow or reject communications into the OT network or subnet. While a number of traditional IT firewall vendors include OT capabilities in their products, OT-specific firewalls continue to provide better handling of OT protocols. These are typically deployed deeper inside the OT layers, at the level of a PLC or below.
One of the biggest challenges in OT security is ensuring proper firewall configuration. Even well-designed firewalls often fail to provide adequate protection because of poorly written rules and changes that are improperly managed over time. Maintaining proper network segmentation requires a disciplined rule-management program, including monitoring for rule changes and insecure rules.
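The rule-review idea above can be sketched as a small audit over an exported rule set. This is an illustration only. The `Rule` type and the two checks are hypothetical, and the sketch assumes first-match rule evaluation, which many (not all) firewalls use. It flags overly broad "allow" rules, rules made unreachable by an identical earlier match, and rules that differ from an approved baseline.

```python
from dataclasses import dataclass

ANY = "any"


@dataclass(frozen=True)
class Rule:
    src: str
    dst: str
    port: str     # e.g. "502" for Modbus/TCP, or "any"
    action: str   # "allow" or "deny"


def audit_rules(rules: list[Rule]) -> list[str]:
    """Flag overly broad allows and rules shadowed by an identical earlier match.

    Only exact duplicates are detected; full shadow analysis would also
    consider broader earlier rules.
    """
    findings = []
    seen = set()
    for i, r in enumerate(rules):
        if r.action == "allow" and ANY in (r.src, r.dst, r.port):
            findings.append(f"rule {i}: overly broad allow {r.src} -> {r.dst}:{r.port}")
        match = (r.src, r.dst, r.port)
        if match in seen:
            findings.append(f"rule {i}: unreachable, shadowed by an earlier rule")
        seen.add(match)
    return findings


def unapproved_changes(current: list[Rule], approved: list[Rule]) -> list[Rule]:
    """Rules present on the device but absent from the approved rule set."""
    approved_set = set(approved)
    return [r for r in current if r not in approved_set]
```

Running both checks on a schedule, rather than only at design time, is what catches the gradual rule drift the text warns about.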
OT systems are designed with the assumption that remote access will be available to facilitate troubleshooting and programming support. Some industries such as power generation now greatly restrict remote access in their OT environments. For most other industrial organizations, however, remote access remains vital to ongoing operations. This is true even as some argue that their OT systems are “air-gapped,” a reference to the outdated concept that critical OT systems have no external connections or access to the internet. In reality, very few OT environments were ever truly air-gapped. Now, with the rise of IIoT, air gaps are a thing of the past for nearly all modern systems.
That makes secure remote access a critical element of OT security. Connections are necessary for OEMs to manage and troubleshoot their systems. Technicians still need a way to access control systems. Corporate IIoT and analytics teams need persistent data access to leverage insights and facilitate process changes.
Secure remote access involves several important practices: securing communication paths, ensuring only authorized users can connect, monitoring and recording behavior during each session, and enforcing corporate policies on connected devices.
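As a rough sketch of the "only authorized users" and "recording" practices, the hypothetical `authorize` function below admits a session only when a grant covers the user, the target asset, the time of day, and the MFA state. It logs every decision, whether allowed or denied. Encrypted transport and full session recording are out of scope here, and `AccessGrant` is an illustrative structure, not any product’s API.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class AccessGrant:
    user: str
    assets: frozenset[str]
    window_start: time   # same-day window; overnight windows are not handled here
    window_end: time
    requires_mfa: bool = True


def authorize(grants: list[AccessGrant], user: str, asset: str, when: datetime,
              mfa_passed: bool, audit_log: list) -> bool:
    """Allow a session only if some grant covers this user, asset, time, and MFA state."""
    allowed = any(
        g.user == user
        and asset in g.assets
        and (mfa_passed or not g.requires_mfa)
        and g.window_start <= when.time() <= g.window_end
        for g in grants
    )
    # Every decision is recorded so sessions can be reviewed later.
    audit_log.append((when.isoformat(), user, asset, "allowed" if allowed else "denied"))
    return allowed
```

Keeping one grant list for every vendor and internal group reflects the single-system recommendation that follows.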
A range of vendors offer such solutions – some IT-centric, others more specific to the OT environment. Operators should have a single system covering all vendors and personnel that require access. Trying to support unique remote access solutions for multiple groups quickly becomes unmanageable.
Zones and conduits
IEC 62443/ISA99 uses the terms “zones” and “conduits” to describe communication schemes designed to help secure OT networks. Zones can be thought of as the area of the network that encompasses a group of machines. Proper zoning limits the ability to communicate from one zone to another without appropriate authorization. The intent behind zones is to keep an attacker with access to one group of devices in a network from pivoting to another. Zones attempt to restrict such movement unless the user, application, or device has the authorization to connect to the other zone.
“Conduits” are the paths along which communication occurs, either within a zone or across zones. Conduits allow certain traffic to move across zones. Proper conduit design means that only specific, approved paths are open for one device to communicate with another. For instance, an HMI in one zone may be allowed to interact with a PLC in another zone only if it follows a certain path, or conduit, in doing so: for example via a particular firewall, only with “read” commands, and only for a certain type of traffic.
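The HMI-to-PLC example above can be modeled as data. In this minimal sketch, each device maps to a zone, and cross-zone traffic is permitted only if it matches a defined conduit exactly: the same enforcement point, protocol, and operation. The zone names, device names, `Conduit` type, and `permitted` function are hypothetical. Intra-zone traffic is simply allowed here as a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conduit:
    src_zone: str
    dst_zone: str
    via: str                    # enforcement point the traffic must traverse
    protocols: frozenset[str]
    operations: frozenset[str]  # e.g. read-only


# Hypothetical site layout: which zone each device belongs to.
ZONE_OF = {"HMI-01": "supervisory", "PLC-07": "cell-a"}

CONDUITS = [
    Conduit("supervisory", "cell-a", "fw-cell-a", frozenset({"modbus"}), frozenset({"read"})),
]


def permitted(src: str, dst: str, protocol: str, operation: str, path: str) -> bool:
    """Allow cross-zone traffic only when it matches a defined conduit exactly."""
    src_zone, dst_zone = ZONE_OF[src], ZONE_OF[dst]
    if src_zone == dst_zone:
        return True  # simplification: intra-zone policy is not modeled here
    return any(
        c.src_zone == src_zone and c.dst_zone == dst_zone and c.via == path
        and protocol in c.protocols and operation in c.operations
        for c in CONDUITS
    )
```

Because conduits are directional in this model, the PLC cannot initiate traffic back to the HMI unless a separate conduit is defined.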
“Least privilege” is a fundamental concept common to all cyber security standards. It describes minimizing the amount of access provided to each user, device, application, or service account to the least amount possible to perform its prescribed function. In OT, access control is uniquely challenging for several reasons:
First, processes require high reliability and uptime as well as very strict safety systems. As a result, operators want to ensure that the closest person to the process can quickly shut down the system for safety reasons or reset parameters to improve the process. Access controls include requirements for signing on to individual accounts with separate passwords, ensuring time-based lockouts, and limiting access to admin accounts. However, in a rapidly moving process, companies often either don’t use passwords at all or provide shared passwords and accounts.
Next, many OT devices lack access control altogether. The assumption in many industrial and operating processes is that the operator who is physically at the workstation has authority to make changes and the systems are designed to support that.
Additionally, OEMs and service providers need access to OT devices and systems to provide maintenance or troubleshooting. In many cases, efficiency motivates organizations to create new accounts for these users, adding to the number of accounts that must be tracked and eventually removed.
Finally, many of these systems are not connected to the central active directory as most IT systems are. As a result, monitoring access and maintaining limited access is often a manual process.
However, true OT security maturity requires managing access in several ways:
User and account management
At its foundation is the asset inventory highlighted above. You can’t limit access to assets you don’t know about. Once users and accounts are identified, mature OT cyber security requires the ability to quickly remediate and maintain limited accounts – cleaning up dormant user accounts, limiting admin rights, and establishing robust password requirements, for example.
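The cleanup tasks named above can be sketched as a periodic review over account data pulled from the inventory. The `Account` fields, the 90-day dormancy threshold, and the approved-admin list are assumptions for illustration. Many OT devices expose far less account detail than this.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Account:
    name: str
    last_login: Optional[date]   # None if the account has never been used
    is_admin: bool
    shared: bool
    default_password: bool


def review_accounts(accounts: list[Account], today: date,
                    approved_admins=frozenset(), dormant_days: int = 90) -> dict[str, list[str]]:
    """Return per-account findings: dormancy, unapproved admin rights, sharing, defaults."""
    findings = {}
    for a in accounts:
        issues = []
        if a.last_login is None or today - a.last_login > timedelta(days=dormant_days):
            issues.append("dormant")
        if a.is_admin and a.name not in approved_admins:
            issues.append("unapproved admin")
        if a.shared:
            issues.append("shared account")
        if a.default_password:
            issues.append("default password in use")
        if issues:
            findings[a.name] = issues
    return findings
```
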
Network access control builds on the network protections described earlier. In the context of network protection, different users, devices, and applications will have different rights. In OT, it’s important to determine which devices and accounts need to communicate with one another, because blocking required communication could damage the process. Monitoring required communications, reducing unnecessary flows, and locking down networking equipment are hallmarks of maturity.
Design and programming of devices
One major difference in OT is the variety of embedded device types, many of which do not have access controls enabled by default. Security in OT requires selecting systems that allow sufficient control of access. For legacy devices, it’s vital to review all access control capabilities and use what’s available rather than assuming physical access equals authorization. For devices that don’t offer such capabilities (and this list will be long), defenders must develop compensating controls, either through physical network protections or by limiting the functionality of the device.
Endpoint and application protection
Among the least understood and most critical elements of OT security is the protection of the endpoints themselves. Organizations often assume robust endpoint protection is impossible given the unique characteristics of OT systems: proprietary OEM applications running on commodity operating systems, and proprietary operating systems on embedded devices. Yet with the greater connectivity required by the digital infrastructure investments of IIoT and Industry 4.0, it’s imperative that organizations actively protect endpoints, rather than simply detect anomalous network traffic.
Endpoint protection has its foundation in the asset inventory requirements mentioned above; it’s then important to act on that information to provide protective elements. Some of the standard systems management tasks in IT that OT defenders need to heed and adopt include:
Patch and vulnerability management
It is often said that you cannot patch OT systems. This is not true. You cannot patch all systems for all known vulnerabilities immediately, but you can make significant headway in remediating software bugs with a programmatic approach.
>> See our end-to-end patch management whitepaper for more on this topic.
Patch management begins with a detailed understanding of the patches available to deploy: commodity OS, application, and firmware. This is followed by an analysis of each patch’s relevance to each system. A patch may be unnecessary for a given device if the configuration setting it addresses is not enabled on that device, or if other patches already deployed cover the same issue. This is particularly important for firmware vulnerabilities, which may be mitigated through configuration changes that do not require an upgrade of the device or of a significant subset of the control system.
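The relevance analysis above can be sketched as a filter over the inventory. This is a minimal illustration under stated assumptions. Each hypothetical `Patch` record notes which product it applies to, an optional configuration setting that must be enabled for the vulnerability to matter, and any later patches that supersede it. Real advisories rarely arrive this tidy, so populating such records is itself part of the work.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Patch:
    patch_id: str
    product: str
    vulnerable_setting: Optional[str] = None      # only relevant if this setting is enabled
    superseded_by: set[str] = field(default_factory=set)


@dataclass
class Device:
    name: str
    products: set[str]
    enabled_settings: set[str]
    installed: set[str]


def applicable_patches(device: Device, patches: list[Patch]) -> list[str]:
    """Return the IDs of patches this device actually needs."""
    needed = []
    for p in patches:
        if p.product not in device.products or p.patch_id in device.installed:
            continue  # product not present on this device, or patch already applied
        if p.vulnerable_setting and p.vulnerable_setting not in device.enabled_settings:
            continue  # vulnerable feature is disabled; configuration already mitigates it
        if p.superseded_by & device.installed:
            continue  # a later patch that includes this fix is already deployed
        needed.append(p.patch_id)
    return needed
```
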
Next comes a review of the patch for its impact on the operating environment. In many industrial use cases, these reviews are provided by the OEMs of the hardware or process-control software. For others, an independent review of the patch and its potential impact is necessary. To make real progress, organizations need automated patch deployment, because deploying patches manually demands more OT staff time than is usually available. Tools that make OT patch deployment more efficient are therefore critical.
Closely tied to patch and vulnerability management is secure configuration management: designing and maintaining secure settings on OT equipment. As discussed earlier, OT systems are “insecure by design.” Organizations can significantly reduce risks and improve the protection of endpoints by rigorously reviewing configuration settings against a standard security baseline.
Logout and login settings on devices, closing unnecessary ports and services, removing unnecessary software, and ensuring limited connectivity are just some of the configuration practices that should be followed where feasible. OEMs often share configuration best practices on their websites or in security releases.
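A baseline review like the one described can be sketched as a comparison of collected settings against a per-asset-class hardening baseline. The baseline values and configuration keys below are hypothetical. Real values should come from OEM guidance and site policy, and software removal is handled separately.

```python
# Hypothetical baseline values; real ones should come from OEM guidance and site policy.
BASELINE = {
    "max_idle_logout_minutes": 15,
    "allowed_ports": {502},   # e.g. Modbus/TCP only on this asset class
    "max_nics": 1,
}


def check_hardening(config: dict, baseline: dict = BASELINE) -> list[str]:
    """Compare one asset's collected settings with the hardening baseline."""
    findings = []
    idle = config.get("idle_logout_minutes")
    if idle is None or idle > baseline["max_idle_logout_minutes"]:
        findings.append("idle logout missing or too long")
    extra = set(config.get("open_ports", [])) - baseline["allowed_ports"]
    if extra:
        findings.append(f"unnecessary open ports: {sorted(extra)}")
    if config.get("nic_count", 1) > baseline["max_nics"]:
        findings.append("more network connections than the baseline allows")
    return findings
```
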
In addition to the design and initial install, robust security also requires managing configuration changes. In some industries, such as those regulated by NERC CIP, change management is a foundational notion. In many others, however, a range of factory automation or instrumentation and control technicians make frequent changes to improve the process, to allow access for remote support, and for a host of other reasons. Capturing changes, determining whether they’re appropriate, and then responding by re-establishing a proper baseline are core components of a secure endpoint environment in OT.
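The capture, review, and re-baseline cycle can be sketched with configuration snapshots represented as flat dictionaries, which is an assumption made for simplicity. A hash gives a cheap "did anything change?" check. A key-by-key diff then shows what changed. Changes on an approved list (for example, from a change ticket) are folded into the new baseline, and the rest are returned for follow-up. All function names here are hypothetical.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Stable hash of a configuration snapshot (keys sorted so order does not matter)."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


def diff_config(baseline: dict, current: dict) -> dict:
    """Map each changed key to (old, new); a missing key is reported as None."""
    return {k: (baseline.get(k), current.get(k))
            for k in baseline.keys() | current.keys()
            if baseline.get(k) != current.get(k)}


def reconcile(baseline: dict, current: dict, approved_keys: set) -> tuple[dict, dict]:
    """Fold approved changes into a new baseline; return unapproved ones for follow-up."""
    if fingerprint(baseline) == fingerprint(current):
        return dict(baseline), {}
    new_baseline, unapproved = dict(baseline), {}
    for key, (old, new) in diff_config(baseline, current).items():
        if key in approved_keys:
            if new is None:
                new_baseline.pop(key, None)
            else:
                new_baseline[key] = new
        else:
            unapproved[key] = (old, new)
    return new_baseline, unapproved
```

Unapproved changes are left out of the new baseline, so they keep appearing in later reviews until someone approves them or rolls them back.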
Removing unnecessary software is a separate task from vulnerability or patch management, focused exclusively on reducing the software installed on OT systems. It’s common to find many unnecessary, and potentially risky, applications on OT systems, which contradicts the notion that OT devices run only “OEM-approved” software. It’s not unusual to find remote access tools like LogMeIn, suites of Adobe software, even DVD burners and Apple iTunes when auditing the contents of OT devices. One fundamental endpoint protection lever is to remove all unnecessary software to reduce the risk of malicious actors leveraging those applications.
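An audit like the one described can be sketched as a comparison of each asset’s installed software against its approved list. The version-stripping rule in `normalize_name` is an assumption: it lets "Runtime 8.2" match an approved "Runtime" entry. Whether version pinning matters is a site decision. The asset and product names are hypothetical.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase and drop a trailing version number, e.g. "Tool 2.3.1" -> "tool"."""
    return re.sub(r"\s+v?\d+(\.\d+)*$", "", name.strip()).lower()


def removal_candidates(installed_by_asset: dict[str, list[str]],
                       approved_by_asset: dict[str, list[str]]) -> dict[str, list[str]]:
    """List installed software that is not on each asset's approved list."""
    report = {}
    for asset, installed in installed_by_asset.items():
        approved = {normalize_name(n) for n in approved_by_asset.get(asset, ())}
        extras = sorted(n for n in installed if normalize_name(n) not in approved)
        if extras:
            report[asset] = extras
    return report
```
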
Many organizations struggle with anti-malware in OT given the challenges of updating signatures regularly, the lack of the direct cloud connectivity many next-gen AV solutions require, the inability to install AV software on embedded devices, the dearth of vendor solutions that can be applied broadly across the fleet, and the persistent challenge of false-positive detections that can bring critical processes to a halt. But malware is a constant threat in OT, and many strains target the same basic components found in IT, so up-to-date AV tools can be very effective in stopping them. In addition, a growing number of OT-specific malware strains require more specific OT signatures or detections.
There are several options for anti-malware defense in OT, including:
Application whitelisting
This technology, which is slowly going away in the IT environment given the explosion of applications and the never-ending challenge of keeping the whitelist updated, remains a very effective solution for more stable OT environments. Application whitelisting inverts the antivirus concept. Where AV allows any application to run unless it is deemed malicious, whitelisting blocks all applications unless they are explicitly allowed. Many OT processes rarely add applications, and, in fact, the process runs more effectively when unapproved applications are blocked.
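The inversion described above fits in a few lines. In this minimal sketch, executables are identified by SHA-256 hash, which is one common approach. Real products also use signed publishers, paths, and kernel-level enforcement, none of which is modeled. The hypothetical `Allowlist` class denies by default, while the contrasting `antivirus_may_execute` allows by default.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Allowlist:
    """Default deny: only binaries whose hash was approved at baseline time may run."""

    def __init__(self, approved_binaries: dict[str, bytes]):
        # Hash each approved executable once, when the baseline is built.
        self.approved = {sha256_of(b) for b in approved_binaries.values()}

    def may_execute(self, binary: bytes) -> bool:
        return sha256_of(binary) in self.approved


def antivirus_may_execute(binary: bytes, known_bad: set[str]) -> bool:
    """Default allow, for contrast: run unless the hash matches a known-bad signature."""
    return sha256_of(binary) not in known_bad
```

An unknown tool is blocked by the allowlist but allowed by the signature check, which is exactly why whitelisting suits stable OT hosts.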
Integrated vendor-agnostic antivirus (AV)
The various OEMs approve and license multiple AV brands. Emerson uses McAfee while Rockwell uses Symantec, for example. It’s not uncommon to find OT environments using a half dozen different AV brands or more across their entire fleet. The management of such a complex AV ecosystem quickly turns into a logistical challenge. One option is to integrate AV solutions into a single pane of glass to gather status, alerts, and detections into a common interface for resolution. This can significantly streamline the management of the various AV systems.
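The "single pane of glass" idea can be sketched as normalization: each AV product’s status export is mapped into one common schema, and the fleet is then summarized. The per-vendor field names in `FIELD_MAP` are invented placeholders, not the real export formats of any product. The 7-day signature-age threshold is also an assumption.

```python
from datetime import date

# Hypothetical export field names per AV product; real consoles use their own schemas.
FIELD_MAP = {
    "vendor_a": {"host": "Host", "sig_date": "SigDate", "detections": "ThreatCount"},
    "vendor_b": {"host": "hostname", "sig_date": "defs_date", "detections": "alerts"},
}


def normalize(vendor: str, record: dict) -> dict:
    """Translate one vendor-specific status record into a common schema."""
    m = FIELD_MAP[vendor]
    return {
        "vendor": vendor,
        "host": record[m["host"]],
        "sig_date": date.fromisoformat(record[m["sig_date"]]),
        "detections": int(record[m["detections"]]),
    }


def fleet_view(exports: dict, today: date, max_sig_age_days: int = 7) -> dict:
    """Combine all vendors' records into one summary: stale signatures and detections."""
    rows = [normalize(vendor, r) for vendor, records in exports.items() for r in records]
    return {
        "hosts": len(rows),
        "stale_signatures": sorted(r["host"] for r in rows
                                   if (today - r["sig_date"]).days > max_sig_age_days),
        "with_detections": sorted(r["host"] for r in rows if r["detections"] > 0),
    }
```
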
The new standard in IT-oriented anti-malware is next-gen AV, shorthand for cloud-enabled, behavior-based antivirus. Next-gen AV tools need no signature updates on the agent deployed to each device. Instead, they take all of the processes occurring on the system, compare them to known risks, and look for anomalous patterns of behavior in their cloud infrastructure, stopping malware even before a signature is created. These increasingly popular approaches can be effective in certain OT environments where workstations and servers have access to the cloud. However, they do not work for embedded devices, on which the software’s agents cannot be deployed.
Managing removable media and rogue devices
One of the myths of OT is that network protection and detection are sufficient to protect OT endpoints inside the perimeter. One significant gap in the network protection armor is revealed in the presence of removable media and other access points, especially in today’s IIoT world. Even in the most “air-gapped” network, there inevitably comes a time to introduce or update applications, or move data in and out of the system. Removable media, in the form of USB sticks, portable drives, and other transient cyber assets such as laptops, are the form of choice in most cases. Unfortunately, these devices can contain malware if not properly scanned and treated before introduction into the protected OT environment.
Defenders have several options for protecting OT systems from such threats. Whitelisting can restrict removable media to only those devices specifically approved to open on a given OT device. Asset owners can also use network access controls to keep new devices, such as transient cyber assets, from connecting to the network and potentially introducing malware.
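A removable-media policy of the kind described can be sketched as a two-step decision. The device must be on the allowlist, identified here by USB vendor ID, product ID, and serial number, and it must have been scanned before introduction. Identifiers like these can be spoofed, so this is an illustration of the policy logic only; real enforcement lives in an endpoint agent or OS device-control policy. The `UsbDevice` type, the IDs, and the idea of a scanned-serials set are hypothetical.

```python
from typing import NamedTuple


class UsbDevice(NamedTuple):
    vendor_id: str
    product_id: str
    serial: str


def mount_decision(dev: UsbDevice, approved: set[UsbDevice], scanned_serials: set[str]) -> str:
    """Allow only registered media that has passed scanning before introduction."""
    if dev not in approved:
        return "block: not on the removable-media allowlist"
    if dev.serial not in scanned_serials:
        return "block: not scanned before introduction"
    return "allow"
```
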
Another increasingly prevalent threat arises when users add wireless access points to the network without approval or review. Ensuring these rogue devices cannot connect without permission, or, at a minimum, alerting on such connections so defenders can remediate risks in an hour or less, is necessary to effectively safeguard OT systems.
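Rogue-device alerting can be sketched by comparing the MAC addresses seen on the network (for example, from switch MAC tables) with those in the asset inventory. A second check then flags alerts that have been open longer than the one-hour target mentioned above. The function names and the `(mac, first_seen, switch_port)` observation format are assumptions.

```python
from datetime import datetime, timedelta


def rogue_devices(observed, inventory_macs: set[str]) -> list[dict]:
    """observed: (mac, first_seen, switch_port) tuples, e.g. from switch MAC tables.
    inventory_macs: lowercase MAC addresses of approved assets."""
    alerts = [
        {"mac": mac.lower(), "port": port, "first_seen": first_seen}
        for mac, first_seen, port in observed
        if mac.lower() not in inventory_macs
    ]
    return sorted(alerts, key=lambda a: a["first_seen"])


def overdue(alerts: list[dict], now: datetime, sla: timedelta = timedelta(hours=1)) -> list[dict]:
    """Alerts older than the remediation target; assumes every alert passed in is unresolved."""
    return [a for a in alerts if now - a["first_seen"] > sla]
```
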
Detect
The NIST Detect function covers an organization’s ability to rapidly recognize malicious activity. Suspicious activity can range from curious, anomalous events to hard evidence of known bad behavior. Like protection, detection includes both network- and endpoint-based requirements.
Network detection, often referred to as Network Intrusion Detection Systems (NIDS), monitors network connections, traffic, packets, and other information to identify malicious patterns. Endpoint detection, often referred to as Host Intrusion Detection Systems (HIDS), provides similar analysis on the behavior of devices and the processes occurring on those devices in an OT network.
OT security over the past five years has been defined by network detection methods. What has come to be known as “passive anomaly detection” is now synonymous with OT security, because endpoint protection and detection are widely seen as risky to operational processes.
OEMs fueled this trend, discouraging customers from installing endpoint management and security tools in their environments claiming that such activity could disrupt processes.
Network detection monitors traffic that flows through networking devices such as routers, switches, and firewalls to determine baseline behavior and detect anomalous patterns that, based on prior research, might represent malicious or risky activity. Network intrusion detection then sends an alert to a security information and event management (SIEM) platform where it can be combined with other threat and security data for further analysis. Network detection lets operators spot potentially risky communications that can indicate a threat actor attempting to infiltrate the OT network.
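The baseline-then-alert pattern described above can be sketched in its simplest form. During a learning period, the hypothetical `FlowBaseline` records every (source, destination, protocol) tuple it observes. Afterward, any flow outside that set produces an alert record. Forwarding to a SIEM, rare-flow scoring, and protocol-level analysis are not shown. The learning-period approach is one common design, not the only one.

```python
class FlowBaseline:
    """Learn normal (source, destination, protocol) flows, then flag anything new."""

    def __init__(self) -> None:
        self.known: set[tuple[str, str, str]] = set()
        self.learning = True

    def observe(self, flow: tuple[str, str, str]):
        if self.learning:
            self.known.add(flow)
            return None
        if flow not in self.known:
            # In practice this record would be forwarded to a SIEM for correlation.
            return {"event": "new communication path", "flow": flow}
        return None

    def finish_learning(self) -> None:
        self.learning = False
```

The quality of the baseline depends on the learning period covering normal operating modes. A period that misses a maintenance shutdown, for example, will later produce false positives.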
Similar to network detection, endpoint or host detection looks for anomalies or threat signatures in behavior. Endpoint detection examines activity and events related to the asset or endpoint itself, such as logs, Syslog, and NetFlow data. It also includes analyzing user behavior to reveal actions by users on endpoints that might indicate threats. Successful endpoint detection can also include a review of device performance data, such as power and CPU usage, that could indicate suspicious activity.
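One simple way to review performance data such as CPU usage is a standard-deviation (z-score) test against a known-good history. This is a minimal sketch of that swapped-in statistical technique, not a description of any particular HIDS product. The 3-sigma threshold is an assumption.

```python
from statistics import mean, stdev


def cpu_anomalies(history: list[float], recent: list[float], threshold: float = 3.0) -> list[int]:
    """Return indexes of recent CPU samples more than `threshold` standard deviations
    from the mean of a known-good history (needs at least two history samples)."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return [i for i, x in enumerate(recent) if x != mu]
    return [i for i, x in enumerate(recent) if abs(x - mu) / sigma > threshold]
```
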
In OT environments with connections to physical processes, endpoint detection can also ingest physical outputs of the process itself. By adding these physical patterns, threats can be identified more quickly and false positives reduced by comparing server, workstation, HMI, PLC, or other behavior data with the I/O data on the process itself.
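As an illustration of cross-checking host behavior against process data, the hypothetical function below flags setpoint changes observed at the controller that have no matching operator action in the HMI logs within a short window. The data shapes, the tag names, and the 30-second window are assumptions. A real implementation would need synchronized clocks and a reliable source of controller values.

```python
from datetime import datetime, timedelta


def unexplained_setpoint_changes(plc_changes, hmi_actions, window=timedelta(seconds=30)):
    """plc_changes: (timestamp, tag, new_value) observed at the controller.
    hmi_actions: (timestamp, tag, value) from operator action logs.
    Returns controller changes with no matching operator action shortly before them."""
    unexplained = []
    for ts, tag, value in plc_changes:
        matched = any(
            a_tag == tag and a_val == value and timedelta(0) <= ts - a_ts <= window
            for a_ts, a_tag, a_val in hmi_actions
        )
        if not matched:
            unexplained.append((ts, tag, value))
    return unexplained
```

A change explained by an operator action is suppressed, which is how process context can reduce false positives.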
Respond
In cyber security of any type, detection is mostly worthless without a well-managed and meaningful response. The response is the “so what” of detection. Those who have been in cyber security for any length of time realize that detections create lots of alerts and potential indicators of compromise. Providing true security hinges on how well an organization can respond to those detections, conduct root cause analysis, and take action to neutralize the threat. The ability to rapidly take appropriate action across an organization’s OT environment is critical to providing effective responses to detections.
Response begins with root cause analysis. This is particularly critical in OT and requires knowledge of the processes being controlled. Taking the wrong action in response to a non-critical event can result in downtime or outages worse than the perceived threat might have caused on its own. Mature organizations have a response process that takes alerts from network and host intrusion detection and analyzes them further to understand how critical the threat is, what devices or networks it targets, what caused the alert, and whether it has an explainable cause such as an operational change.
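A first-pass triage step along these lines might check whether an alert coincides with an approved change window on the same asset, and whether the asset is critical enough to require process engineers before any containment action. This is a sketch only. The `Alert` and `ChangeWindow` types, the criticality threshold of 4, and the triage labels are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Alert:
    asset: str
    time: datetime
    criticality: int  # asset criticality from the inventory, 1 (low) to 5 (high)


@dataclass
class ChangeWindow:
    asset: str
    start: datetime
    end: datetime


def triage(alert: Alert, approved_changes: list[ChangeWindow]) -> str:
    """Is there an explainable operational cause, and how critical is the asset?"""
    if any(c.asset == alert.asset and c.start <= alert.time <= c.end for c in approved_changes):
        return "likely explained by approved change: confirm with site engineers"
    if alert.criticality >= 4:
        return "escalate: involve process engineers before any containment action"
    return "investigate"
```

Even the "explained" branch routes to site engineers rather than closing the alert automatically, reflecting the need for process knowledge described here.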
Incident response in OT requires coordination with personnel who understand the process at that particular site or environment. This knowledge allows for better root cause analysis to determine whether the alert is a false alarm or a true security incident. It also enables OT personnel to take appropriate actions that contain the threat while preserving as much uptime as possible on the core process.
Incident response often also includes engagement with third parties, including insurers, regulators, government entities, and cyber security analysts who are proficient in incident response and management. Key in this phase is coordinating these groups to achieve the lowest-cost, fastest recovery.
Recover
If an incident causes an outage or disruption of systems, the final phase in the NIST CSF is to ensure the system is back up and running with no remaining malware threatening the environment. Recovery in OT begins with robust backups. In many OT environments, backups, while critical, are often manual or ad hoc. Modern IT systems usually have automated backups through virtualization tools and similar technologies, but in OT these tools and processes are often not employed rigorously. Organizations need to maintain recent and robust backups to allow rapid restoration. As recent ransomware attacks have shown, backups must also include offline copies, because one of the first steps in a successful ransomware attack is often to encrypt critical online storage devices as well.
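The two backup requirements above, recency and an offline copy, can be checked per asset with a short report. In this sketch, the `Backup` record, the `offline` flag, and the 7-day maximum age are assumptions. Appropriate ages differ by asset and process.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Backup:
    asset: str
    taken: datetime
    offline: bool  # True if the copy is stored where ransomware on the network cannot reach it


def backup_gaps(assets: list[str], backups: list[Backup], now: datetime,
                max_age: timedelta = timedelta(days=7)) -> dict[str, str]:
    """Report assets lacking a recent backup, or lacking a recent offline copy."""
    gaps = {}
    for asset in assets:
        recent = [b for b in backups if b.asset == asset and now - b.taken <= max_age]
        if not recent:
            gaps[asset] = "no recent backup"
        elif not any(b.offline for b in recent):
            gaps[asset] = "no recent offline copy"
    return gaps
```
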
Beyond backup and restore, recovery in OT will often also include the restoration of the running configurations, rules, and programming on industrial control systems. Unlike in IT, where programming is easily backed up, recovery in OT often requires significant manual effort to restore the core ladder logic, controller configurations, and programming so that the process runs correctly again. In many operational environments, these programs must also pass regulatory hurdles. For instance, in the medical device and pharmaceutical industries, quality protocols require that configurations and programs be tested and proven. In the case of a ransomware recovery, these systems may take a significant amount of time to come back online in a compliant manner.
Like so many other parts of OT cyber security, the recovery phase requires a deep knowledge of the OT process as well as the regulatory requirements it operates in. Having a central security operations center try to manage the recovery process will not work. A specialized team of OT personnel, including process control and quality engineers is key to achieving rapid recovery.
Organizational alignment in OT security
Perhaps the largest challenge in OT security is the organizational one. There are myriad reasons for this – from different cultures to different priorities, different performance criteria, different training, and the list goes on. In some cases, it is as if these groups are from different planets.
Meaningfully reducing OT cyber security risk requires close collaboration between security leaders and the controls engineers and operators who manage and understand the operational technology systems that control the physical processes. In some organizations, these groups are already closely aligned, but in others, they are far apart. Interestingly, this is not an industry-driven or process-driven outcome; it depends much more on the culture and organizational model each organization had before it began pursuing cyber security.
There is a wealth of advice on how to enable closer collaboration between IT, security, and OT.
Proper alignment begins with a shared set of objectives across the top of the organization. Perhaps the largest barrier arises when the objectives of the different functions are at odds with one another, or at least seem to be. In many cases, OT and plant leadership focus on operational uptime, while security's goal is to reduce the threat of attack, even if that means short-term downtime to implement tools or change processes. Senior leadership must be willing to step in and bring these groups together so that a common set of objectives is agreed upon. In fact, security shares the objective of reducing downtime, and most security controls for OT are aimed at exactly this. However, clear goals and communication across teams are necessary.
Next comes the structure of who is accountable and where authority resides.
There is no magic solution to choosing the right organizational structure for security. Different organizations have succeeded with different models based on their history. The best recommendation is to leverage the overall organizational direction of the company, rather than force-fitting something new just for security.
One of the most successful OT cyber security executions occurred at a utility holding company with a culture of business-unit independence and ownership of results. The company's incumbent governance model uses the classical distributed business-unit P&L ownership model made famous by Emerson Electric, Illinois Tool Works, Danaher, and many other industrial companies over the years. The principle is to make accountabilities clear around the “what”, i.e., targets and objectives, and then give the management of each business unit full authority over the “how”: the strategies and tactics to deliver.
In the case of cyber security, the senior team established a very clear top-down directive on the objective and standards they expected each business unit to achieve (in this case, the CSC Top 20 controls, i.e., the CIS Critical Security Controls), down to specific maturity levels for each sub-control. They put in place a company-wide review process to track progress toward the objective. The CISO was closely involved in shaping both the objective and the process. The “how” was then left to each business unit. Within a defined construct of objectives and metrics, business units had the authority to make decisions such as which tools to deploy, how to balance compensating controls, how to achieve least-privilege settings, and how to approach incident response.
There are challenges with any approach. Here they included duplication of effort, inefficient use of underlying tools, failure to apply corporate best-in-class approaches across business units, the need for duplicate cyber security expertise in a world where cyber talent is limited, and an excessive focus on a set of standards rather than on real “security”, meaning fewer threats and faster remediation. All of these limitations are real, and they were addressed through other measures. However, the organization did not have a culture of centralized experts or top-down directives on shared tools or infrastructure. Creating such a model would have meant going against the organization's primary mode of operation. Had the CISO pushed in that direction, he would most likely have failed, because it was not in the organization's DNA.
The CISO knew that no governance model is perfect. Successful OT cyber security leaders take the time to understand the overall governance culture of their organizations and build a model that works with the flow, rather than trying to force-fit a theoretically better governance model. They then address the gaps unique to that approach to ensure its limitations do not become hindrances.
In another example, an organization created a single cyber security architecture and management body staffed with representatives from different areas of the organization – production staff that had run plants, security representatives, IT leaders, and more. This group gelled into a working body that brought the best knowledge from each group to the problem.
Balanced scorecards are another important element for driving OT security priorities across the organization. Over the past 40 years, operations executives have learned to balance a range of metrics in delivering output: efficiency, quality, environmental health, safety, and others. Security is one additional element in ensuring the ongoing delivery of output. Because cyber security events are still relatively rare, operations executives often underweight these risks as low-probability, high-impact events. Using the balanced scorecards that operations teams have already adopted and proven is therefore an effective way to make security part of the overall delivery of operational objectives.
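To make the idea concrete, a balanced scorecard can be reduced to a weighted average of metric scores. This is an illustrative sketch only: the metric names, the integer weights, and the 0-100 score scale are hypothetical, not a recommended weighting.

```python
def scorecard(scores, weights):
    """Weighted average of 0-100 metric scores.

    scores and weights are dicts keyed by metric name; weights need not sum
    to any particular total because the result is normalized by their sum.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same metrics")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[m] * weights[m] for m in scores) / total_weight


# Hypothetical weights: security sits alongside the metrics operations already tracks.
WEIGHTS = {"efficiency": 30, "quality": 30, "safety": 25, "security": 15}
```

With scores of 90 for efficiency, 80 for quality, 100 for safety, and 40 for security, the overall score is 82; raising security to 80 lifts it to 88. Giving security an explicit weight keeps a weak security score from being hidden behind strong production metrics.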
According to a recent KPMG/CSAI survey, the number one barrier to OT cyber security is a lack of knowledgeable resources.
How does an organization address this expertise challenge? First, one can look to the outside for help. Managed OT security services represent a growing industry, with more firms establishing their capabilities in delivering OT security as an outsourced provider. While this may not be a cure-all, it does address one of the most challenging elements: the turnover of skilled security staff. Many organizations already struggle to recruit new security team members from a limited skills pool, only to see their hard-won staff poached by another part of the organization or recruited away from the company just as they become trained on the systems.
According to the NIST CyberSeek database, there are more than one million unfilled cyber security jobs. Any trained person is going to be a recruiting target. When this happens, the OT security organization is left to attract and train a replacement, often without adequate training resources. A managed OT security vendor has the scale to sustain a continuous stream of hiring and training, and each customer benefits from that scale.
Second, training is a valuable way to build expertise. Most of the available training covers what might be referred to as “OT Systems Management” rather than advanced threat detection or artificial intelligence. According to NIST, almost three-fourths of cyber security jobs focus on “systems management” rather than advanced analytics. In OT, because most of these systems have historically not been managed, these skills are in even greater demand.
Is it easier to train IT people on OT or vice versa? Neither is simple, and the right answer includes a blend of both. The above chart, however, highlights that the cyber security skills needed are fundamental IT systems management capabilities that can realistically be learned. To take advantage, organizations can:
- Leverage internal IT resources with depth in the foundational elements of vulnerability management, configuration hardening, and similar skill sets. By sheer numbers, IT workers outnumber industrial engineers and technicians by a factor of 5 to 10, depending on how each group is defined by the U.S. Bureau of Labor Statistics (BLS). Furthermore, the skills needed to operate and manage HMIs, switches, routers, firewalls, and other equipment are similar in IT and OT. Finally, functional security tasks such as understanding correlations, using analysis tools like Splunk, and defining patch requirements are similar between IT and OT, even if the specific threats or incident response actions differ.
- Tap into this IT resource pool by centralizing the analysis of cyber risks using vendor-agnostic technology. This obviates the need to build discrete cyber security expertise areas in each plant or site.
- Integrate OT experts into a central team. While general cyber security knowledge is important, addressing those issues in the OT context requires people who understand what is feasible and operationally safe within the OT environment. This blending also enables cross-learning over time.
- Invest in training for site-level OT resources in critical OT systems management functions like patching and configuration hardening. Safe deployment of these security actions demands that local OT resources be involved in and understand the management tasks taken on those systems. This training should also include incident response activities.
- Leverage technology that enables local teams to automate actions across vendor systems to reduce the labor burden. One major challenge is dependence on OEMs for this management function, a suboptimal approach that places risk in the hands of third parties, and in most cases multiple third parties, because most plants deploy equipment from multiple vendors.
Closely involving OT personnel in both protection and response activities is a recipe for success. This can be referred to as a “Think Global: Act Local” approach. It gives security teams visibility into endpoints, networks, and users across an OT infrastructure through a centralized database and analytics platform, which allows knowledgeable resources to scale. The central SOC can monitor vulnerabilities and threat detections across IT and OT and analyze and prioritize them based on experience and scale.
Be forewarned, however: continuing the response into root-cause analysis of an event, or taking action to protect OT systems or respond to an ongoing threat, can cause unintended impacts on systems if done improperly. Therefore, “act locally”: engage the OT resources with the most knowledge of the process when, for example, patching or managing users and configurations, to ensure such actions are tested and applied at an appropriate time.
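The “Think Global: Act Local” split can be sketched as a central step that ranks findings from every site, followed by a local step that releases actions only inside each site's approved maintenance window. Everything here is hypothetical: the `Finding` and `Window` fields, the integer severity scale, and the assumption of one maintenance window per site are illustrative, not features of any particular platform.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Finding:
    site: str
    asset: str
    severity: int  # hypothetical scale: higher is more urgent


@dataclass
class Window:
    start: datetime
    end: datetime


def prioritize(findings):
    """Think global: rank findings from every site in one central queue."""
    return sorted(findings, key=lambda f: f.severity, reverse=True)


def schedule(findings, windows, now):
    """Act local: release an action only during its site's maintenance window.

    windows maps site -> Window agreed with local OT staff. Sites without a
    window are always deferred, so no change reaches an OT asset without
    local agreement. Returns (ready, deferred), each in priority order.
    """
    ready, deferred = [], []
    for f in prioritize(findings):
        w = windows.get(f.site)
        if w is not None and w.start <= now < w.end:
            ready.append(f)
        else:
            deferred.append(f)
    return ready, deferred
```

The central queue decides what matters most across the fleet, but the timing stays with the people who know each process, which is the balance the approach aims for.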
Creating an OT cyber security program
There are no magic bullets or one-time actions to achieve success in OT security. Real progress requires a programmatic approach to continuously improving the cyber security maturity and effectiveness in the OT environment.
A robust program begins with establishing an objective and then measuring the current baseline against that goal. There are many frameworks an organization can use to establish such targets. None is perfect, but leveraging best practices can help accelerate the organization's security journey.
Next comes assessment and prioritization built from a baseline and gap analysis of the environment. Steps for an effective assessment and prioritization include:
- Getting specific. Too often assessments end at high-level gaps based on surveys or interpretations of network diagrams and documentation. This results in charts with a sea of red areas and low scores, leaving organizations with little direction on priorities. Instead, get deep access into the assets themselves to gather data necessary to prioritize the risks and potential maturity impact from resolving those risks. This involves an asset-by-asset, 360-degree risk assessment to gather details such as known vulnerabilities, user and account risks, access risks, configuration risks, and network design and implementation. This deep picture of the risks allows for more targeted priorities for remediation actions that really make a difference.
- Developing a roadmap. An assessment without a roadmap to achieve target maturities is meaningless. Translate risks into a programmatic roadmap to close known gaps. Roadmaps often include different time horizons for prescribed actions. No one roadmap works for everyone.
- Beginning remediation. In almost every OT security journey, the onset is a whirlwind of cleaning up insecure architectures, buggy software, vulnerability “debt,” and improperly managed user accounts. Often it requires significant project resources to work through items such as network segmentation, secure remote access, backup, and system patching and upgrading. Every roadmap will include requirements such as these that need to be paced over time.
- Monitoring and maintenance. Once the surge has occurred, the hard work begins even as the initial wave of energy, budgets, and focus begins to fade. People go back to their day jobs, newly hired staff are recruited away, and new tools need to be maintained once they're deployed. This is the crucial period that separates the mature from the immature. All plans and roadmaps should include budgets, resources, and procedures to ensure monitoring and maintenance of the surge efforts. This includes the monitoring of configurations, threats, vulnerabilities, patches, access management, and more. It also includes the ability to report on all of these metrics. And, perhaps most importantly, the effort also demands the continued support of senior leadership to maintain organizational focus once the big, initial wave is complete.
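The asset-by-asset assessment described in the list above can be turned into a ranked remediation list by scoring each risk category per asset. This is a hypothetical scoring scheme: the categories mirror those named in the assessment step (vulnerabilities, user and account risks, access, configuration, network), but the weights and the criticality multiplier are assumptions, not an established methodology.

```python
from dataclasses import dataclass, field

# Hypothetical weights per risk category; an organization would tune these
# to its chosen framework.
WEIGHTS = {
    "vulnerabilities": 3,
    "accounts": 2,
    "access": 2,
    "configuration": 1,
    "network": 2,
}


@dataclass
class AssetFindings:
    name: str
    criticality: int  # hypothetical 1 (low) to 3 (high) impact on the process
    counts: dict = field(default_factory=dict)  # category -> number of findings


def risk_score(asset):
    """Weighted count of findings, scaled by how critical the asset is."""
    unknown = set(asset.counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    base = sum(WEIGHTS[c] * n for c, n in asset.counts.items())
    return base * asset.criticality


def rank_assets(assets):
    """Highest-risk assets first; ties broken by name for a stable list."""
    return sorted(assets, key=lambda a: (-risk_score(a), a.name))
```

With these illustrative weights, a critical PLC with two findings can outrank a low-criticality historian with five, which is the kind of targeted prioritization that survey-level assessments cannot produce. Re-running the same scoring on fresh data during the monitoring phase also gives a simple metric to report over time.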