Addressing New ICS/OT Cybersecurity Regulations
How to achieve a successful and efficient programmatic response to the current and future regulatory environment for ICS/OT cyber security.
2021 signaled a shift in government policies to more prescriptive regulatory requirements, such as the United States TSA guidelines for pipelines, rail and marine transport. It also brought an increase in awareness of the risks to industrial environments as ransomware events grew and manufacturing, healthcare, and power industries continued to be the target of many attacks.
To meet these growing regulatory requirements and defend our critical infrastructure from growing attacks, we need an approach that goes beyond perimeter segmentation or network detection. We need to take a proactive approach that protects these sensitive environments.
In this session, John Livingston describes how industrial organizations can achieve regulatory compliance as well as increased defensive measures within these sensitive OT environments.
Hey everyone, I hope you’re having a wonderful conference so far. My name is John Livingston, and I’m here to talk about how to build a robust OT cybersecurity program. I’m really excited to have everyone with us today. Just by way of introduction, as I mentioned, I’m John Livingston, the CEO of Verve Industrial. Let me describe the company for a second. For about 20 years of my career, I was with McKinsey and Company, a global consulting firm, helping organizations manage their digital transformations. And then a little over five years ago, I joined Verve as the CEO.
Quick background on Verve, which really provides a lot of the context for the discussion today. Verve started almost 30 years ago with the first version of what is called the Verve Security Center, a cybersecurity software platform specifically built for industrial control systems and OT. It’s built by leveraging our team’s control systems engineering experience, and it essentially brings the best of IT security into OT environments.
And then we also provide a range of managed support capabilities to our clients to try and close the gap, frankly, given the lack of available OT security resources out there. What we’re going to talk about today is what we’ve learned in applying a different, more proactive approach to OT security management with our clients around the world.
So, a quick summary of the discussion today.
The first is that operations technology is really under a growing threat, and that threat is giving rise to changes in the regulatory landscape, the insurance landscape, etc. That is driving the need for a different approach: creating a robust programmatic effort requires something different from how most industrial organizations have approached their OT security in the past.
And lastly, the good news about this is that many organizations have found a way to really build this robust program effort where they essentially set an objective, and then over time, they consistently drive an improvement in their OT security maturity that addresses both the risks as well as the various compliance requirements.
And so we’re going to talk about some of the case studies we’ve seen of how organizations have done that, and what the key criteria are for doing it effectively. To start with a bit of the landscape of the threats that are out there: we’ve coined the term “AIR RAID” for the shift we’re seeing in OT cyber right now.
First of all, the first “A” is around attackers. There’s been a pretty dramatic shift in the types of attackers and the volumes of threats and attacks that are out there specifically focused on industrial environments. It’s moved from a world primarily focused on APTs, nation-states, etc., to one with much more prevalent, and much faster-growing, financially motivated hackers leveraging ransomware. We’ll talk a little more about that.
The “I” is for IT, and obviously we’ve seen a lot more IT move into the OT world, right? Servers, virtualization, etc. All of those things bring vulnerabilities, and risks come along with them. We start to get a blending of worlds that weren’t originally blended.
The first “R” is regulation. We’re also seeing, due to those threats and attacks, a growth in regulatory requirements and compliance pressures, whether that be the TSA pipeline requirements, the implementation of NERC CIP-type rules in Chile, or the Kingdom of Saudi Arabia releasing their rules, essentially a range of different but much more prescriptive cybersecurity requirements from regulators.
The second “R” is resource constraints, and I’m probably preaching to the choir on this one. As vulnerabilities increase, we’re not seeing an equal increase in resourcing. Meanwhile, expanded access, the growth in IoT, and the growth in the need for remote access during COVID all create additional risks to these insecure-by-design networks.
Another big thing is insurers. Three years ago, insurers really weren’t focused on OT, but with the increase in ransomware events, insurers have now focused on OT as well and are now saying, “Look, you need to keep the same level of security in OT as you’ve got in IT.”
And then finally, directors, partly due to the regulatory drive, the insurers, and the amount of threats out there, directors are now getting really focused on this world.
This mnemonic is just a way of trying to simplify the various trends we’re seeing, but it’s a pretty big shift over the last couple of years, as evidenced just in a couple of data points from the SANS ICS survey, where you look between 2019 and 2021.
The severity of the risks, based on feedback from practitioners, has essentially doubled. And when you look at the threat vectors, as I mentioned before, ransomware, two years ago, was not even in the top five in terms of the greatest threat vector, and now it’s number one by a significant margin.
So we’re seeing a significant shift both in the degree as well as the type of threats, and that’s been driving some of these regulatory requirements, greater auditing and assessment requirements, right? So it’s not going to be enough just to say, “Yeah, well, I’m monitoring that system.” No, I need to document what I’m doing. I need to know, I need to be able to do mock audits, prepare for assessments, I need to be able to respond to those findings.
Those assessments and audits are going to require us to do what we refer to as OT systems management. We’re going to need accurate inventories; we’re going to need to know our vulnerabilities on a regular basis; we’re going to need to be patching things within set timeframes; we’re going to need to manage users and accounts, etc. We’re then going to have to report on all of those things so that we can track how well we’re doing and show that we’re moving along a progression. And finally, the board is going to be much more involved than ever before.
These regulatory changes are causing us to need to know more, to act, and to take OT systems management actions. Finally, they’re requiring us to report in different ways. Again, we’re going to have to have a different approach; it can’t be traditional business as usual. So, as I mentioned, there’s this industry reality of the threats, the regulation, etc., and there are some implications that can be grouped into three broad categories.
The first is that we, as an industry of industrial companies, are going to need to demonstrate progress in risk reduction. We’re going to need to show how we’re doing IT-type security in OT. That means more than just having an inventory or some sort of detection tool. I’m going to need to demonstrate how I’ve reduced my risk and how I’ve applied protections: have I maintained user and access control? Have I maintained my configuration hardening? Have I been doing patching and software management on a regular basis? And I’m going to have to demonstrate that progress, not just say that I’ve done it.
Similarly, on the detection side, we’re going to need to move from alerting, from saying, “Great, I’ve now got alerts flowing into my SOC,” to accelerated response. I need to know that I’ve got a breadth of telemetry; it’s not enough to have network anomalies, I’m going to need endpoint, host-based, and AV information, etc. But then I’ve got to move from alert to response, and to what we refer to as least disruptive response. What’s the response I can take in that OT environment that is the most targeted to stop that threat with the least disruption to operations?
And then the third broad implication is to drive efficiency while being safe. We just don’t have the resources to do this unless the way we do it is really, really efficient. So we’re going to talk a little bit more as we go today about those implications. But in broad terms, it really comes down to similar kinds of security that we see in the IT world, right?
So this is the NIST cybersecurity framework. But what that means is we have to be managing the systems, or we have to be doing active or proactive OT cybersecurity management, whether that be knowing all of my information about my vulnerabilities, my users, etc., being able to protect with updating patches and hardening configs and application whitelisting, etc., and all the way around through the recovery phases. But essentially, those requirements on the prior page really require us to think about a more comprehensive approach.
By the same token, we know that OT is different, right? We’ve been doing this for 30 years, and we know these things are different. We’ve got legacy equipment; you know, 70% of the assets won’t be running a traditional OS. So when we scan them, we very well may knock them offline.
If we start to take the automated remediation actions we typically would take on the IT side, we risk the so-called type two error: we take a response action that causes operational downtime when there was really no security incident in the first place. These are significant risks. So we’ve got to come up with a way to deliver those requirements while recognizing how different OT is. And what’s happened is that a lot of times, when we first talk with organizations, they’re dealing with a poor set of choices. They want to realize the goals, but because of those challenges on the prior page, they’re trying to deal with a whole bunch of different siloed, OEM-approved tools. They’ve deployed some sort of passive approach gathering network information, but it’s not really giving them much depth of information, and they can’t use it to actually take any actions. They’re trying to do compensating controls because they’re not able to patch. So there’s just a lot of struggle with: okay, it seems like a great idea to do this, but how? I don’t have many great options.
So what we’re going to talk about for the rest of our time together is how do you do this? Right, what are the ways that we’ve seen clients achieve this? Typically, they go through this four-phase approach. So they establish the objectives. What’s our goal? What’s the improvement we want to make? What’s the framework we’re going to use to go after that?
They then assess their current environments. We typically use what we call a technology-enabled assessment, where we’re getting deep data from those environments to be able to do a data-based assessment, not survey-based but actually gathering directly from the endpoints. And then build a prioritized roadmap of what they can do to remediate. And then there’s usually an initial remediation phase and then an ongoing monitoring and maintenance phase. And obviously, while we’re doing that, we need to think about the organization design and development to support that overall progress. So this is the typical approach that we’ve adopted. But that approach has to be built on a set of criteria, meaning we need to have any program that’s going to be successful meet a set of criteria. These are the criteria that we have found drive a successful OT cybersecurity program.
First, it’s got to be OT-safe. We can’t be taking things offline or lowering security. Number two, it’s got to be comprehensive in its visibility of the risks and allow us to build a comprehensive roadmap. Number three, it’s got to get to proactive OT systems management. It’s got to allow us to take actions to remediate the risks, not just to know they exist, but to remediate them in a simple, efficient way. Number four, it’s got to be scalable. We’ve got to be able to do this without deploying a ton of infrastructure and without adding a ton of headcount. And finally, we’ve got to have IT and OT work together: first from a scalability point of view, but also because IT organizations are going to want this visibility and not have to run two competing programs. Those are the five criteria, and we’re going to talk about each one in a little more detail.
So first is the OT-safe part of this. We need to make sure that as we’re doing the visibility and the assessment, we’re not causing disruption, but also that we have a way of remediating without causing disruption. We’ve been doing this, as I mentioned, for about 30 years, and I think one of the big findings we’ve had is that it’s very easy to know what you cannot do in a control system.
Lots of people will tell you what you can’t do, right? “You can’t patch.” Well, maybe you can’t. But because of our team’s background as control system engineers, what we’ve learned is that there are a lot of things you can do that allow you to achieve these objectives safely and effectively. So before we get into this, let me talk a little bit about the myths and the reality, because there are a lot of myths out there. One is that it’s really impossible to do vulnerability management and patch management. The reality is no, it’s not. Real-time vulnerability detection and strategic patch management are possible. Obviously, it’s not going to work exactly the way it does in IT, but there is a way to build a programmatic approach to patching.
Secondly, there’s the myth that network packet inspection is the only safe way to gain asset visibility, that you can’t do anything else. The reality is that’s not true. There are OT-safe endpoint approaches, ways of getting directly to the endpoint, which we’ll talk about in a second, that can deliver that comprehensive view without relying only on what’s crossing the network.
Third, there’s the myth that OT security has to be done in the plant because each plant is so complex and its processes are so different that there’s no way to centralize anything. The reality is there are ways of breaking up the problem: centralizing certain things for scale, like the analysis portion (what we call “think global”), but then ensuring that when we start taking remediating actions, we’re acting locally and reusing that local OT process knowledge before we take actions to remediate risks. There’s a variety of these; these are just a few examples. But the point is, as we think about being OT-safe, we need to take a step back and decide what’s reality and what’s a myth, because we’re going to have to think differently to achieve the objectives we talked about before.
The second main criterion is to be comprehensive: to really see that holistic, what we call 360-degree, risk score of an asset, where we’re seeing patching vulnerabilities, users and accounts, configuration status, AV status, network risks, etc., and to bring that into an integrated view with depth on each particular piece. In our view, that visibility forms the foundation of everything we do. You really can’t build a programmatic approach unless you get that visibility: asset inventory, firmware and software (not just DLLs, by the way, but all the applications), patch status, connectivity and flows, etc. That allows us to make trade-offs and to avoid false starts. So many organizations we see have started down a path of network segmentation, only to realize, “Oh my goodness, I don’t even know what assets I have, and I’m trying to segment,” and then they have to stop. So starting with that endpoint and network visibility is key. However, as I mentioned before, relying on a network-based packet inspection approach alone is just not sufficient for what we need to do as both threats and regulatory requirements change.
You know, I’ll know maybe it’s a Windows box if I can see the traffic, but I won’t know the patch levels; knowing the OS version is very different from knowing the patch levels. I won’t know the policies on that device. I won’t know the users, particularly the dormant users, which is where the real risks are. I won’t know all of the applications on that device. I may know that it’s a switch, and I know what’s communicated through the switch, but I don’t actually have the whole rule set from that switch or firewall to know whether it’s configured appropriately. And I won’t be able to see down into the layers of the network: those PLCs, the cards in the rack, and the devices on the backplane. In other words, it’s good to have something, but if we’re going to achieve what we need to, we’re going to need to move beyond just packet visibility.
So, as I said, there’s good news in this, right? There are solutions, and we’re going to talk about ours today; I’m sure there are others. From our standpoint, we’ve been deploying this across pretty much every known vendor system as a vendor-agnostic approach to get to endpoints. We’ve done this for 15 years, on pretty much every control system type. What it entails is having the centralized reporting server I mentioned before, which allows us to see all of the facilities in one place, and then having these virtualized asset managers at a site level, or group of sites, where the action happens. So when we want to take actions, we can control it there and bring the OT people into it. And then we get to the endpoints, where we can put a very lightweight, OT-safe agent on those Windows, Unix, and Linux devices, and then communicate directly with all of the embedded devices in their native protocols.
And that endpoint approach is, first of all, safe, with no issues from an operational standpoint because of the way it’s architected, lightweight, etc. But it also allows you to get that rich data, literally thousands of pieces of information about each endpoint, that you need to demonstrate the improvement you’re making.
So, the way we think about this is that the asset inventory really forms the foundation of what we’re doing. That asset inventory then enables a series of applications on top of it, whether that be vulnerability management, patch management, configuration, etc., all within a single platform. And by doing that, we can do that 360-degree risk view. We can prioritize limited resources because we know all the risks in one place. We know, “Okay, this device is unpatched, but I know my AV is up to date, my whitelisting is in lockdown, my backups are current, and it’s behind a firewall. All right, let me make a different set of choices about whether to patch that device and the timing of the patch,” because I know all the compensating controls I have on that box. Having all of this built foundationally off the inventory allows us to do that 360-degree view, and that becomes the foundation of a proactive remediation roadmap.
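To make that prioritization idea concrete, here is a minimal sketch, in Python, of how compensating controls can discount raw vulnerability severity so that limited patching resources go to the highest residual risk first. The control names and risk-reduction weights are entirely hypothetical illustrations, not any product’s actual scoring model:

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical compensating controls and assumed risk-reduction weights.
CONTROL_WEIGHTS = {
    "av_current": 0.15,
    "whitelisting_lockdown": 0.30,
    "recent_backup": 0.10,
    "behind_firewall": 0.20,
}

@dataclass
class Asset:
    name: str
    unpatched_cvss: float                       # highest CVSS among unpatched vulns
    controls: List[str] = field(default_factory=list)

def residual_risk(asset: Asset) -> float:
    """Discount raw vulnerability severity by the compensating controls in place."""
    discount = sum(CONTROL_WEIGHTS.get(c, 0.0) for c in asset.controls)
    return round(asset.unpatched_cvss * max(0.0, 1.0 - discount), 2)

assets = [
    Asset("hmi-01", 9.8, ["av_current", "whitelisting_lockdown", "behind_firewall"]),
    Asset("eng-ws-02", 7.5),
]
# Patch the highest *residual* risk first, not the highest raw CVSS:
for a in sorted(assets, key=residual_risk, reverse=True):
    print(a.name, residual_risk(a))
```

Note that the well-protected 9.8-CVSS HMI ends up below the unprotected 7.5-CVSS workstation, which is exactly the trade-off the 360-degree view enables.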
So, I can think about my maturity level one: okay, I can target certain critical vulnerabilities I need to patch, harden configuration settings of certain elements like lockouts and password settings, and ensure that my backups are updating; then I can monitor those on a real-time basis. Those are things I might be able to do very quickly, whereas other things, like deploying application whitelisting, may be a next level, and then over time I can move on and do more and more. This is just an example; we’re not suggesting this is what you should do tomorrow. The point is that laying this out, using the deep data you have about the environment, allows you to be very specific about what that roadmap and program will look like.
Once we’ve developed that insight about all the assets, what the risks are, and what the roadmap is, we then need to be able to take actions: use the platform to patch things, configure things, manage users, and report on what we’re doing and the progress we’re making. And then, importantly, we need real-time monitoring of those environments, both to track drift from, or improvement in, our security posture and to identify emerging threats.
So, just as an example, we go from that 360-degree risk assessment to specific endpoint actions, right? This is a patch action. The idea of using an OT SM (OT systems management) platform is that it allows you to go directly from the 360-degree view, saying, “Here are the devices I need to patch,” to patching them, or to building mitigation plans and compensating controls where we can’t patch immediately. Similarly, as part of a proactive approach, we also need to do threat detection. But again, we need to think about threat detection not just as, “Okay, great, I’ve got alerts from some deep packet inspection,” but as what might be thought of as an XDR-type approach, where we bring log data, Syslog, NetFlow, configuration changes, metrics on devices, and process control alarms into an analysis tool that allows you to draw conclusions about the greatest risks and how to take the least disruptive action to improve your posture.
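The “least disruptive response” idea can be sketched as a simple selection rule: rank candidate responses by operational disruption, then pick the lowest-disruption action that is still effective against the detected threat. The threat categories, action names, and disruption ranks below are illustrative assumptions, not anything from a specific product:

```python
# Candidate responses, ordered by an assumed operational-disruption cost
# (1 = least disruptive, 5 = most disruptive). Illustrative values only.
RESPONSES = [
    ("disable_user_account", 1),
    ("block_ip_at_firewall", 2),
    ("kill_process", 3),
    ("isolate_host", 4),
    ("shut_down_device", 5),
]

# Which responses plausibly contain which threat categories (assumed mapping).
EFFECTIVE = {
    "credential_misuse": {"disable_user_account", "isolate_host"},
    "malware_execution": {"kill_process", "isolate_host", "shut_down_device"},
    "lateral_movement": {"block_ip_at_firewall", "isolate_host"},
}

def least_disruptive_response(threat: str) -> str:
    """Return the lowest-disruption action that is effective for the threat."""
    for action, _cost in sorted(RESPONSES, key=lambda r: r[1]):
        if action in EFFECTIVE.get(threat, set()):
            return action
    return "escalate_to_analyst"   # no safe automated option: keep a human in the loop

print(least_disruptive_response("credential_misuse"))
```

For credential misuse, this picks disabling the account rather than isolating the host: the same containment outcome with far less impact on the process.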
So the balance, as I said before, is risk reduction alongside threat response, those implications we talked about up front. We’ve got to be able to reduce risk and demonstrate reduced risk: identify and eliminate shared passwords, remove risky software, ensure backups are up to date. And simultaneously, we need to do threat response. We need to monitor, for example by deploying and watching canary files, and then we need to be able to respond when we see something: shut down a user account or take a device offline, as examples.
The fourth key criterion is that it’s got to be scalable with low cost. We can’t add a whole bunch of infrastructure; if we’ve got to invest in taps and cabling, etc., or a bunch of servers, it’s not feasible. Secondly, we’ve got to be able to do this in a way that’s scalable from a resourcing point of view. On the resourcing side, this is the biggest challenge that every organization has: insufficient ICS expertise and insufficient personnel. If we’re going to achieve the objectives I talked about at the beginning, we’re going to have to find a way to do this efficiently and at low cost.
The way we’ve approached this problem is what I referred to before as the “think global, act local” approach. We take the data from a local site, or multiple sites, whatever it happens to be, and aggregate all of it into a centralized database. That allows a small team of security experts to analyze the risks, prioritize things, build playbooks, etc., and then distribute those down to the local sites. But before any remediation action happens, we have the “act local” part. In other words, the local process control engineers get in the game and say, “Okay, we can take that action; we’ll do it during an outage,” or, “We’ve got a redundant system, so we can do that now.” They then execute the action. It’s all automated, so they’re not walking around with USB sticks, but they control when and how those actions happen. This allows us to save about 70% of the labor, because the analysis is centralized, while also ensuring safe, reliable operations.
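The “think global, act local” workflow can be sketched as a gating pattern: a central server queues remediation actions, and nothing executes at a site until the local process control engineer approves it. All class and field names here are hypothetical illustrations of the pattern, not a real API:

```python
from dataclasses import dataclass

@dataclass
class RemediationAction:
    site: str
    target: str
    task: str                     # e.g. "install_security_update"
    approved_locally: bool = False
    executed: bool = False

class CentralServer:
    """The 'think global' half: central analysts queue playbook actions per site."""
    def __init__(self):
        self.queue = []

    def schedule(self, action: RemediationAction):
        self.queue.append(action)

class SiteAssetManager:
    """The 'act local' half: nothing runs until the local engineer approves it."""
    def __init__(self, site: str, central: CentralServer):
        self.site, self.central = site, central

    def approve_and_run(self, approver_ok: bool):
        for action in self.central.queue:
            if action.site != self.site:
                continue          # other sites' actions are not this manager's concern
            action.approved_locally = approver_ok
            if approver_ok:
                action.executed = True   # a real system would dispatch to the endpoint agent
```

The design point is simply that execution authority lives at the site: central analysis scales across all facilities, but the local gate preserves operational safety.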
Finally, there’s IT-OT coordination. If we’re going to make this programmatic approach work and deliver on what we need to, we need to bring IT and OT together. There’s a bunch of skills IT has that we need to leverage, but IT is also going to want visibility into what we’re doing on the OT security front. So first, on responsibilities, back again to this “think global, act local” approach: IT has a bunch of skills that we should leverage. They know how to analyze vulnerabilities, they’ve got threat intel, they know how to operate a SOC, etc. Great, let’s leverage that experience and knowledge. Similarly, OT has unique capabilities. They know what you can remediate and when, they know the operational challenges, how they’re going to segment their network, and when they can take these OT execution actions. By bringing these groups together, we can get the right risk assessment and remediation planning, and build the right training, etc.
So, the key to this is getting IT and OT to work together. But it’s not just organizations; it’s technology too. That OT systems management platform we talked about before, with the asset inventory that allows us to apply those applications around the edges, has to fit into your enterprise IT security management platforms, whether that’s your CMDB with ServiceNow or whatever it is, your enterprise risk platform, or your enterprise SIEM. Similarly, we’ve got to be able to bring data from the enterprise into the OT management platform, whether that’s OEM-approved patches or the IT-approved enterprise AV tools that run in the OT environment. So the two worlds also have to work together technically, and one of the things we’ve spent a lot of time on is building integrations to make sure that our platform works seamlessly with these IT tools too. So you can have one integrated machine, which is what this image depicts.
Finally, just to summarize, there are these five key criteria. One, OT-safe, but again, separate myth from reality. Two, comprehensive asset visibility, all the way down to the endpoint, to give you that 360-degree view. Three, the ability to take action and remediate things, and to monitor with a holistic view so you can address risks with least disruptive response actions. Four, scalable. And five, coordination between IT and OT. Those are the five criteria that we’ve seen be successful in delivering dramatic improvements in OT cybersecurity. So I thank you for your time, and I really look forward to hearing feedback and answering any questions anyone might have. Thank you very much.