Protecting Embedded Systems in OT Cyber Security

Many people wonder where to start when securing embedded systems. Before I dive into the answer, I want to share two pieces of advice that guide me when working through cyber security issues for embedded devices and the never-ending process of acquiring knowledge:

Reflect on what you learned yesterday, a week before, and then a month. First, you need to give yourself credit. Second, know that tomorrow you might restart that entire journey, but today, you clearly know more than you knew yesterday.
Be comfortable being uncomfortable. I know it is not easy, especially with embedded systems we could knock offline or brick. But the idea is – do not be afraid to learn, experiment with easier layers of the “art”, and build up confidence. Everyone starts somewhere.

And yes, I practice what I teach, so let’s jump into those questions you asked during our Embedded Device Webinar.

How do we know about vulnerabilities in firmware we use? Is there a distribution list we can subscribe to?

Great question. Unfortunately, the answer is that it’s complicated when it comes to defining, reporting, and matching products to a vulnerability disclosure (e.g., a CVE). Academics try to tackle the challenge of mapping CVEs to CPEs (product identifiers), but for the end user, the straightforward answer is to follow feeds:

Follow the CISA ICS-CERT advisory page
Follow your country’s advisory page
Follow various product vendor security portal pages
Sign up for email distribution lists where possible for your various product vendors
And subscribe to experts (social media, or through a service – shameless plug)

Pro tip: Make ICS-CERT (and others) one of the home pages that open when you start your browser of choice (e.g., Chrome or Firefox).

Assuming you have an asset inventory to cross-reference, or a platform that receives frequent CVE updates, you should see new vulnerabilities or notices matched against the assets you own, which you can then track however you wish.
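To illustrate the cross-referencing idea, here is a minimal Python sketch that matches an asset inventory against a list of advisories. Everything in it is an assumption for illustration: the `Asset` and `Advisory` structures, the vendor/product names, and the advisory ID are hypothetical, and matching on exact firmware strings is a simplification of what real CVE-to-product matching requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    vendor: str
    product: str
    firmware: str


@dataclass(frozen=True)
class Advisory:
    advisory_id: str
    vendor: str
    product: str
    affected_firmware: frozenset  # exact version strings (a simplification)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'PLC  100' matches 'plc 100'."""
    return " ".join(text.lower().split())


def match_advisories(assets, advisories):
    """Return (asset name, advisory id) pairs where vendor, product, and firmware match."""
    hits = []
    for asset in assets:
        for adv in advisories:
            if (normalize(asset.vendor) == normalize(adv.vendor)
                    and normalize(asset.product) == normalize(adv.product)
                    and asset.firmware in adv.affected_firmware):
                hits.append((asset.name, adv.advisory_id))
    return hits


# Hypothetical inventory and advisory data, for illustration only.
inventory = [
    Asset("plc-line1", "ExampleCo", "PLC 100", "1.2.0"),
    Asset("rtu-sub3", "ExampleCo", "RTU 200", "3.0.1"),
]
advisories = [
    Advisory("EXAMPLE-ADV-0001", "exampleco", "plc  100", frozenset({"1.1.0", "1.2.0"})),
]

print(match_advisories(inventory, advisories))
```

Even this toy version shows why the inventory needs firmware versions and clean vendor/product names: without them there is nothing reliable to match on.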

How does an organization identify ICS devices, and automate a “live” inventory of the ICS devices to drive Vulnerability Management?

In the previous question, I alluded to the need for a comprehensive list of the assets you have deployed and in scope. Also recognize that Verve is in the business of active, native interrogation of devices (where possible). This is done in a variety of ways, including passive detection of devices via their network presence (i.e., transmitted packets), examining proprietary OEM files that contain information about the asset’s deployment, and polling the devices in a safe and validated fashion.

However, over the past several years there has been a big industry-wide push toward passive solutions. I warn decision makers to understand the limitations of passive detection before assuming it will result in meaningful risk reduction. Check out the results from my Pre-Verve ICS Detection Challenges (1 and 2), and interpret them as you wish.

Ideally, an organization identifies ICS/IACS/OT devices through a modern process, in an environment that has only IP-based assets. Reality and experience tell me otherwise, so a general, time-consuming, and naïve method would be:

Collect and aggregate an understanding of the network and the assets on it (even if this is done by collecting spreadsheets)
Revise with asset owners, and limited tests (e.g., offline data, or a Sean Connery-esque “a single ping please” method)
Use OEM tools, screenshot, scrape, and validate/cross reference obtained information
Repeat and update periodically, and/or whenever assets are introduced or retired
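As a rough sketch of the first step above (aggregating spreadsheets), the following Python merges several CSV exports into one inventory keyed by IP address and reports conflicting values for the asset owner to resolve. The column names (`ip`, `vendor`, `model`, `firmware`) and the sample data are assumptions; real spreadsheets will be messier and will include serial-only devices that have no IP at all.

```python
import csv
import io


def merge_inventories(csv_texts):
    """Merge CSV exports keyed by IP; later sources fill blanks but never overwrite."""
    merged = {}
    conflicts = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            ip = (row.get("ip") or "").strip()
            if not ip:
                continue  # rows without an IP (e.g. serial devices) need another key
            record = merged.setdefault(ip, {})
            for field, value in row.items():
                if field is None or field == "ip":
                    continue
                value = (value or "").strip()
                if not value:
                    continue
                if field in record and record[field] != value:
                    # Keep the first value and flag the disagreement for review.
                    conflicts.append((ip, field, record[field], value))
                else:
                    record[field] = value
    return merged, conflicts


# Hypothetical exports from three different spreadsheets.
site_a = "ip,vendor,model\n10.0.0.5,ExampleCo,PLC 100\n10.0.0.6,,\n"
site_b = "ip,vendor,firmware\n10.0.0.5,ExampleCo,1.2.0\n10.0.0.6,OtherCo,\n"
site_c = "ip,model\n10.0.0.5,PLC 150\n"

merged, conflicts = merge_inventories([site_a, site_b, site_c])
print(merged)
print(conflicts)
```

The conflict list is the useful part: it is the agenda for the "revise with asset owners" step rather than something to resolve automatically.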

Seems painful, right? It is. Verve automates that part, using technology and 25+ years of experience operationalizing solutions. At the end of the day, you need more than just a MAC address, an IP address, and a vendor OUI match to really make sense of the asset’s attributes.

Pro tip: The detailed information you need for understanding an embedded ICS asset is generally not present in most traffic unless specifically queried, or is transmitted only under specific conditions (e.g., start-up). Merely looking at packets as they fly by in a steady-state facility will provide little insight and be full of false positives (or miss assets hiding behind a TCP/serial gateway, for example).

Could you do it in a “poor person fashion”? Yes, but I would not, because you need to know:

Precisely how each of those ICS devices communicates
The exact commands/requests to retrieve the relevant data
How to parse/transform the results into something usable
How to programmatically get the result to the consumer

It is not trivial for non-commodity systems, even with open-source tools. Do not fall into the trap of “oh, but Modbus is openly available, and so is EtherNet/IP (which carries CIP)”. Sure, the specs are widely published, but knowing the spec does not mean you know how to account for local “dialects” and device idiosyncrasies.

The edge cases will break you, and when you wind up with a non-standard protocol, you may wish you had saved yourself the complexity by working with a company that spends the time to understand the devices or has the knowledge/experience to do so.
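To make the "exact request, then parse" points concrete, here is a minimal Python sketch for one well-documented case: the Modbus/TCP Read Device Identification request (function 0x2B, MEI type 0x0E). It only builds and parses bytes and performs no network I/O. The function names and the sample response are my own assumptions, many devices do not support this function at all, and real devices may deviate from the layout assumed here, which is exactly the "dialect" problem described above.

```python
import struct


def build_read_device_id_request(transaction_id: int, unit_id: int) -> bytes:
    """Build a Modbus/TCP request for basic device identification (0x2B / 0x0E)."""
    # PDU: function code, MEI type, read code (0x01 = basic), first object id
    pdu = bytes([0x2B, 0x0E, 0x01, 0x00])
    # MBAP header: transaction id, protocol id (0), length of unit id + PDU, unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_read_device_id_response(frame: bytes) -> dict:
    """Parse a response into {object_id: text}; raise ValueError on anything unexpected."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header and function code")
    _tid, protocol_id, length, _unit = struct.unpack(">HHHB", frame[:7])
    pdu = frame[7:]
    if protocol_id != 0 or length != len(pdu) + 1:
        raise ValueError("MBAP header does not match payload")
    if pdu[0] & 0x80:
        raise ValueError("device returned a Modbus exception response")
    if len(pdu) < 7 or pdu[0] != 0x2B or pdu[1] != 0x0E:
        raise ValueError("not a Read Device Identification response")
    object_count = pdu[6]
    objects, offset = {}, 7
    for _ in range(object_count):
        if offset + 2 > len(pdu):
            raise ValueError("object header truncated")
        object_id, size = pdu[offset], pdu[offset + 1]
        value = pdu[offset + 2: offset + 2 + size]
        if len(value) != size:
            raise ValueError("object value truncated")
        objects[object_id] = value.decode("ascii", errors="replace")
        offset += 2 + size
    return objects


# Hand-built sample response (hypothetical device), not captured from real equipment.
sample_pdu = (bytes([0x2B, 0x0E, 0x01, 0x01, 0x00, 0x00, 0x03])
              + b"\x00\x09ExampleCo" + b"\x01\x06PLC100" + b"\x02\x031.2")
sample_frame = struct.pack(">HHHB", 1, 0, len(sample_pdu) + 1, 1) + sample_pdu

print(build_read_device_id_request(1, 1).hex())
print(parse_read_device_id_response(sample_frame))
```

Note how much of the parser is error handling. That ratio only gets worse with proprietary protocols, and none of this should be pointed at a production device without validation on an offline unit first.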

What are methods for safe discovery and common ways to mitigate vulnerabilities in legacy devices?

Before and during the embedded webinar presentation, we fielded questions about discovering vulnerabilities in devices (known or unknown) and what you can do to mitigate them. Obviously, that is a loaded two-part question, but there are a few paths to the discovery of vulnerabilities (with respect to my two opening hints of wisdom):

Leverage your security feeds and build up your own understanding of a specific system under consideration (SUC in ISA/IEC 62443 language). Chances are that relevant information can be obtained from a sibling product from the same vendor or another, so you can begin forming hypotheses.
Make intelligent observations using an offline system or a system that is well understood in a NON-CRITICAL process/function. This means talking to it over Telnet, looking at the files on the SD card, etc. It is a lot of work, and it is a knowledge-based journey. However, that is only a mile wide and an inch deep. Real knowledge is needed to understand a variety of issues specific to that product, the embedded hardware, the OS (and how it works), and more.
Read product documentation for hints that are not specifically written in security language, e.g., “this will break if someone does this”, or “use credentials to stop X and disable Y functionality”.
Use passive methods to find vulnerabilities. For example, what can you see with Wireshark when using OEM tools to talk to a device?
Test, if you have an appropriate setup and an organizational commitment to dealing with devices you may break in the process. Common methods include protocol fuzzers; automated but highly monitored, step-by-step test stacks; tribal knowledge; and other technical skill sets.
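For readers unfamiliar with the fuzzing idea mentioned in the last item, here is a minimal Python sketch of a mutation-based test case generator. It only produces mutated byte strings from a known-good frame and deliberately sends nothing; the function names and the seed frame are assumptions, and real protocol fuzzers are considerably more sophisticated. Anything like this belongs on an offline, non-critical test unit only.

```python
import random


def mutate(frame: bytes, rng: random.Random, max_flips: int = 3) -> bytes:
    """Return a copy of frame with 1..max_flips randomly chosen bytes replaced."""
    data = bytearray(frame)
    for _ in range(rng.randint(1, max_flips)):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def generate_cases(seed_frame: bytes, count: int, seed: int = 0):
    """Yield (case number, mutated frame); a fixed seed makes every case reproducible."""
    rng = random.Random(seed)
    for case in range(count):
        yield case, mutate(seed_frame, rng)


# A known-good Modbus/TCP "read holding registers" request used as the starting point.
seed_frame = bytes.fromhex("00 01 00 00 00 06 01 03 00 00 00 0A")

for case, frame in generate_cases(seed_frame, count=5, seed=42):
    print(case, frame.hex())
```

Logging the seed and case number is the important design choice: when a device misbehaves, you need to replay the exact frame to confirm the issue is repeatable before reporting it.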

Regardless, if you find an issue that is repeatable and does not appear to be addressed by a publicly disclosed CVE or a vendor application note, congrats! Your next step would be to report it through the right channel (see the CISA CVD process).

As for mitigating and treating the risks around legacy devices, that is a very lengthy question. From an asset owner’s perspective, you may wish to:

Have appropriate policy and guidance around legacy devices
Ensure you can do a full restore for them, start to finish (including taking replacements out of physical inventory and programming them)
Reduce the attack surface on the device itself by wisely disabling functionality
Prevent network access to vulnerable devices or segments (e.g., zones and conduits)
Have in place a detailed asset management strategy that monitors for changes on the device itself vs. relying on out-of-band project files that are likely out of date
Have a plan and practice restoration of last known good configurations and compiled logic for embedded systems
Have a vulnerability management program in place that tracks vulnerabilities and potential firmware upgrades to embedded systems.
Prepare and test your Incident Response procedures
Adequately protect the privileged systems that interact with vulnerable devices or their network segments. Chances are that an attacker will use one of those to execute OEM functionality and affect your process vs. targeting the device directly.
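The "monitor for changes" and "last known good configuration" items can be sketched with simple fingerprinting. The Python below hashes configuration or logic exports and compares a baseline against the current state. How you obtain those exports from a device is vendor-specific and not shown; the device names and contents are hypothetical.

```python
import hashlib


def fingerprint(config_bytes: bytes) -> str:
    """SHA-256 fingerprint of an exported configuration or compiled logic file."""
    return hashlib.sha256(config_bytes).hexdigest()


def detect_changes(baseline: dict, current: dict) -> dict:
    """Compare {device: fingerprint} maps and report changed, missing, and new devices."""
    shared = baseline.keys() & current.keys()
    return {
        "changed": sorted(d for d in shared if baseline[d] != current[d]),
        "missing": sorted(baseline.keys() - current.keys()),
        "new": sorted(current.keys() - baseline.keys()),
    }


# Hypothetical exports; in practice these bytes come from the device or OEM tooling.
baseline = {
    "plc-line1": fingerprint(b"logic v1"),
    "rtu-sub3": fingerprint(b"settings A"),
}
current = {
    "plc-line1": fingerprint(b"logic v2"),
    "hmi-7": fingerprint(b"screens"),
}

print(detect_changes(baseline, current))
```

A fingerprint only tells you that something changed, not what or why, so it complements rather than replaces stored copies of the last known good configuration.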

Not all of the above apply specifically to the embedded devices that we consider vulnerable or insecure; some are compensating controls designed to add layers of protection to deter or reduce risk to a tolerable level.

In embedded systems, are there firmware configuration features that can be used for least functionality, least privilege, management of change, etc?

It depends. Embedded devices are not typically open platforms (even though some may run Linux), and they vary from vendor to vendor, or product to product. However, if you wear your MacGyver hat, you might occasionally apply functionality within these devices to provide controls, e.g., following PLC programming best practices, not allowing Modbus writes to them, or changing the default passwords and monitoring logs.

The reality is that in many cases, these devices, and those that are even older, require a higher level of attention and some creativity. For management of change, the question is: how do I gather information from the device and actively monitor it for change? Thankfully, there are solutions that do that, but they likely are not on the device itself.

On the other hand, many devices have some level of Role-Based Access Control (RBAC), and you can set the usernames/passwords. You should also protect them from physical access, set the “run key” to read-only, and secure the privileged workstations that have the OEM software, which can usually communicate with these systems at will. Credentials highlight another aspect: by not using default credentials, you might buy yourself precious seconds to detect anomalous activities and make it a bit harder for an automated script or program to accidentally stumble into an embedded system. That assumes, of course, that you are monitoring and alerting/acting upon the logs on surrounding systems.

Pro tip: In many cases, you may already have a huge fleet of deployed devices, and merely changing the credentials requires a shutdown or scheduled maintenance window… or breaks other components such as HMIs. Utilizing RBAC functionality in existing deployments may require some thought.

Pro tip: Another option is to use industrial firewalls that apply access controls at the protocol level, but that is not specifically an answer to this question. Instead, appropriate firewalls are an additional compensating control (to be used to secure the conduit/zone) and require on-boarding into your asset management system.
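To show what "access controls at the protocol level" can mean for Modbus/TCP, here is a minimal Python sketch of a deny-by-default policy: standard read function codes are allowed, standard write function codes are allowed only from named sources, and everything else is rejected. The policy itself, the function names, and the IP addresses are assumptions; this is not how any particular firewall product works, and vendor-specific function codes would need their own review.

```python
READ_FUNCTION_CODES = {0x01, 0x02, 0x03, 0x04}  # read coils/inputs/holding/input registers

WRITE_FUNCTION_CODES = {
    0x05: "Write Single Coil",
    0x06: "Write Single Register",
    0x0F: "Write Multiple Coils",
    0x10: "Write Multiple Registers",
    0x15: "Write File Record",
    0x16: "Mask Write Register",
    0x17: "Read/Write Multiple Registers",
}


def modbus_function_code(frame: bytes) -> int:
    """Return the function code of a Modbus/TCP frame (first byte after the 7-byte MBAP header)."""
    if len(frame) < 8:
        raise ValueError("frame too short to contain a function code")
    return frame[7]


def allow_frame(frame: bytes, source_ip: str, write_allowed_from: set) -> bool:
    """Deny by default: reads from anyone, writes only from listed sources."""
    code = modbus_function_code(frame)
    if code in READ_FUNCTION_CODES:
        return True
    if code in WRITE_FUNCTION_CODES:
        return source_ip in write_allowed_from
    return False  # diagnostics, vendor-specific codes, and anything unknown


# Hand-built example frames and hypothetical addresses.
read_request = bytes.fromhex("00 01 00 00 00 06 01 03 00 00 00 0A")
write_request = bytes.fromhex("00 02 00 00 00 06 01 06 00 01 00 FF")
engineering_stations = {"10.0.0.10"}

print(allow_frame(read_request, "10.0.0.50", engineering_stations))
print(allow_frame(write_request, "10.0.0.50", engineering_stations))
print(allow_frame(write_request, "10.0.0.10", engineering_stations))
```

Notice that the write path still trusts the engineering workstation, which is why securing those privileged systems remains essential.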

Can I “patch” firmware for OT/ICS devices such as RTU, IED, and PLCs?

Another question that starts with the typical engineer’s response: it depends. Unfortunately, compared to their commodity IT/OT brethren, embedded systems have some challenges in the upgrade/hotfix department (see the Embedded Devices & Firmware in OT whitepaper). In addition, even if a fix is available, there are some operational issues you need to understand:

What is in the update? What does it address? Does the fix improve stability? Solve a critical flaw? Add functionality?
Is this update necessary? Does it need to be deferred? Immediately deployed? Or never applied?
Will this firmware update have a possibility of failure? If so, what is my rollback plan?
Will this firmware update require a scheduled outage and stable power? If so, I need to make sure it is scheduled and all change management processes are followed.
Will this firmware update require a specific process that must be followed as per the OEM guidance? If so, make sure the standard operating procedure (SOP) is followed in accordance with both the OEM and your organization.
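One lightweight way to enforce that these questions are actually answered before an update is scheduled is to record them in a structure and refuse to proceed while any are blank. The Python sketch below does only that; the field names and example values are my own assumptions, and the decision criteria themselves remain your organization’s call.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class FirmwareUpdateReview:
    addresses: Optional[str] = None      # what the update contains and fixes
    decision: Optional[str] = None       # e.g. "deploy now", "defer", or "never apply"
    rollback_plan: Optional[str] = None  # what happens if the update fails
    outage_window: Optional[str] = None  # scheduled outage and power arrangements
    oem_procedure: Optional[str] = None  # reference to the OEM process / SOP


def open_questions(review: FirmwareUpdateReview) -> list:
    """Return the names of review items that have not been answered yet."""
    return [f.name for f in fields(review) if not getattr(review, f.name)]


# Hypothetical, partially completed review.
review = FirmwareUpdateReview(
    addresses="fixes a crash in the embedded web server",
    decision="defer",
)

print(open_questions(review))
```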

If you still deem the update to the device relevant and necessary in accordance with your organization’s criteria, then yes, patching OT devices is feasible, but it won’t be a frequent activity like the patching programs for perimeter firewalls or commodity Windows assets. Again, that is why it is critical to consistently and frequently monitor assets directly to determine changes in your organization’s deployed asset inventory.

Pro tip: Patching or performing firmware updates is possible and feasible on embedded systems should an update be available AND fit your organization’s criteria. It should not be relied upon in the same way as in IT, but rather used to ensure the stability and security of the processes and communications it supports.

Pro tip: Patching commodity OT systems such as embedded routers closer to the perimeter (e.g., Cisco devices) is a far less concerning task, and they should likely be prioritized given your organization’s dependence on them for security. However, you must still abide by appropriate risk and change controls. Do not forget to include networking infrastructure in your asset inventory.

How can I change the culture to create a better cadence for patching and updates? How can I influence vendors to maintain products, combat cyber security issues at hand today and for the future?

There are several ways to effect change positively, whether outside in the community or within your organization, and usually it’s done through human-to-human means initially.

To improve culture, start by demonstrating that it’s possible by building a thoughtful but rigorous engineering-based organization that considers all aspects vs. outright denying them. One potential idea is to instill that culture beginning at the interview stage if you are hiring, setting out what is expected vs. allowing new hires to be “poisoned” by current values or attitudes. This way, you grow a team of positive people and also take an active part in changing security culture.

Another piece of advice is to champion patches and updates for “low hanging fruit” or low risk systems. Build a reputation for following process, being rigorous, detail-orientated, understanding of your environment, accepting of insight from local site owners who have a metric ton of “tribal knowledge”, and build relationships around trust. I guarantee that if you stick with it, have adequate support, and don’t rush into things while taking one victory at a time, you can change the patching culture and cadence for A LOT of systems.

Pro tip: Do not stop patching/maintaining/securing systems or their surrounding systems. Security degrades over time, and it consistently needs attention or it will rot. You still change the oil in your car, even if newer, better versions of it exist; otherwise it will degrade and then catastrophically fail.

The other aspect of this question is how to combat cyber security issues today with an eye on the future, while also influencing vendors to create more secure products. Here are some ideas:

Make investments in security that enable and multiply the effectiveness of other technologies, your processes, and your resources’ time
Invest in getting your organization to a sufficient level of cyber security maturity, and maintain it through consistently applying the basics; they have been proven to provide a measurable effect on residual risk, and often provide a reasonable level of protection.
Add security language to RFPs, and validate any claims by the vendor before moving to an installation.
Use the more secure options in products, and be a part of the user group that uses them. Don’t allow devices to be set up “as is” and left… the best chance to add security is early on, when deploying systems/replacements, and by transitioning systems away from insecure options or defaults as you get to them (or as they retire).
Ensure you have security processes that require (and validate) cyber security as part of Factory Acceptance Testing (FAT) and Site Acceptance Testing (SAT). Also known as CFAT and CSAT.
Report issues to vendors or CERTs. I recognize that the CVSS/CVE system has flaws, but awareness forces companies to fix issues, and forces your organization to manage them (especially in a compliance-oriented environment, such as one that needs to abide by NERC CIP).
Create end-to-end tested processes for overall cyber security governance, along with adequate training and live-fire exercises, to ensure your resources are prepared to reduce disruption and restore assets in the eventuality of an event.
Be engaged in community efforts, ask hard questions to vendors and industry experts. Try to be a part of the change vs. a stonewall. Also consider experiences and how other industries are doing things – chances are, there is prior art we can use in the ICS/OT world!

Creating change requires a culture shift, changing the relationship demeanor with vendors, and holding people accountable (but not punishing them). I recognize this is a huge lift, and it will be continuous, but it is a marathon, not a sprint. So don’t wear yourself out, and make wise choices that have impact.
