Runtime Integrity Measurement Overview


Runtime Integrity Measurement

At Invary it's our mission to protect people, organizations, and governments from malicious entities looking to do cyber harm. Our patented Integrity Measurement technology enables this mission by characterizing the behavior of good software through analysis of compiled binaries, measurement of software internals at runtime, and appraisal of this runtime behavior.

When applied to operating systems, Invary’s Integrity Measurement solution provides a powerful mechanism to ensure that your infrastructure is operating as intended, even at layers invisible to most monitoring and defense systems. It is able to accurately identify specific changes in the behavior of your operating system whether due to an insider misconfiguration or due to a purposeful, dedicated cyber attack.

In this article we’ll dive into the technical details of a real world rootkit, explore how that rootkit compromises an operating system, and then learn how Invary’s Integrity Measurement system detects even novel rootkits that exploit zero day vulnerabilities in your systems.

Operating Systems 101

Before we dig into the technical details of rootkits, we’ll take a short detour to learn just enough about operating systems to understand how rootkits work. Operating systems occupy a unique position in the modern software stack: they sit at the boundary between hardware and applications, with applications running above them and hardware running below. On one side, they are responsible for interfacing with, configuring, and securing all of the hardware resources available within the system. On the other side, they are responsible for abstracting and partitioning these resources in a secure manner across many different users and applications.

These systems are amazing and interesting not just because they’re incredibly powerful, but also because they’re incredibly complex, extraordinarily difficult to engineer, and absolutely critical to the security of any computer system. The Linux kernel source code has over 30 million lines of code, is maintained by thousands of extremely talented engineers, and receives updates many times every single day. All of this code is carefully crafted and reviewed to ensure that it is highly performant and secure.

Structure of an Operating System

Like all software, the Linux kernel is composed of processing instructions and data residing in memory, on disk, or across a network connection. The data can be further segregated into either structured data (called data structures) or opaque binary data. The structured data takes on a specific “shape” or set of shapes in memory, as defined by the variables and types in the software’s code. The processing instructions operate on and change these shapes, with the shapes and instructions together defining the surface area of all possible behaviors.

Operating systems differ from normal applications because of their unique position within the software stack. They must securely mediate an application’s access to hardware and must isolate separate applications from one another, unless resource sharing is explicitly requested and allowed. Within an operating system kernel this is done by forcing all hardware access and all data sharing through kernel mediated processing.

This process is implemented using primitives called “interrupts”, which are classified into hardware interrupts and software interrupts. Hardware interrupts are part of the hardware/kernel interface and are used by hardware to tell the operating system kernel that something important in the hardware has happened (e.g. a network packet has arrived on the network interface card and should be processed by the kernel). Software interrupts are part of the application/kernel interface and are used by applications to request that the kernel take some action on behalf of the application (e.g. the kernel should read network data into the application’s memory).

The Application/Kernel Interface

Hardware and software interrupts work in a similar fashion, and the implementation details of both are important from a security standpoint. In this article, however, we will focus on the details of software interrupts since the rootkit we will look at later compromises this code path.

For an operating system to process a request on behalf of an application it needs to define a mechanism to transition from the application to the operating system kernel, a mechanism to define what request is being made, and a mechanism to define what data the request should operate on and/or return. These three mechanisms together define “system calls”, the application/kernel interface.

As discussed previously, software interrupts are the mechanism operating systems use to transition from application code into kernel code. These are implemented as specific processor instructions that are placed in application code paths, typically by using library functions that are shipped with the operating system. In the Linux operating system “glibc” is the software library that contains these software interrupt code paths and is the common application level interface into the kernel.

To define what request is being made and what data is being processed or returned, most operating system kernels enumerate all possible requests using a single integer value that is used to index into an array of function pointers. Each function pointer in the array points to a function that defines the data structures expected and implements the processing required for each request. The array of function pointers is known as the system call table and we’ll see later that this table of function pointers is a target commonly attacked by malware.
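The dispatch idea described above can be sketched in a few lines. This is a toy model in Python, not kernel code: the handler names `sys_read`, `sys_write`, and `sys_open` and the `dispatch` helper are illustrative assumptions, and a real kernel implements this path in C and assembly with far more validation.

```python
from typing import Callable, List


# Toy handlers: each one stands in for a kernel function that would
# validate its arguments and do the real work.
def sys_read(fd: int, count: int) -> str:
    return f"read({fd}, {count})"


def sys_write(fd: int, data: str) -> str:
    return f"write({fd}, {data!r})"


def sys_open(path: str, flags: int) -> str:
    return f"open({path!r}, {flags})"


# The "system call table": the index is the system call number and the
# value is the handler (the function pointer in a real kernel).
SYSCALL_TABLE: List[Callable[..., str]] = [sys_read, sys_write, sys_open]


def dispatch(number: int, *args) -> str:
    """Look up the handler by number and invoke it with the caller's arguments."""
    if not 0 <= number < len(SYSCALL_TABLE):
        raise ValueError(f"invalid system call number {number}")
    return SYSCALL_TABLE[number](*args)


print(dispatch(2, "/etc/hostname", 0))
print(dispatch(0, 3, 128))
```

Because every request funnels through one indexed lookup, whoever controls the entries of the table controls what every request actually does.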

As an example, on x86-64 the “open” and “read” requests in the Linux kernel are defined as system call “2” and system call “0”. To read the contents of a file on disk, an application will send “system call 2” to the kernel along with a string “pathname” defining the path to the file, and an integer number “flags”. The kernel will validate this request, apply security rules, and return a single integer value “file descriptor number” if the request was successful. The application will then repeatedly send “system call 0” along with the file descriptor returned by the open system call, a memory region that the application has set aside for the file contents, and an integer number of bytes to read, until all of the data has been read from the file.
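The open-then-read sequence can be illustrated from user space with Python's `os` module, whose `os.open`, `os.read`, and `os.close` functions are thin wrappers over the corresponding C library calls. This is only a sketch: the `read_file` helper and the temporary demo file are assumptions made for the example, and the C library may issue a closely related system call (such as `openat`) rather than system call 2 itself.

```python
import os
import tempfile


def read_file(path: str, chunk_size: int = 4096) -> bytes:
    """Read a whole file using only the low-level open/read/close wrappers."""
    # "open": pass a pathname and flags, get back an integer file descriptor.
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            # "read": pass the descriptor and a maximum byte count.
            chunk = os.read(fd, chunk_size)
            if not chunk:  # an empty result signals end of file
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)  # release the descriptor even if a read fails


# Demonstration on a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello from the kernel interface")
    demo_path = tmp.name

content = read_file(demo_path, chunk_size=8)
os.unlink(demo_path)
print(content)
```

The small `chunk_size` forces several read requests, mirroring the repeated calls an application makes until the file is exhausted.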

As mentioned above, most applications do not work at this low level directly but instead utilize higher level libraries that give meaningful names to system calls (e.g. “open” and “read” rather than “2” and “0”) and codify the expected data structures and return values. All of these libraries and abstractions, however, lead to the single application/kernel interface: the system call table.

Kernel Mode Rootkits

Operating systems, as a consequence of their unique positions as the boundary between applications and hardware, are critical to the security of any computer system. By mediating access to all information about all resources in the system, the operating system exercises an immense amount of control over what applications can see and do and how applications interact with one another.

This purposeful design is what allows an operating system to successfully fulfill its role in securing our computing systems, but also represents an enticing target for cyber criminals looking to exercise control over your systems and data. Malware that targets operating systems is typically called a kernel mode rootkit, or just a rootkit for short. It is well established in the security industry that this kind of malware is extremely difficult to detect and mitigate.

This type of attack technique cannot be easily mitigated with preventive controls since it is based on the abuse of system features [2].

A successful rootkit can potentially remain in place for years if it's undetected. During this time, it will steal information and resources [3].

The Drovorub Rootkit

Drovorub is a sophisticated Linux malware tool set that contains a kernel mode rootkit. It was disclosed by a joint announcement from the National Security Agency (NSA) and the Federal Bureau of Investigation (FBI) in August of 2020 [1].

The announcement from the NSA and FBI goes in depth on the technical details of the Drovorub malware, but unfortunately the technical details of the kernel module rootkit are only discussed in terms of high level capabilities:

  1. In the “Drovorub Implant Operation” section, the announcement states that the rootkit sets up all of the “system call hooks” that it needs for its operation.
  2. In the “Evasion” section, the announcement states that kernel functions are hooked either by “patching the functions directly” or by “overwriting function pointers that point to the functions”

The announcement also discusses the high level capabilities that the Drovorub malware achieves using its kernel mode rootkit and its function hooking techniques:

  1. The rootkit hides select processes from both system calls and the “/proc” file system, including all child processes of hidden processes.
  2. The rootkit hides select files on the system, making them invisible to any directory listing but still allowing them to be accessed if the full path is known a priori (as it is to the malware itself).
  3. The rootkit hides select network sockets on the system, including all network sockets that are used by hidden processes.
  4. The rootkit hides select network packets from raw sockets.
  5. The rootkit hides select network packets from the Linux netfilter system, commonly used by firewalls to implement packet filtering, including all packets to or from hidden processes.

Taken together, these capabilities make it extremely difficult to detect and mitigate the Drovorub malware. Live response tools, host based tools, antivirus software, EDR/XDR tools, and logging tools all depend on kernel system calls to facilitate their correct operation. By compromising the system calls and kernel functions directly, the Drovorub malware is capable of compromising all of the applications running on the system, including any installed security or defense tools.

After a rootkit infects a device, you can't trust any information that device reports about itself [3].

A rootkit will interfere with your device’s functions, including your security software. If you run a security scan, a rootkit will often prevent your security software from showing you this information so you’ll have no idea that malware is running on your device [4].

Network intrusion detection tools might have insight into the C2 network messages used by the Drovorub malware but are subject to evasion via TLS encryption, message structure changes, IP address changes, host name changes, and other easy to use techniques that make it difficult to stay ahead of cyber criminals.

Implanting Kernel Code

How Drovorub is initially implanted into a running operating system is not discussed in detail by the announcement from the NSA and FBI. There are various known approaches to this, however.

One approach is to simply implant the malware as a dynamically loadable kernel module/driver, something supported by all general purpose operating systems. This simple technique requires administrative privileges on the target system which can be gained using any vulnerability that allows privilege escalation, with the rootkit allowing the malware to persist itself and facilitate criminal activity without being detected.

In modern systems modules/drivers must be signed before they can be loaded into the operating system, which provides a base level of protection for ensuring that malware cannot be easily implanted. However, this protection can be disabled by configuration and there have been several documented rootkits that used valid, signed modules/drivers. This can happen when the signing key is cracked, stolen, or accidentally used to sign malware.

Another approach is to leverage a vulnerability that allows access to kernel memory to directly add malware code into the kernel. A zero day vulnerability of this nature is extremely severe and fixes are developed and deployed very quickly when they are found. Since these types of vulnerabilities are extremely rare, a more common vector for this approach is to leverage an old version of a module/driver that did contain such a vulnerability. The latest version of the module/driver may have the vulnerability fixed, but if the vulnerable version still has a valid signature and can still be loaded then it can be used as a springboard to compromise the system.

System Call Hooking

System call hooking is a kernel mode rootkit technique that attacks the application/kernel interface described above. The technique, used by Drovorub and many other rootkits, is well known and common. To use this technique, the malware first locates the address of the system call table. Any one of a number of techniques can be used, but common ones are looking at the symbol table, which contains the addresses of known kernel symbols, or probing memory for a known pattern.

Once the address of the system call table is known, the malware changes the function pointers in the system call array so that any given system call points to a malicious function rather than to the standard kernel function. Once the hooks are in place, applications running on the system no longer behave as designed or intended. In effect, the malware has compromised all applications simultaneously, including any security or defense applications such as virus scanners, local network filters, and EDR/XDR systems.
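To make the effect of a pointer swap concrete, here is a toy simulation in Python. It is not how Drovorub or any real rootkit is written: the one-entry `syscall_table`, the `sys_listdir` handler, the `install_hook` helper, and the `evil_` file prefix are all hypothetical names invented for this sketch.

```python
from typing import Callable, List

# Toy model only: a Python list stands in for the kernel's system call table.
FILES = ["notes.txt", "evil_payload.bin", "report.pdf"]


def sys_listdir() -> List[str]:
    """Legitimate handler: reports every file."""
    return list(FILES)


SYS_LISTDIR = 0  # hypothetical system call number for this sketch
syscall_table: List[Callable[[], List[str]]] = [sys_listdir]


def install_hook(table: List[Callable[[], List[str]]], number: int, hidden_prefix: str) -> None:
    """Replace one table entry with a wrapper that filters the real results."""
    original = table[number]

    def hooked() -> List[str]:
        # Call the genuine handler, then hide anything the "malware" owns.
        return [name for name in original() if not name.startswith(hidden_prefix)]

    table[number] = hooked


def list_files() -> List[str]:
    """What any application (including a security tool) would see."""
    return syscall_table[SYS_LISTDIR]()


before = list_files()
install_hook(syscall_table, SYS_LISTDIR, "evil_")
after = list_files()

print(before)
print(after)
print(syscall_table[SYS_LISTDIR] is sys_listdir)
```

Nothing about `list_files` changed, yet its answer did. The only observable difference is that the table entry no longer refers to the original handler, which is exactly the kind of fact a tool running through the hooked interface cannot see for itself.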

Kernel Function Hooking

Another technique that rootkits like Drovorub leverage to modify the behavior of running kernels is to hook functions directly instead of changing pointers to functions. This can be accomplished in various ways, but a common approach is to leverage a debugging, profiling, or tracing facility that is already built into the kernel. Such facilities are important and necessary features of most kernels.

As an example, the “ftrace” functionality in the Linux kernel can hook into almost any kernel function given a few simple parameters including the name of the function that should be hooked. Once the malware sets up these parameters the kernel dynamically updates the hooked function so that it is redirected to the malware. Though the mechanism is different, the end result is very much the same as system call hooking: the malware has compromised the kernel function and all applications making use of that function are no longer behaving as intended.
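The difference between the two hooking styles can be shown with a small Python model. It rests on loud assumptions: a `bytearray` stands in for kernel memory, the byte values are arbitrary placeholders rather than real machine code, and `patch_function_entry` is a hypothetical helper that does not reproduce how ftrace actually rewrites a function.

```python
import hashlib

# Toy model only: a bytearray stands in for kernel memory, and the byte
# values are arbitrary placeholders, not real machine code.
memory = bytearray(64)
FUNC_ADDR, FUNC_LEN = 16, 12
memory[FUNC_ADDR:FUNC_ADDR + FUNC_LEN] = b"\x90" * 5 + b"BODYBYT"


def code_hash(addr: int, length: int) -> str:
    """Hash the bytes of one function so later changes can be noticed."""
    return hashlib.sha256(bytes(memory[addr:addr + length])).hexdigest()


function_pointer = FUNC_ADDR  # what a table entry referring to the function would hold
baseline_pointer = function_pointer
baseline_hash = code_hash(FUNC_ADDR, FUNC_LEN)


def patch_function_entry(addr: int, hook_addr: int) -> None:
    """Overwrite the first five bytes with a placeholder 'jump to hook' sequence."""
    memory[addr:addr + 5] = b"\xe9" + hook_addr.to_bytes(4, "little")


patch_function_entry(FUNC_ADDR, hook_addr=48)

pointer_unchanged = function_pointer == baseline_pointer
code_unchanged = code_hash(FUNC_ADDR, FUNC_LEN) == baseline_hash
print(pointer_unchanged, code_unchanged)
```

In this model every pointer to the function still holds its original value, so a check that only compares pointers would pass. Only a comparison of the function's own bytes against a known-good baseline reveals the change.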

Netfilter Hooks

Unlike system call hooking and function hooking, netfilter hooks do not directly modify the running code of the operating system kernel. Rather, netfilter itself is a framework built on the concept of hooks that can be used to modify the behavior of network packet processing within the kernel. When operating normally, netfilter gives the Linux kernel a sophisticated and powerful mechanism for securing the traffic entering and leaving the system.

These mechanisms are just as powerful when leveraged by malware running inside of the operating system kernel, allowing nearly full control over how packets are processed. To do this, malware like Drovorub will install a malicious packet processing function as a netfilter hook, typically as the first function in the netfilter processing pipeline. This allows the malicious function to inspect and change network traffic before other parts of the operating system do, allowing the malware to hide packets, bypass network security rules, alter data as it enters and exits the system, change who is receiving packets, duplicate packet streams to alternative destinations, and many other things.
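The consequence of running first in the pipeline can be sketched with a toy hook chain in Python. This is a loose model rather than the netfilter API: `HookChain`, `Packet`, the verdict strings, the priority values, and the `MAGIC` payload marker are all assumptions made for illustration. The one borrowed idea is that hooks run in priority order and a hook can return a verdict that stops further processing.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Verdicts loosely modeled on the accept/drop/stolen idea; names are illustrative.
ACCEPT, DROP, STOLEN = "accept", "drop", "stolen"


@dataclass
class Packet:
    dst_port: int
    payload: bytes


Hook = Callable[[Packet], str]


@dataclass
class HookChain:
    hooks: List[Tuple[int, Hook]] = field(default_factory=list)

    def register(self, priority: int, hook: Hook) -> None:
        self.hooks.append((priority, hook))
        self.hooks.sort(key=lambda entry: entry[0])  # lower priority value runs first

    def process(self, packet: Packet) -> str:
        for _, hook in self.hooks:
            verdict = hook(packet)
            if verdict != ACCEPT:  # any other verdict ends traversal
                return verdict
        return ACCEPT


seen_by_firewall: List[Packet] = []


def firewall(packet: Packet) -> str:
    """Legitimate hook: records what it sees and blocks port 23."""
    seen_by_firewall.append(packet)
    return DROP if packet.dst_port == 23 else ACCEPT


def malicious_hook(packet: Packet) -> str:
    """Hypothetical implant: claims marked packets before anyone else sees them."""
    return STOLEN if packet.payload.startswith(b"MAGIC") else ACCEPT


chain = HookChain()
chain.register(0, firewall)
chain.register(-1000, malicious_hook)  # registered later, but runs first

packets = [Packet(80, b"GET /"), Packet(23, b"login"), Packet(23, b"MAGIC run")]
verdicts = [chain.process(p) for p in packets]
print(verdicts)
print(len(seen_by_firewall))
```

The third packet targets a port the firewall would block, yet the firewall never gets the chance to evaluate or log it because the earlier hook has already taken it out of the pipeline.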

Measuring Operating System Integrity

Invary’s Integrity Measurement technology is unique and complements existing security tools by securing and monitoring the internal behavior of operating system kernels. Existing solutions don’t have direct insight into your kernel but rely on its correct behavior to provide their security guarantees. Invary ensures the integrity and proper behavior of kernels, strengthening your existing security tools and ensuring they are defending your systems as intended.

The Integrity Measurement Approach

Integrity measurement ensures proper behavior of software using a three phase approach:

  1. The software binary is analyzed and a measurement approach and behavioral baseline are established.
  2. The target software is measured, periodically, continuously, or on demand, as it is running, using the measurement approach determined by the analysis.
  3. Each measurement is appraised to ensure that the runtime behavior matches the expected behavior established during analysis.

To analyze an operating system kernel, Invary’s technology extracts and analyzes the code instructions and data structures within a compiled, binary kernel and any of its modules or drivers. This analysis produces both a description of how to measure the static and dynamic portions of the kernel and a baseline describing all possible code instructions and data structure shapes that can be produced while the kernel is executing. This step is performed outside of the target environment using an automated process.

Once this analysis is complete the target system can be measured. To do this, a measurement agent is installed in the target environment and configured with a description of how and when to measure the kernel, typically on a periodic interval. When a measurement is requested the agent inspects both the static and dynamic portions of the kernel’s memory using the information produced by the software analysis phase. The measurement starts in the static sections because their location and shape are known from the analysis, and proceeds into the dynamic sections constructed by the kernel as it is executing. The shapes of the data in the dynamic sections of the kernel are known from the analysis, but their locations are discovered while the measurement is taken. While the process of determining what and how to measure the software is complicated, the process of taking the measurement once this information is known is relatively simple and quick, and has a low impact on the running system.

The measurement produces an object graph of the executing operating system kernel that describes the shape of the kernel’s static and dynamic code instructions and data structures at the time of measurement. The object graph typically contains several hundred thousand nodes and edges with each node in the graph representing a single measured portion of memory, either static or dynamic, and the edges representing links between those sections of memory as constructed by the kernel while it was executing.
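As a rough illustration of what such a graph looks like, the following Python sketch walks a made-up memory map starting from known static roots and records nodes and edges. It is a toy under stated assumptions, not Invary's measurement agent: the `Region` type, the `MEMORY` map, every address, and the `measure` function are invented for this example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    kind: str                  # hypothetical type label, e.g. "syscall_table"
    section: str               # "static" or "dynamic"
    pointers: Tuple[int, ...]  # addresses this region refers to


# Toy "kernel memory": address -> region. All addresses are made up.
MEMORY: Dict[int, Region] = {
    0x1000: Region("syscall_table", "static", (0x2000, 0x2100)),
    0x2000: Region("function", "static", ()),
    0x2100: Region("function", "static", ()),
    0x3000: Region("file_table", "static", (0x9000,)),
    0x9000: Region("file", "dynamic", (0x4000,)),
    0x4000: Region("file_operations", "static", (0x2100,)),
    0xF000: Region("unreachable", "dynamic", ()),
}


def measure(roots: List[int]) -> Tuple[Dict[int, Region], List[Tuple[int, int]]]:
    """Walk outward from known static roots, recording nodes and edges."""
    nodes: Dict[int, Region] = {}
    edges: List[Tuple[int, int]] = []
    queue = deque(roots)
    while queue:
        addr = queue.popleft()
        if addr in nodes or addr not in MEMORY:
            continue  # already measured, or not a readable region
        region = MEMORY[addr]
        nodes[addr] = region
        for target in region.pointers:
            edges.append((addr, target))  # one edge per pointer followed
            queue.append(target)
    return nodes, edges


nodes, edges = measure(roots=[0x1000, 0x3000])
print(len(nodes), len(edges))
```

The walk begins at locations that are known ahead of time and discovers the dynamic `file` region only by following a pointer, which mirrors the static-then-dynamic order described above.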

The object graph is used during the appraisal process to ensure the integrity of both the static and dynamic sections of the executing kernel. To do this the appraisal process verifies a large set of invariants that must hold on the measured object graph given the baseline produced during the analysis phase.

As a high level example, consider a software application running on the operating system that reads the contents of a file on disk. The software application asks the operating system kernel to perform two main operations via system calls:

  1. “open” the file to produce a “file descriptor”
  2. “read” the contents of the “file descriptor” into a memory buffer.

When the “open” system call request is made the operating system will look in the system call table for the “open” system call handler and will then invoke that handler’s code. The code in the open system call handler will eventually dynamically create a kernel-side file descriptor, store that file descriptor in a table, and then fill that file descriptor with pointers to functions that implement the Virtual File System (VFS) interface for the requested file. The VFS defines all of the operations, like “read” and “write”, that can be done on a file.

When the application performs the “read” system call, the operating system will look up and invoke the “read” system call handler just like it did for “open”. The “read” system call handler will eventually look up the same kernel-side file descriptor that was dynamically created in the “open” step, look up the “VFS read” function within that file descriptor, and then invoke that “VFS read” function to fetch the contents of the requested file.

The appraisal process verifies the integrity of these “read” and “open” code paths and data structures among thousands of other invariants that are verified about the operating state of the kernel. A successful appraisal process provides strong evidence that the kernel was in a safe state when the measurement was taken and justifies trust in the kernel for some period of time afterwards. By continually measuring and appraising the kernel this trust can be extended for the life of the system being secured.

The appraisal process itself is complex. Many thousands of code sequences and data structures must be verified and the process must complete quickly. At an abstract level, the appraisal process verifies the integrity of an operating system kernel by:

  1. Ensuring that a specific set of nodes exist in the graph and have the expected shape based on the baseline produced in the software analysis phase.
  2. Traversing the kernel’s object graph looking at all nodes, edges, or subgraphs of interest and then verifying that one or more invariants hold at the identified section of the object graph given the kernel’s baseline.
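The two steps above can be sketched as a small invariant checker in Python. This is a simplified illustration, not Invary's appraisal algorithm: the graph representation, the `BASELINE_REQUIRED` and `BASELINE_TARGETS` tables, the addresses, and the `appraise` function are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# A measured graph in this sketch: (address -> kind label, list of edges).
Graph = Tuple[Dict[int, str], List[Tuple[int, int]]]

# Hypothetical baseline from an analysis phase: nodes that must exist, and,
# per node kind, the addresses its outgoing edges may legitimately target.
BASELINE_REQUIRED: Dict[int, str] = {0x1000: "syscall_table"}
BASELINE_TARGETS: Dict[str, Set[int]] = {
    "syscall_table": {0x2000, 0x2100},
    "file_operations": {0x2100},
}


def appraise(graph: Graph) -> List[str]:
    """Return a list of invariant violations; an empty list means the graph passed."""
    nodes, edges = graph
    violations: List[str] = []

    # Step 1: required nodes must exist and have the expected shape (kind).
    for addr, kind in BASELINE_REQUIRED.items():
        if nodes.get(addr) != kind:
            violations.append(f"missing or malformed node {addr:#x} (expected {kind})")

    # Step 2: traverse edges and check the invariant attached to the source kind.
    for src, dst in edges:
        allowed = BASELINE_TARGETS.get(nodes.get(src, ""))
        if allowed is not None and dst not in allowed:
            violations.append(f"{nodes[src]} at {src:#x} points to unexpected target {dst:#x}")
    return violations


kinds = {0x1000: "syscall_table", 0x2000: "function",
         0x2100: "function", 0x4000: "file_operations"}
clean: Graph = (kinds, [(0x1000, 0x2000), (0x1000, 0x2100), (0x4000, 0x2100)])
hooked: Graph = (kinds, [(0x1000, 0x2000), (0x1000, 0xBAD0), (0x4000, 0x2100)])

print(appraise(clean))
print(appraise(hooked))
```

The checker never asks what the unexpected target does. It only reports that a table entry leads somewhere the baseline does not allow, which is why the same check applies to a hook it has never seen before.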

The algorithm used by Invary’s appraisal process is efficient and is run outside of the measured target environment. This ensures that Invary’s integrity measurement approach can produce fresh, usable results with minimal impact on measured targets.

Exposing the Next Drovorub

The joint announcement from the NSA and FBI on Drovorub [1] is interesting not just because it exposes a novel threat but also because it shows that the agencies approach this kind of malware with urgency and seriousness. This is because rootkits are particularly damaging and their use is on the rise:

Many modern malware families use rootkits to try to avoid detection and removal [3].

Rootkit malware is on the rise [5].

These rootkit types have been used to create devastating attacks [6].

The announcement contains a section on detection methodologies for determining the presence of the Drovorub malware. This section lists various techniques for detecting Drovorub because it is now a well known threat, but each technique comes with downsides. Together they make it clear that detecting a kernel mode rootkit across a fleet of machines is a daunting task using existing security solutions, even when the rootkit is known. Furthermore, the rootkit can easily evade most detection methodologies with simple changes like using TLS encryption or avoiding the use of known file prefixes.

In the section on detection, the announcement discusses “memory analysis” in detail with the advantage that this methodology:

Provides greatest level of visibility into specific rootkit behaviors and artifacts such as files, other processes, and network connections hidden by the malware [1].

The disadvantage, according to the announcement, is the potential for system disruption while acquiring the memory for analysis and the difficulty in scaling the approach to a large number of endpoints due to a manual and complex process.

Invary’s technology directly challenges these disadvantages, providing a powerful methodology for protecting behavioral integrity of operating systems.

Invary’s integrity measurement approach does not have the problem of high levels of system disruption by virtue of our binary analysis step. As discussed, this step analyzes the software binary to determine a measurement approach that minimizes system disruption, ensuring that integrity measurement can continuously protect running systems with low impact.

Our technology scales to a large number of endpoints in two ways. First, by leveraging our SaaS platform an organization can measure and appraise the operating systems installed across their fleet of servers quickly and easily. The process of acquiring and analyzing memory is completely automated, continuous, and low impact with the results being fresh, usable information about the integrity of each endpoint’s kernel.

Second, our approach to memory analysis is unique and powerful. The typical approach to system security analysis, as discussed in the NSA/FBI announcement, is to run tools that look for indicators of compromise (IoC), either by looking for well known threats or by using AI techniques. The integrity measurement approach does not look for IoC. Instead, it looks at the shape of the executing target software and ensures that the shape of that software matches a known good baseline. This approach can detect threats even when the IoC are unknown and difficult to detect.

To illustrate the difference, consider how each high level approach would detect the presence of a new, Drovorub-like kernel mode rootkit. In the traditional approach the tools are looking for IoC, using the operating system itself, while the Drovorub-like rootkit is actively hiding itself from those tools by virtue of having compromised the operating system interfaces that the security tools are utilizing to search for IoC. Even doing memory analysis using the traditional approach of looking for IoC can be difficult for this new Drovorub-like malware because malware is stealthy by design, leaving as few IoC as possible and avoiding IoC from past threats.

By measuring and appraising the executing operating system, the integrity measurement approach can detect that the system has exhibited unexpected behavior and has entered into an unsafe state. In the case where the new Drovorub-like malware infected the machine, this unexpected behavior and unsafe state were caused by the system call hooking and kernel function hooking that the malware used to compromise the kernel. The appraisal process can give detailed guidance on what changes were made and where they occurred.

The integrity measurement approach is powerful enough that it can detect unexpected behavior and unsafe states for a variety of reasons beyond endpoints being compromised by malware. Unexpected behavior caused by hardware failures or unauthorized configuration changes can also lead to unsafe states and untrustworthy endpoints. These are detected and reported the same as any other unexpected and unsafe behavior, giving you peace of mind that your endpoints are behaving as expected.