Friday, December 4, 2015

MachineKit LTTng kernel trace

What does MachineKit execution look like at the system level? Here are some observations made using an LTTng kernel trace on Linux preempt-rt 4.1.13-rt15 on a 4-core machine. The traced workload is the abs.0 unit test. This test spawns halcmd and simulates the position control of a motor (or something similar). Here is the command:

cd machinekit/
. scripts/rip-environment
cd tests/abs.0/
halcmd -v test.hal

The overall command takes 3.3s to complete. Let's check what's in the trace.

First, the rtapi processing takes about 600ms to execute and is shown in Figure 1. One of the related processes, named fast:0, seems to do the actual real-time work and is shown in Figure 2. This process sleeps periodically using the clock_nanosleep() system call. I observed a wake-up latency of about 30us, and the thread itself runs for about 2-3us in user-space before going back to sleep.


Figure 1: Overview of the rtapi execution.
Figure 2: Real-time process with period of about 300us.
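
For illustration, here is a minimal Python sketch of the periodic pattern visible in Figure 2. The real fast:0 thread presumably calls clock_nanosleep() with an absolute deadline from C code; Python's time.sleep() is only an approximation, and the 300us period is the value observed in the trace.

import time

PERIOD_S = 300e-6  # ~300us period, as observed in Figure 2

def periodic_loop(iterations=1000):
    """Approximate the wake / compute / sleep cycle of the fast:0 thread."""
    next_deadline = time.clock_gettime(time.CLOCK_MONOTONIC)
    for _ in range(iterations):
        next_deadline += PERIOD_S
        # ... the 2-3us of real-time work would happen here ...
        remaining = next_deadline - time.clock_gettime(time.CLOCK_MONOTONIC)
        if remaining > 0:
            # The real thread uses clock_nanosleep() with an absolute deadline;
            # the ~30us wake-up latency shows up right after this call returns.
            time.sleep(remaining)

if __name__ == "__main__":
    periodic_loop()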

The scripts and utilities that manage the test environment behave quite differently from the RT process. In particular, a bunch of processes, namely halcmd and rtapi, interact for very short durations, as shown in Figure 3. They perform sendto() and recvfrom() system calls, probably for round-trip communication. Using shared memory could perhaps help streamline this communication. In addition, halcmd performs numerous sleeps of 10ms to 200ms along the way.

Figure 3: Interactions between halcmd and rtapi.
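
As a rough illustration of the interactions in Figure 3, here is a self-contained Python sketch of that request/reply pattern. The socketpair and the command payload are stand-ins, not the actual MachineKit transport; the point is that each command pays at least one full round trip through the kernel.

import socket
import time

# Stand-in for the halcmd <-> rtapi channel; the real processes use
# their own sockets and show up as sendto()/recvfrom() in the trace.
halcmd_end, rtapi_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

def issue_command(payload):
    """One command = one request plus one blocking wait for the reply."""
    halcmd_end.send(payload)
    # In reality another process gets scheduled in to handle the request;
    # here we answer inline just to complete the round trip.
    rtapi_end.send(b"ack: " + rtapi_end.recv(4096))
    return halcmd_end.recv(4096)

start = time.perf_counter()
for _ in range(1000):
    issue_command(b"setp abs.0.in 1.0")  # hypothetical command payload
print("%.1f us per round trip" % ((time.perf_counter() - start) / 1000 * 1e6))
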
There is a fork-exec-wait pattern performed by the realtime script, shown in Figure 4. This script spawns the inivar, flavor and halrun executables, which together run for more than 400ms. Using a programmatic API instead of spawning executables would make this processing more efficient.


Figure 4: Repeated fork-exec-wait pattern.
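
The overhead of the pattern in Figure 4 can be approximated with the small Python experiment below; /bin/true stands in for one helper executable and query_flavor() for a hypothetical in-process API.

import subprocess
import time

# Each helper (inivar, flavor, halrun in the trace) costs a full
# fork + execve + waitpid cycle, even when the child does almost nothing.
start = time.perf_counter()
for _ in range(50):
    subprocess.run(["/bin/true"], check=True)  # stand-in for one helper
print("fork-exec-wait x50: %.3f s" % (time.perf_counter() - start))

# The same information fetched through an in-process call (a plain
# function standing in for a hypothetical programmatic API) skips the
# process creation entirely.
def query_flavor():
    return "rt-preempt"

start = time.perf_counter()
for _ in range(50):
    query_flavor()
print("in-process calls x50: %.6f s" % (time.perf_counter() - start))
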
Finally, there is a giant 2.5s sleep at the end of the test, shown in Figure 5. It therefore represents about 75% of the test execution time. This kind of sleep is usually done for quick-and-dirty synchronization, but it should be replaced by a wait on the actual event required, so that execution can resume as quickly as possible. This way, the time to run the unit tests could be decreased significantly, which would mean greater productivity for developers.
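
Here is a sketch of what that replacement could look like; the Event and the worker are hypothetical, standing in for whatever condition the test is actually waiting on.

import threading
import time

result_ready = threading.Event()  # hypothetical "test output is ready" condition

def worker():
    time.sleep(0.1)       # stand-in for the real processing time
    result_ready.set()    # signal completion as soon as it happens

threading.Thread(target=worker).start()

# Instead of a fixed time.sleep(2.5), which always pays the worst case,
# wait on the event and keep the old delay only as a safety timeout.
result_ready.wait(timeout=2.5)
print("done after the event fired" if result_ready.is_set() else "timed out")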

Of course, if we had a user-space trace in addition to the kernel trace, we could get greater insight into the internal state of machinekit. But even without it, we can observe interesting behavior. The next step will be to actually plug in and control some real hardware in real-time.

Thursday, April 2, 2015

Why malware detection based on syscall n-grams is unlikely to work

In the field of computer security, it has been suggested to use n-grams of system calls to detect malware running on a computer. Let's demonstrate with two examples that there is no relationship between n-gram counts and malicious behavior.

False-negative example. The first trace is a weather app that opens a config file, performs a remote procedure call to get its data, and then closes its file descriptors. The second is the same sequence, but produced by a malicious app that gained root privilege by some means and sends the content of the password file instead.


Trace A1: open("app.conf"); read(); connect(); send(); recv(); close(); close()
Trace A2: open("/etc/passwd"); read(); connect(); send(); recv(); close(); close()


Since the n-grams contain only the system call names and not their arguments, both traces yield exactly the same sequence, and the probability of each n-gram is trivially identical. Therefore, the algorithm fails to detect the malware.
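
This is easy to check in Python; the counting helper below is a straightforward sliding window over the call names, which is one reasonable way such counts could be computed.

from collections import Counter

# Only the syscall names are kept: the file path that distinguishes the
# benign trace from the malicious one is not part of the n-grams.
trace_a1 = ["open", "read", "connect", "send", "recv", "close", "close"]
trace_a2 = ["open", "read", "connect", "send", "recv", "close", "close"]

def ngram_counts(trace, n=3):
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

print(ngram_counts(trace_a1) == ngram_counts(trace_a2))  # True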

False-positive example. Both sequences are functionally equivalent and non-malicious, but the second trace reduces the maximum number of file descriptors open at any one time.


Trace B1: open(); open(); open(); read(); read(); read(); close(); close(); close()
Trace B2: open(); read(); close(); open(); read(); close(); open(); read(); close()


Let's use Python to enumerate all possible 3-grams over these three system calls, then count how many times each one occurs in B1 and B2.


import itertools
ngrams = list(itertools.product(['open', 'read', 'close'], repeat=3))
3-gram               B1  B2  in symmetric difference?
open  open  open      1   0  yes
open  open  read      1   0  yes
open  open  close     0   0
open  read  open      0   0
open  read  read      1   0  yes
open  read  close     0   3  yes
open  close open      0   0
open  close read      0   0
open  close close     0   0
read  open  open      0   0
read  open  read      0   0
read  open  close     0   0
read  read  open      0   0
read  read  read      1   0  yes
read  read  close     1   0  yes
read  close open      0   2  yes
read  close read      0   0
read  close close     1   0  yes
close open  open      0   0
close open  read      0   2  yes
close open  close     0   0
close read  open      0   0
close read  read      0   0
close read  close     0   0
close close open      0   0
close close read      0   0
close close close     1   0  yes
total                 7   7


The sets of n-grams observed in the two traces are completely disjoint. Therefore, the algorithm will wrongly flag trace B2 as malware, while it is in fact the result of a simple optimization.
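
The same counting helper as above confirms it: the two traces contain the same total number of 3-grams, but not a single 3-gram in common.

from collections import Counter

trace_b1 = ["open"] * 3 + ["read"] * 3 + ["close"] * 3
trace_b2 = ["open", "read", "close"] * 3

def ngram_counts(trace, n=3):
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

counts_b1 = ngram_counts(trace_b1)
counts_b2 = ngram_counts(trace_b2)
print(sum(counts_b1.values()), sum(counts_b2.values()))  # 7 7
print(set(counts_b1) & set(counts_b2))                   # set(): no overlap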

Moreover, a malicious app can evade detection by issuing random dummy system calls, which swamp the potentially malicious sequences. Because each individual n-gram probability becomes statistically insignificant, the algorithm fails to identify the change of behavior and does not classify the app as malware.
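
A tiny simulation illustrates the idea; the payload and the padding calls below are arbitrary examples, not a real malware trace.

import random
from collections import Counter

def ngram_counts(trace, n=3):
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

random.seed(0)
payload = ["open", "read", "send", "close"]      # the "interesting" sequence
noise = ["getpid", "gettimeofday"]               # harmless padding calls

padded = []
for call in payload:
    padded.extend(random.choices(noise, k=random.randint(3, 6)))
    padded.append(call)

# The dominant 3-grams are made of noise; the payload n-grams are diluted.
print(ngram_counts(padded).most_common(3))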

Therefore, we conclude that this technique is unlikely to provide any benefit for identifying malware automatically.