Thursday, April 2, 2015

Why malware detection based on syscall n-grams is unlikely to work

In computer security, it has been suggested that n-grams of system calls can be used to detect malware running on a computer. The two examples below show that n-gram counts alone do not reliably track malicious behavior.

False-negative example. The first trace comes from a weather app that opens a config file, performs a remote procedure call to fetch its data, and then closes its file descriptors. The second is the same sequence of calls, but from a malicious app that has gained root privileges by some means and sends out the content of the password file.


Trace A1: open("app.conf"); read(); connect(); send(); recv(); close(); close()
Trace A2: open("/etc/passwd"); read(); connect(); send(); recv(); close(); close()


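The n-grams are built from the system call names only, not from their arguments, so the two traces yield exactly the same n-grams with the same probabilities. Therefore, the algorithm cannot tell the malware apart from the weather app.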
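A minimal Python sketch of this point, assuming the detector only sees syscall names (the helper ngram_counts and the list encoding of the traces are illustrative choices, not part of any particular detector):

```python
from collections import Counter

def ngram_counts(trace, n=3):
    """Count the n-grams of syscall names in a trace (a list of names)."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

# Assumption: arguments such as the file path are dropped before counting.
trace_a1 = ["open", "read", "connect", "send", "recv", "close", "close"]  # open("app.conf")
trace_a2 = ["open", "read", "connect", "send", "recv", "close", "close"]  # open("/etc/passwd")

# Both traces produce the same five 3-grams, each with the same count.
print(ngram_counts(trace_a1) == ngram_counts(trace_a2))  # True
```

The only difference between A1 and A2 is the path passed to open(), and that information is gone before any n-gram is counted.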

False-positive example. Both sequences are functionally equivalent and non-malicious, but the second trace reduces the maximum number of file descriptors open at any one time.


Trace B1: open(); open(); open(); read(); read(); read(); close(); close(); close()
Trace B2: open(); read(); close(); open(); read(); close(); open(); read(); close()


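Let's use Python to enumerate all 27 possible n-grams of size 3 over these three calls; the table gives the number of times each one occurs in B1 and in B2.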


import itertools
ngrams = list(itertools.product(['open', 'read', 'close'], repeat=3))

n-gram              B1  B2   B1 Δ B2 (x = occurs in exactly one trace)
open  open  open     1   0   x
open  open  read     1   0   x
open  open  close    0   0
open  read  open     0   0
open  read  read     1   0   x
open  read  close    0   3   x
open  close open     0   0
open  close read     0   0
open  close close    0   0
read  open  open     0   0
read  open  read     0   0
read  open  close    0   0
read  read  open     0   0
read  read  read     1   0   x
read  read  close    1   0   x
read  close open     0   2   x
read  close read     0   0
read  close close    1   0   x
close open  open     0   0
close open  read     0   2   x
close open  close    0   0
close read  open     0   0
close read  read     0   0
close read  close    0   0
close close open     0   0
close close read     0   0
close close close    1   0   x
total                7   7


The sets of n-grams observed in the two traces are completely disjoint: no n-gram that occurs in B1 occurs in B2. Therefore, the algorithm will wrongly flag trace B2 as malware, while it is in fact the result of a simple optimization.
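The counts for B1 and B2 can be reproduced with a short Python sketch (the helper ngram_counts is an illustrative name, and the traces are encoded as lists of syscall names):

```python
from collections import Counter

def ngram_counts(trace, n=3):
    """Count the n-grams of syscall names in a trace (a list of names)."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

trace_b1 = ["open"] * 3 + ["read"] * 3 + ["close"] * 3  # open all, read all, close all
trace_b2 = ["open", "read", "close"] * 3                # one file at a time

counts_b1 = ngram_counts(trace_b1)
counts_b2 = ngram_counts(trace_b2)

# N-grams that appear in both traces.
shared = set(counts_b1) & set(counts_b2)

print(sum(counts_b1.values()), sum(counts_b2.values()))  # 7 7
print(shared)                                            # set()
```

B1 contains seven distinct n-grams, each seen once, while B2 contains only three distinct n-grams, seen 3, 2 and 2 times.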

Moreover, a malicious app can evade detection by issuing dummy system calls at random, which swamps the potentially malicious sequences. Because no n-gram frequency will then be statistically significant, the algorithm will fail to identify the change of behavior and will not classify the app as malware.
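A rough Python sketch of this padding trick, under simplifying assumptions: the dummy calls (getpid, getuid, time) are hypothetical choices of harmless calls, and a fixed number of them is inserted after every real call rather than at fully random positions.

```python
import random
from collections import Counter

def ngram_counts(trace, n=3):
    """Count the n-grams of syscall names in a trace (a list of names)."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

def pad_with_dummies(trace, dummies, per_call, rng):
    """Insert per_call randomly chosen dummy syscalls after every real call."""
    padded = []
    for call in trace:
        padded.append(call)
        padded.extend(rng.choice(dummies) for _ in range(per_call))
    return padded

payload = ["open", "read", "connect", "send", "recv", "close", "close"]
dummies = ["getpid", "getuid", "time"]  # hypothetical harmless calls
padded = pad_with_dummies(payload, dummies, per_call=2, rng=random.Random(0))

original = ngram_counts(payload)
# Occurrences of the payload's 3-grams that still appear in the padded trace.
surviving = sum(c for gram, c in ngram_counts(padded).items() if gram in original)

print(len(padded), surviving)  # 21 0
```

With two dummy calls after every real call, every window of three calls contains at least two dummies, so none of the original 3-grams survives, whichever dummies are drawn.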

Therefore, we conclude that this technique is unlikely to be of much help in identifying malware automatically.