Here is the small program. It opens a file, writes an integer into it, and then increments a counter in a for loop.
#include <stdio.h>
#include <limits.h>

int main(int argc, char **argv)
{
    int i = 3;

    /* Write one integer to a file. */
    FILE *fd = fopen("test.out", "w");
    if (fd == NULL) {
        perror("fopen");
        return 1;
    }
    fwrite(&i, sizeof(int), 1, fd);
    fclose(fd);

    /* Busy loop; volatile prevents the compiler from optimizing it out. */
    volatile long a = 0;
    int x;
    for (x = 0; x < INT_MAX; x++) {
        a++;
    }
    return 0;
}
Compile it, test it and run it under strace:
$ gcc -o usecase usecase.c
$ ./usecase
$ strace -o usecase.strace ./usecase

OK. Now, let's run the program under tracing, but first we have to enable all trace points in the kernel.
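For reference, the file-access part of the strace log should contain lines roughly like these (the exact flags and descriptor numbers depend on your libc, and the byte string assumes a little-endian machine):

open("test.out", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
write(3, "\3\0\0\0", 4) = 4
close(3) = 0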
$ sudo ltt-armall

To get a tight window around the command, I'm using the small script trace-cmd.sh (see at the end of the post). The script simply starts tracing, runs the program, and stops the trace. Invoke it like this (don't forget to make the script executable):
./trace-cmd.sh ./usecase

Now we are ready to launch lttv-gui, the utility used to look at the trace. To load a trace, select File->Add trace, then select the "trace-cmd" directory created in the same directory as the trace-cmd.sh script. Then scroll down to the "./usecase" process in the control flow viewer, in the middle of the window. The trace events are displayed under the graph. I will present three phases of the process: first the creation and file access, then the computation phase, and finally the process exit. The trace is rotated to make it more readable. Event names and comments are embedded in the graph. strace events are in red, while the related code is in blue.
In addition to system calls, we can look at the paging of an application and see process starvation; strace shows none of this. Another cool side effect is that, contrary to strace, the running program can't detect that it's being traced, so we can trace viruses that would otherwise quit before they can be analyzed. Tracing reports much more information about kernel internals, which helps a lot in understanding what's going on. Event timestamps in the kernel trace are in nanoseconds, so the precision is really high, and since trace points are static (simple function calls compiled right into the kernel), the performance impact is ridiculously low: below 3% for core trace points! I'm running LTTng in the background on my system, in flight recorder mode, without noticeable impact.
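To see why static trace points are so cheap, here is a minimal sketch of the idea in plain C (not LTTng's actual implementation; the names trace_active and probe_sched_switch are made up for illustration). When tracing is disabled, the instrumented site costs only a test of a flag:

#include <stdio.h>

static int trace_active;          /* toggled when tracing is armed */

static void probe_sched_switch(int prev, int next)
{
    /* A real tracer would write a compact binary event with a
     * nanosecond timestamp into a per-CPU buffer. */
    printf("sched_switch: %d -> %d\n", prev, next);
}

/* The call is compiled in statically; the hot path is one branch. */
#define trace_sched_switch(prev, next)              \
    do {                                            \
        if (trace_active)                           \
            probe_sched_switch(prev, next);         \
    } while (0)

int main(void)
{
    trace_active = 1;
    trace_sched_switch(42, 43); /* an instrumented site */
    return 0;
}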
I will be working on advanced trace analysis for the next three years or so for my PhD. Since LTTng has all the features I need for tracing, I will use this tool. My first goal is to create a CPU usage view. Say something was eating your CPU a minute ago: with kernel tracing, you would be able to go back into the "flight recorder" history, see what was running at that time, and understand what was going on.
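As a rough illustration of what such a view boils down to, here is a sketch that derives per-process CPU time from decoded scheduler-switch events (the event structure below is hypothetical; real LTTng events carry more fields). Each interval between two switches is charged to the task that was running:

#include <stdio.h>

#define MAX_PID 32768

/* Hypothetical decoded event: at time ts (ns), the CPU switched
 * from task "prev" to task "next". */
struct sched_switch { unsigned long long ts; int prev; int next; };

static unsigned long long cpu_ns[MAX_PID];

static void account(const struct sched_switch *ev, int n)
{
    int i;
    for (i = 1; i < n; i++) {
        /* The task running since the previous switch is ev[i].prev;
         * charge it the elapsed time. */
        cpu_ns[ev[i].prev] += ev[i].ts - ev[i - 1].ts;
    }
}

int main(void)
{
    struct sched_switch trace[] = {
        { 1000, 0, 42 }, { 6000, 42, 0 },
        { 7000, 0, 42 }, { 9000, 42, 0 },
    };
    account(trace, 4);
    printf("pid 42 ran for %llu ns\n", cpu_ns[42]); /* 7000 ns */
    return 0;
}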
You can help me in two ways. First, by submitting your ideas about cool trace analyses we could perform and/or that you need. Second, you can write to Linus Torvalds and ask him to merge LTTng into mainline. LTTng is the right stuff we need for tracing, and it's a bit of a shame that it's not merged yet while other operating systems have had this for a decade.
Next post will be about TMF, an Eclipse plugin for trace analysis! Stay tuned!
Script trace-cmd.sh:
#!/bin/sh

if [ -z "$1" ]; then
    echo "missing command argument"
    exit 1
fi

cmd="$@"
name="cmd"
dir="$(pwd)/trace-$name"

# Remove any previous trace, then start a new one.
sudo rm -rf "$dir"
sudo lttctl -o channel.all.bufnum=8 -C -w "$dir" "$name"

echo "executing $cmd..."
$cmd
echo "return code: $?"

# Stop the trace.
sudo lttctl -D "$name"
3 comments:
Quite an interesting post. Would it be possible to automatically trigger a dump of the tracing history when certain events occur (say, we identify that something is at fault during a Nagios check run), to make it possible to look at what happened?
Near the end, you say: "LTTng is the right stuff we need for tracing, and it's a bit of a shame that it's not merged yet while other operating systems have had this for a decade."
Hmm... I've actually never heard of anything like LTTng on other OSes. Do you have examples?
It's called DTrace on Solaris.
Real-time intrusion detection. I have worked on various tools over the years. The biggest issue is system overhead. I would love to see lightweight tracking of fork() and its kin, then filtering on what was forked (with lightweight rules to track). When we did this on an early SunOS, we saw 15% overhead. On HP/UX, it was 10-12%.
So, a real-time IDS with filters?