mardi 10 mai 2016

Function graph tracing with LTTng

FTrace has the ability to hook function entry and return in the kernel. I did a small prototype demonstrating the possibility of recording these events with LTTng, and viewing the result with TraceCompass.

The idea is to register callbacks using register_ftrace_graph(). This symbol is not exported, therefore we use kallsyms to lookup the address at runtime. The callbacks contains themselves a tracepoint that records entries and returns. The kernel module implementation lttng-fgraph.c is quite simple. It limits the tracing to the normal kernel mode (no interrupt) and the function to hook is hardcoded (do_sys_open()) for prototyping purpose.

The implementation of the analysis module in TraceCompass is relatively simple (LttngFtraceCallStackProvider.java). In fact, it consists mostly to define entry and return events, and how to decode the event payload. In order to show function names instead of addresses, I implemented a symbol provider for kallsyms. It walk the trace directory to find the kallsyms file and parse it to map addresses to symbols. The user just needs to copy /proc/kallsyms below the trace directory, like this:

mkdir my-trace/kernel/syms
sudo cat /proc/kallsyms > my-trace/kernel/syms/kallsyms

Sudo (root) is required, because symbol addresses are usually hidden by default. The kallsyms file must be within its own directory, otherwise the trace parser will try to open the kallsyms file as a trace stream and will report an error.

Here is the result of some open system call on the system. Because the views are synchronized, the open system call interval of the processes shown in the control flow view overlap the call stack graph. It looks great, isn't?


However, the overhead of function tracing is somewhat high. For each function call, about 300ns are spent writing trace events, which may represent more time than the processing done in the function itself. To address the situation, FTrace has the ability to record an event only of the delay inside the function is greater than a given threshold. The call graph is no longer the exact representation of the execution, but it contributes to reduce the overhead and the trace size by not recording short duration functions, while still identifying slow functions.

At the moment, FTrace does not exports the registration functions, because external users were not intended. Only one set of hooks can be defined at a time, so one cannot use ftrace and lttng at the same time. In the future, that could be a nice extension to FTrace to make it more general and allow other tracers to use the function graph. For instance, kprobe could get callbacks not only on function entry and return, but also on each of its children.