mardi 10 mai 2016

Function graph tracing with LTTng

FTrace has the ability to hook function entry and return in the kernel. I did a small prototype demonstrating the possibility of recording these events with LTTng, and viewing the result with TraceCompass.

The idea is to register callbacks using register_ftrace_graph(). This symbol is not exported, therefore we use kallsyms to lookup the address at runtime. The callbacks contains themselves a tracepoint that records entries and returns. The kernel module implementation lttng-fgraph.c is quite simple. It limits the tracing to the normal kernel mode (no interrupt) and the function to hook is hardcoded (do_sys_open()) for prototyping purpose.

The implementation of the analysis module in TraceCompass is relatively simple (LttngFtraceCallStackProvider.java). In fact, it consists mostly to define entry and return events, and how to decode the event payload. In order to show function names instead of addresses, I implemented a symbol provider for kallsyms. It walk the trace directory to find the kallsyms file and parse it to map addresses to symbols. The user just needs to copy /proc/kallsyms below the trace directory, like this:

mkdir my-trace/kernel/syms
sudo cat /proc/kallsyms > my-trace/kernel/syms/kallsyms

Sudo (root) is required, because symbol addresses are usually hidden by default. The kallsyms file must be within its own directory, otherwise the trace parser will try to open the kallsyms file as a trace stream and will report an error.

Here is the result of some open system call on the system. Because the views are synchronized, the open system call interval of the processes shown in the control flow view overlap the call stack graph. It looks great, isn't?


However, the overhead of function tracing is somewhat high. For each function call, about 300ns are spent writing trace events, which may represent more time than the processing done in the function itself. To address the situation, FTrace has the ability to record an event only of the delay inside the function is greater than a given threshold. The call graph is no longer the exact representation of the execution, but it contributes to reduce the overhead and the trace size by not recording short duration functions, while still identifying slow functions.

At the moment, FTrace does not exports the registration functions, because external users were not intended. Only one set of hooks can be defined at a time, so one cannot use ftrace and lttng at the same time. In the future, that could be a nice extension to FTrace to make it more general and allow other tracers to use the function graph. For instance, kprobe could get callbacks not only on function entry and return, but also on each of its children.

mercredi 24 février 2016

Real-life real-time problem

I installed Linux RT on the Jetson TK1 last month. Since then, cyclictest is running to test that latency is always correct. The maximum wake-up latency is usually ~100us, and it can run like that for a long time. However, on rare occasion, a latency of ~8ms occurs when running for a long period of time (about one week). This kind of problem is tricky to find without proper tools. Here is how I found the problem.

I configured LTTng kernel and userspace tracing to continuously trace the system. The kernel trace records scheduling events and system calls while the userspace trace indicates the location of the missed deadline. When a high latency is detected, the trace is stopped and a snapshot of the trace buffers are written on the disk. I let the system running with tracing enabled for a couple of days, and then a snapshot was created, indicating that the problem occurred.

The following figure shows the result of the traces loaded in TraceCompass. We can see this bug is the result of interactions between three tasks, namely ktimersoftd/2, irq/154-hpd and cyclictest. The cyclictest wake-up signal is delayed for a long time. This signal puts back the task in the running queue, and it is delivered by ktimersoftd/2. The ktimersoftd task has lower priority than irq/154-hpd that happens to run for a long time. Even though the cyclictest has higher priority than the IRQ thread, as long as cyclictest is in the wait queue, the priority has no effect. Priority inheritance does not work here because we don't know in advance that ktimersoftd will actually wake-up a high priority task.


Obviously, the ktimersoftd threads should have greater priority than other IRQ threads as they are required by cyclictest to meet its deadline. There are few questions left: how the default priorities are defined, what is actually doing the irq/164-hpd task, and does increasing ktimersoftd priority will fix the issue? More investigation is required.

In the meantime, if you want to look for yourself, the trace is here. Cheers!

lundi 1 février 2016

Real-time setup on Jetson TK1

The Linux kernel is a popular operating system for real-time applications. Here is a summary of experimentations with the real-time kernel on the Jetson TK1 board from NVIDIA.

Setup the board

The Jetson TK1 from an internal memory. NVIDIA supplies a script to "flash" the device with the file system root and the kernel. I use the manual method from the NVIDIA Jetson TK1 documentation. Download the Driver Package and the Sample Root File System, and follow the Quick Start Guide. You should have a working system after these steps. 

I tried to use the JetPack installer to flash the device, but it requires Ubuntu 14.04, the procedure failed and it was re-downloading the huge files, so I don't recommend it. Also, I tried the procedure to use the SD card instead of the internal memory of the board and that failed too, so I stick with flashing the internal memory. 

Compile linux-rt

Clone Linux-RT from git.kernel.org:

git clone https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
cd linux-rt-devel
git checkout linux-4.4.y-rt-rebase

While it downloads, install the cross-compiler for ARM (the "hf" in arm abi hf stands for hard-float, meaning that the compiler will output code optimized for on-chip floating point arithmetic):

sudo apt-get install gcc-arm-linux-gnueabihf

Configure the kernel. When cross-compiling, the ARCH and CROSS_COMPILE variables must be set.

  • Under "Kernel Features ->  Preemption Model", select the preemption level (i.e. PREEMPT_RT_BASE)
  • Under "Kernel hacking" uncheck "Debug preemptible kernel" (known to cause slowdown)

# update configuration
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- tegra_defconfig
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- menuconfig # compile make -j12 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- 
make -j12 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dtbs
# install the modules into the rootfs (adjust according to your setup) sudo make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=../Linux_for_Tegra/rootfs/ modules_install

There should be a directory created under Linux_for_Tegra/rootfs/lib/modules/
The last step consists to flash the device with the fresh kernel. However, the flash.sh script dit not work correctly for me. The options "-K " and "-d " do not configure the rootfs correctly. This step must be done manually and the settings are not overwriten by the flash script. Beware that the script apply_binaries.sh will copy the zImage from the kernel directory. To avoid using the previous kernel by mistake, replace the default zImage in the kernel directory with your own.

The Device Tree Blob (DTB) describes the hardware addresses that are specific to the device. Previously, one had to write these memory locations in C, whereas it is done declaratively and converted to a binary format. In my understanding, it simplifies the support of a wide range of devices. I were not able to boot the board without the proper DTB file set in the bootloader.


sudo cp arch/arm/boot/zImage ../Linux_for_Tegra/rootfs/boot/
sudo cp arch/arm/boot/zImage ../Linux_for_Tegra/kernel/

sudo cp arch/arm/boot/dts/tegra124-jetson-tk1.dtb ../Linux_for_Tegra/rootfs/boot/

Setup the bootloader

Edit the bootloader configuration file Linux_for_Tegra/rootfs/boot/extlinux/extlinux.conf to set the FDT field with the path of the Device Tree Blob (DTB) (do not change the other fields):

MENU TITLE Jetson-TK1 eMMC boot options LABEL primary MENU LABEL primary kernel LINUX /boot/zImage FDT /boot/tegra124-jetson-tk1.dtb

Then, put the device in recovery mode (hold the recovery button while pressing reset, the board should be displayed in dmesg) and execute the usual flash command:

cd Linux_for_Tegra/
./flash.sh jetson-tk1 mmcblk0p1

After the board reboots, it should run the real-time kernel. The board has a serial port and spawning minicom with a serial-to-usb device allowed to see the boot logs. This is required because the screen is blank at this early boot stage.

I still have an issue with the CONFIG_PREEMPT_RT_FULL. Without the serial cable, it would be not possible to diagnose the problem.

[    5.170764] Unable to handle kernel paging request at virtual address ffefe574
[    5.186139] kernel BUG at kernel/locking/rtmutex.c:1011!
[    5.981440] Fixing recursive fault but reboot is needed!



I use instead CONFIG_PREEMPT_RT_BASE, but I don't know exactly what is the difference between the full RT config, this is something to investigate.