Wednesday, February 24, 2016

Real-life real-time problem

I installed Linux RT on the Jetson TK1 last month. Since then, cyclictest has been running to check that the latency stays within bounds. The maximum wake-up latency is usually ~100us, and the system can run like that for a long time. However, on rare occasions, a latency of ~8ms occurs after running for a long period (about one week). This kind of problem is tricky to find without proper tools. Here is how I found the problem.
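As a minimal sketch of how such a long-running check could be automated, the helpers below extract the worst latency from cyclictest's per-thread summary lines. The function names, the cyclictest options in the comment, and the 1000us limit are illustrative assumptions, not the exact setup used here.

```shell
# Print the largest "Max:" value (in microseconds) found in cyclictest
# per-thread summary lines read from stdin.
max_latency() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "Max:" && $(i + 1) + 0 > max) max = $(i + 1) + 0
    } END { print max + 0 }'
}

# Succeed when the worst latency read from stdin is within the limit
# given in microseconds as the first argument.
latency_ok() {
    [ "$(max_latency)" -le "$1" ]
}

# Example run (assumed options): lock memory, one thread per CPU,
# SCHED_FIFO priority 90, 200us interval, quiet summary after one hour.
# sudo cyclictest -m -S -p 90 -i 200 -q -D 1h | latency_ok 1000 || echo "deadline missed"
```

Parsing the summary only tells that a deadline was missed, not why, which is where tracing comes in.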

I configured LTTng kernel and userspace tracing to continuously trace the system. The kernel trace records scheduling events and system calls, while the userspace trace indicates where the deadline was missed. When a high latency is detected, tracing is stopped and a snapshot of the trace buffers is written to disk. I left the system running with tracing enabled for a couple of days until a snapshot was created, indicating that the problem had occurred.
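A snapshot-mode session along these lines can be sketched with the lttng command-line tool. The session name, the choice of kernel events, and the DRY_RUN helper are assumptions made for illustration; the actual configuration may have differed.

```shell
# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Create a snapshot-mode session that records scheduling events and
# system calls in the kernel, plus all userspace tracepoints.
start_latency_session() {
    session=${1:-latency}   # hypothetical session name
    run lttng create "$session" --snapshot
    run lttng enable-event --kernel sched_switch,sched_wakeup
    run lttng enable-event --kernel --syscall --all
    run lttng enable-event --userspace --all
    run lttng start
}

# Call this when a missed deadline is detected: stop tracing, then
# write the current content of the trace buffers to disk.
dump_snapshot() {
    run lttng stop
    run lttng snapshot record
}
```

Running `DRY_RUN=1 start_latency_session` prints the commands without touching the system, which is handy for reviewing them before a multi-day run.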

The following figure shows the result of the traces loaded in TraceCompass. We can see this bug is the result of interactions between three tasks, namely ktimersoftd/2, irq/154-hpd and cyclictest. The cyclictest wake-up signal is delayed for a long time. This signal puts the task back on the run queue, and it is delivered by ktimersoftd/2. The ktimersoftd task has a lower priority than irq/154-hpd, which happens to run for a long time. Even though cyclictest has a higher priority than the IRQ thread, its priority has no effect as long as it sits in the wait queue. Priority inheritance does not help here because the kernel does not know in advance that ktimersoftd will wake up a high-priority task.


Obviously, the ktimersoftd threads should have a higher priority than the other IRQ threads, as cyclictest depends on them to meet its deadline. A few questions remain: how are the default priorities defined, what is the irq/154-hpd task actually doing, and will increasing the ktimersoftd priority fix the issue? More investigation is required.
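One way to start that investigation is to list the IRQ threads that currently outrank ktimersoftd. The sketch below assumes the two-column output of `ps -eo rtprio,comm`; the function name and the priority value in the chrt comment are hypothetical.

```shell
# Read "RTPRIO COMMAND" lines (as printed by `ps -eo rtprio,comm`) from
# stdin and print the IRQ threads whose real-time priority is higher
# than that of the lowest-priority ktimersoftd thread.
irq_threads_above_ktimersoftd() {
    awk '
        $2 ~ /^ktimersoftd/ { if (kt == "" || $1 + 0 < kt) kt = $1 + 0 }
        $2 ~ /^irq\//       { prio[$2] = $1 + 0 }
        END {
            if (kt == "") exit          # no ktimersoftd thread seen
            for (name in prio)
                if (prio[name] > kt) print name, prio[name]
        }' | sort
}

# Usage on the board:
#   ps -eo rtprio,comm | irq_threads_above_ktimersoftd
# Experiment (assumed value): raise ktimersoftd/2 to SCHED_FIFO priority 60
#   sudo chrt -f -p 60 "$(pgrep -x ktimersoftd/2)"
```

Whether raising the priority actually removes the 8ms outlier would still need another long cyclictest run to confirm.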

In the meantime, if you want to look for yourself, the trace is here. Cheers!

Monday, February 1, 2016

Real-time setup on Jetson TK1

The Linux kernel is a popular operating system for real-time applications. Here is a summary of my experiments with the real-time kernel on the Jetson TK1 board from NVIDIA.

Setup the board

The Jetson TK1 boots from an internal memory. NVIDIA supplies a script to "flash" the device with the root file system and the kernel. I use the manual method from the NVIDIA Jetson TK1 documentation. Download the Driver Package and the Sample Root File System, and follow the Quick Start Guide. You should have a working system after these steps.

I tried to use the JetPack installer to flash the device, but it requires Ubuntu 14.04, the procedure failed, and it kept re-downloading the huge files, so I don't recommend it. I also tried the procedure to use the SD card instead of the internal memory of the board, and that failed too, so I stick with flashing the internal memory.

Compile linux-rt

Clone Linux-RT from git.kernel.org:

git clone https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git
cd linux-rt-devel
git checkout linux-4.4.y-rt-rebase

While it downloads, install the cross-compiler for ARM (the "hf" in gnueabihf stands for hard-float, meaning that the compiler will output code that uses the on-chip floating-point unit):

sudo apt-get install gcc-arm-linux-gnueabihf

Configure the kernel. When cross-compiling, the ARCH and CROSS_COMPILE variables must be set.

  • Under "Kernel Features ->  Preemption Model", select the preemption level (i.e. PREEMPT_RT_BASE)
  • Under "Kernel hacking" uncheck "Debug preemptible kernel" (known to cause slowdown)

# update configuration
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- tegra_defconfig
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- menuconfig
# compile
make -j12 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
make -j12 ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- dtbs
# install the modules into the rootfs (adjust according to your setup)
sudo make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- INSTALL_MOD_PATH=../Linux_for_Tegra/rootfs/ modules_install

There should be a directory created under Linux_for_Tegra/rootfs/lib/modules/
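Before flashing, it can be worth confirming that the saved .config really contains the preemption choices described earlier. This is a minimal sketch; the function name is hypothetical, and it assumes that "Debug preemptible kernel" corresponds to the CONFIG_DEBUG_PREEMPT symbol.

```shell
# Succeed when the given kernel .config enables PREEMPT_RT_BASE and
# leaves DEBUG_PREEMPT disabled. Defaults to ./.config.
check_rt_config() {
    config=${1:-.config}
    grep -q '^CONFIG_PREEMPT_RT_BASE=y$' "$config" || {
        echo "PREEMPT_RT_BASE is not enabled"
        return 1
    }
    if grep -q '^CONFIG_DEBUG_PREEMPT=y$' "$config"; then
        echo "DEBUG_PREEMPT is still enabled"
        return 1
    fi
    echo "config looks OK"
}
```

Run it from the kernel source tree as `check_rt_config .config`.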
The last step consists of flashing the device with the fresh kernel. However, the flash.sh script did not work correctly for me: the options "-K " and "-d " do not configure the rootfs correctly. This step must be done manually, and the settings are not overwritten by the flash script. Beware that the apply_binaries.sh script copies the zImage from the kernel directory. To avoid using the previous kernel by mistake, replace the default zImage in the kernel directory with your own.

The Device Tree Blob (DTB) describes the hardware addresses that are specific to the device. Previously, these memory locations had to be written in C, whereas they are now described declaratively and compiled to a binary format. In my understanding, this simplifies supporting a wide range of devices. I was not able to boot the board without the proper DTB file set in the bootloader.


sudo cp arch/arm/boot/zImage ../Linux_for_Tegra/rootfs/boot/
sudo cp arch/arm/boot/zImage ../Linux_for_Tegra/kernel/

sudo cp arch/arm/boot/dts/tegra124-jetson-tk1.dtb ../Linux_for_Tegra/rootfs/boot/

Setup the bootloader

Edit the bootloader configuration file Linux_for_Tegra/rootfs/boot/extlinux/extlinux.conf to set the FDT field with the path of the Device Tree Blob (DTB) (do not change the other fields):

MENU TITLE Jetson-TK1 eMMC boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/zImage
      FDT /boot/tegra124-jetson-tk1.dtb
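Since the board did not boot for me without the right DTB, a quick check that the FDT entry points to a file that actually exists in the rootfs can save a flash cycle. The helper names below are hypothetical, and the sketch assumes one field per line in extlinux.conf.

```shell
# Print the value of the first FDT field found in an extlinux.conf file.
fdt_path() {
    awk '$1 == "FDT" { print $2; exit }' "$1"
}

# Succeed when the DTB named in extlinux.conf exists under the rootfs
# directory given as the first argument.
check_fdt() {
    rootfs=$1
    conf=$rootfs/boot/extlinux/extlinux.conf
    dtb=$(fdt_path "$conf")
    [ -n "$dtb" ] && [ -f "$rootfs$dtb" ]
}
```

For example, `check_fdt ../Linux_for_Tegra/rootfs || echo "DTB missing"` before running the flash script.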

Then, put the device in recovery mode (hold the recovery button while pressing reset; the board should then show up in dmesg on the host) and execute the usual flash command:

cd Linux_for_Tegra/
./flash.sh jetson-tk1 mmcblk0p1

After the board reboots, it should run the real-time kernel. The board has a serial port, and running minicom with a serial-to-USB adapter let me see the boot logs. This is required because the screen is blank at this early boot stage.

I still have an issue with CONFIG_PREEMPT_RT_FULL. Without the serial cable, it would not have been possible to diagnose the problem.

[    5.170764] Unable to handle kernel paging request at virtual address ffefe574
[    5.186139] kernel BUG at kernel/locking/rtmutex.c:1011!
[    5.981440] Fixing recursive fault but reboot is needed!



I use CONFIG_PREEMPT_RT_BASE instead, but I don't know exactly how it differs from the full RT config; this is something to investigate.