Friday, December 4, 2015

MachineKit LTTng kernel trace

What does MachineKit execution look like at the system level? Here are some observations made using an LTTng kernel trace on Linux preempt-rt 4.1.13-rt15 on a 4-core machine. The workload traced is the abs.0 unit test. This test runs halcmd, simulating the position control of a motor (or something close to it). Here is the command:

cd machinekit/
. scripts/rip-environment
cd tests/abs.0/
halcmd -v test.hal

The overall command takes 3.3s to complete. Let's check what's in the trace.

First, the rtapi processing takes about 600ms to execute and is shown in Figure 1. One of the related processes, named fast:0, seems to do the actual real-time job and is shown in Figure 2. This process sleeps periodically using the clock_nanosleep() system call. I observed a wake-up latency of about 30us, and the thread itself runs for about 2-3us in user space before going back to sleep.


Figure 1: Overview of the rtapi execution.
Figure 2: Real-time process with period of about 300us.
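
This behavior is consistent with a standard periodic real-time loop built on clock_nanosleep() with absolute deadlines. Below is a minimal sketch of such a loop; the 300us period matches what the trace shows, but the structure and function names are assumptions, not MachineKit's actual code.

/* Sketch of a periodic real-time loop using clock_nanosleep() with absolute
 * deadlines on CLOCK_MONOTONIC. Only the 300us period comes from the trace;
 * everything else is illustrative. */
#include <time.h>

#define PERIOD_NS 300000L  /* 300us */

static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec += 1;
    }
}

void periodic_loop(void)
{
    struct timespec next;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (;;) {
        timespec_add_ns(&next, PERIOD_NS);
        /* Sleep until the absolute deadline; the ~30us wake-up latency seen
         * in the trace is the delay between this deadline and the moment the
         * thread actually runs again. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        /* do_realtime_work();  hypothetical: the 2-3us of user-space work */
    }
}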

The scripts and utilities that manage the test environment behave quite differently from the RT process. In particular, a bunch of processes interact for very short durations, namely halcmd and rtapi, as shown in Figure 3. They perform sendto() and recvfrom() system calls, probably for round-trip communication. Maybe using shared memory could help streamline this communication. In between, halcmd performs numerous sleeps of 10ms to 200ms.

Figure 3: Interactions between halcmd and rtapi.
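At the system-call level, each command appears to pay a full round trip. Here is a minimal sketch of that pattern, assuming a datagram socket between halcmd and rtapi; only the sendto()/recvfrom() pair comes from the trace, the socket setup and payloads are not visible in a kernel trace and are made up for illustration.

/* Sketch of the request/response pattern that shows up as sendto()/recvfrom()
 * pairs in the trace; payloads and addressing are assumptions. */
#include <sys/socket.h>

long roundtrip(int sock, const struct sockaddr *srv, socklen_t len)
{
    char req[] = "command";      /* hypothetical request payload */
    char resp[1024];

    /* Each command results in one sendto() ... */
    if (sendto(sock, req, sizeof(req), 0, srv, len) < 0)
        return -1;
    /* ... followed by a blocking recvfrom() for the answer, so every command
     * pays at least one full round trip of latency. */
    return recvfrom(sock, resp, sizeof(resp), 0, NULL, NULL);
}
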
There is a fork-exec-wait pattern in the realtime script, shown in Figure 4. This script spawns the inivar, flavor and halrun executables, which together run for more than 400ms. Using a programmatic API instead of spawning executables would make this processing more efficient.


Figure 4: Repeated fork-exec-wait pattern.
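For reference, the pattern visible in the trace is the classic fork-exec-wait sequence sketched below; the helper function name and error handling are illustrative. Each iteration pays the cost of process creation and program startup, which is why a programmatic API would be cheaper.

/* Sketch of the fork-exec-wait pattern seen in the trace: each helper
 * (inivar, flavor, halrun) is spawned as a separate process and waited on. */
#include <sys/wait.h>
#include <unistd.h>

int run_helper(char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        execvp(argv[0], argv);   /* child: replace image with the helper */
        _exit(127);              /* only reached if exec fails */
    }
    int status = 0;
    waitpid(pid, &status, 0);    /* parent: block until the helper exits */
    return status;
}
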
Finally, there is a giant 2.5s sleep at the end of the test, shown in Figure 5, which therefore represents about 75% of the test execution time. This kind of sleep is usually done as quick-and-dirty synchronization, but it should be replaced by a wait on the actual event so that execution can proceed as quickly as possible. This way, the time to run the unit tests could be decreased significantly, which would translate into greater productivity for developers.
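
As an illustration, replacing the fixed sleep with a wait on the completion event could look like the sketch below, assuming the test harness can be notified through a file descriptor (a pipe, socket or eventfd); MachineKit's actual mechanism may differ.

/* Sketch of replacing a fixed sleep with a wait on the actual completion
 * event, keeping the old 2.5s delay only as a safety-net timeout. */
#include <poll.h>
#include <unistd.h>

int wait_for_done(int done_fd)
{
    struct pollfd pfd = { .fd = done_fd, .events = POLLIN };

    /* Returns as soon as the "done" notification arrives, instead of always
     * burning the full 2.5s. */
    int ret = poll(&pfd, 1, 2500 /* ms, former sleep as a timeout */);
    if (ret > 0) {
        char buf;
        read(done_fd, &buf, 1);   /* drain the notification */
    }
    return ret;
}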

Of course, with a user-space trace in addition to the kernel trace, we would have greater insight into the internal state of MachineKit. But even without it, we can observe interesting behavior. The next step will be to actually plug in and control real hardware in real time.