Thursday, May 29, 2014

Performance comparison of CTF trace readers

I'm implementing a kernel trace analysis using the Java CTF reader, and I wanted to know whether switching to the babeltrace reader in C would improve performance. Here are my findings.

One of the expensive steps of trace analysis is the actual trace reading. It consists of reading the binary trace and retrieving the timestamp, the event type, and the value of each field. Each event is then added to a priority queue so that multiple streams can be read in order. We compare three trace readers: the Babeltrace reader 1.2 (C implementation, dummy output format), the Java CTF reader, and the Eclipse TmfEventRequest (which uses the Java CTF reader as back-end). We compare trace reading under cold-cache conditions (drop_caches prior to reading) to reading when the trace is resident in the page cache. I made sure that lazily loaded field values are actually loaded in all cases, but without printing their content. The trace is 855MB and contains 22 million kernel events from a Django web app benchmark (recorded with lttng), and it takes between 20s and 80s to process. The drive is a Crucial M500 1TB SSD, and the host is an i7-4770 with 32GB of RAM. The following figure shows the performance results in thousands of events per second.
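To make the priority-queue step concrete, here is a minimal sketch of how events from several per-CPU streams can be merged in timestamp order. It is not the code of babeltrace or of the Java CTF reader: the `StreamMerge`, `Event` and `Head` names are hypothetical, the event names in `main` are only illustrative, and each stream is assumed to be already sorted by timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class StreamMerge {
    /** Hypothetical minimal event: only a timestamp and a type name. */
    public static final class Event {
        public final long timestamp;
        public final String type;

        public Event(long timestamp, String type) {
            this.timestamp = timestamp;
            this.type = type;
        }
    }

    /** The next pending event of one stream, plus the rest of that stream. */
    private static final class Head {
        final Event event;
        final Iterator<Event> rest;

        Head(Event event, Iterator<Event> rest) {
            this.event = event;
            this.rest = rest;
        }
    }

    /** Merges streams that are each already sorted by timestamp. */
    public static List<Event> merge(List<List<Event>> streams) {
        // The queue holds at most one pending event per stream,
        // ordered by timestamp.
        PriorityQueue<Head> queue = new PriorityQueue<>(
                (a, b) -> Long.compare(a.event.timestamp, b.event.timestamp));
        for (List<Event> stream : streams) {
            Iterator<Event> it = stream.iterator();
            if (it.hasNext()) {
                queue.add(new Head(it.next(), it));
            }
        }
        List<Event> ordered = new ArrayList<>();
        while (!queue.isEmpty()) {
            // Take the oldest pending event, then refill from the same stream.
            Head head = queue.poll();
            ordered.add(head.event);
            if (head.rest.hasNext()) {
                queue.add(new Head(head.rest.next(), head.rest));
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Illustrative data only, not taken from the benchmark trace.
        List<Event> cpu0 = Arrays.asList(
                new Event(10, "sched_switch"), new Event(40, "sys_read"));
        List<Event> cpu1 = Arrays.asList(
                new Event(20, "sys_open"), new Event(30, "sched_switch"));
        for (Event e : merge(Arrays.asList(cpu0, cpu1))) {
            System.out.println(e.timestamp + " " + e.type);
        }
    }
}
```

With one queue entry per stream, each event costs one `poll` and at most one `add`, so the merge itself stays cheap compared to decoding the binary fields.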



We observe that babeltrace is roughly 2 times faster than the CTFTraceReader in Java, and about 3.5 times faster than TmfEventRequest. I also measured the CPU usage, and the processing is mostly serial and single-threaded in every case, so babeltrace does seem to be more efficient.

The other important observation is that trace parsing is CPU-bound, not I/O-bound. Reading a trace from a hard drive is certainly I/O-bound, but that is not the case with an SSD. The difference between cold cache and hot cache is between 2% and 5%, in both Java and C. It means we could probably speed up the reading using parallel threads.
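A quick back-of-the-envelope check supports the CPU-bound conclusion. The sketch below only uses the numbers quoted in this post (855MB, 22 million events, 20s to 80s); the `TraceThroughput` class and its method names are hypothetical helpers written for this illustration.

```java
class TraceThroughput {
    /** Throughput in thousands of events per second for a given run time. */
    public static double kiloEventsPerSecond(long events, double seconds) {
        return events / seconds / 1000.0;
    }

    /** Average read rate in MB/s needed to consume the trace in that time. */
    public static double megabytesPerSecond(double sizeMb, double seconds) {
        return sizeMb / seconds;
    }

    public static void main(String[] args) {
        long events = 22_000_000L; // event count quoted above
        double sizeMb = 855.0;     // trace size quoted above

        // 20 s and 80 s are the rounded bounds of the processing time.
        for (double seconds : new double[] {20.0, 80.0}) {
            System.out.printf("%.0f s: %.0f thousand events/s, %.2f MB/s%n",
                    seconds,
                    kiloEventsPerSecond(events, seconds),
                    megabytesPerSecond(sizeMb, seconds));
        }
    }
}
```

Even the fastest run only needs to pull about 43 MB/s from the drive, and the slowest about 11 MB/s. Assuming the SSD can sustain sequential reads of a few hundred MB/s (typical for this class of drive, but not measured here), the disk is far from saturated, which is consistent with the small gap between the cold-cache and hot-cache results.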

In conclusion, both the C and the Java libraries provide an acceptable level of performance for CTF trace reading. For high-performance processing, the babeltrace library is preferable. The CTFTraceReader nonetheless provides good performance, and may be a good compromise between rapid development and runtime efficiency. If abstracting the trace source is a requirement, then TMF is what you need, but you now know the cost of this additional flexibility. The choice is yours!

Note: a TmfEventRequest can be run with either ExecutionType.FOREGROUND or ExecutionType.BACKGROUND. The background mode introduces delays (sleeps) and reduces throughput. Use the foreground mode to reduce processing time and improve UI interactivity, but use the background mode for low-priority processing.