jeudi 5 décembre 2013

System-level profiling of APT

APT is the package manager of Debian based distributions. Understanding where the time is spent installing a package is difficult, because there are numerous process involved. We discuss limitations of profiler tools and present system-level trace analysis that highlight waiting between tasks.

Callgrind is a profiler that helps to find hotspot in the application. One limitation of a profiler is that the time spent waiting is not taken into account, for instance I/O, resources or other tasks. This situation is shown in figure 1, where we traced the command "sudo apt-get install tree --yes". The actual computation done in the apt-get process accounts for 63% of the elapsed time, while waiting time is about 37%. The callgrind results covers only instructions from the application, not the waiting time.

Figure 1: State of apt-get process according to time.

How to effectively understand what the application was waiting for? For this matter, we performed a waiting dependency analysis on all tasks on the system. The event sched_wakeup indicates the waiting source. Figure 2 shows the result of waiting dependencies among tasks for the command.

Figure 2: Overview of  apt-get waiting dependencies according to time.
The analysis resolves waiting time recursively. In this run, apt-get process is waiting for dpkg, and it accounts for about 20% of the execution time. Figure 3 shows the wait I/O that occurs in dpkg, up to the kernel thread responsible for the file system journal of the partition on which the I/O is performed.

Figure 3: I/O wait of dpkg according to time.
Another interesting part of the execution is the update of man database performed by mandb. Figure 4 shows that about 200 threads are spawned. In this case, the overhead of starting the threads and setup communication pipe is high compared to the amount of actual work, in consequence running time could be greatly improved by using a thread pool instead.

Figure 4: Execution of mandb according to time.
We located the code responsible for creating all these threads and processes using perf callchain. It seems to be related to how the library libpipeline is working. Here is an example of an interesting call stack:

7ffff729a30b     __execvpe (/lib/x86_64-linux-gnu/
7ffff75a4de4     pipecmd_exec (/usr/lib/x86_64-linux-gnu/
409813           [unknown] (/usr/local/bin/mandb)
7ffff75a4e59     pipecmd_exec (/usr/lib/x86_64-linux-gnu/
7ffff75a662a     pipeline_start (/usr/lib/x86_64-linux-gnu/
408d96           find_name (/usr/local/bin/mandb)
40449c           test_manfile (/usr/local/bin/mandb)
404708           testmandirs (/usr/local/bin/mandb)
405035           update_db (/usr/local/bin/mandb)
40a1c6           mandb (/usr/local/bin/mandb)
40a4fb           process_manpath (/usr/local/bin/mandb)
403777           main (/usr/local/bin/mandb)
7ffff71f9ea5     __libc_start_main (/lib/x86_64-linux-gnu/
403b09           _start (/usr/local/bin/mandb)

At the end of apt-get execution, there is a sleep of 500 ms, shown in blue in the figure 2. Here is the location in the code of this nanosleep, again using perf callchain feature: 

ffffffff816cc0c0 __schedule ([kernel.kallsyms])
ffffffff816cc799 schedule ([kernel.kallsyms])
ffffffff816cb66c do_nanosleep ([kernel.kallsyms])
ffffffff810827f9 hrtimer_nanosleep ([kernel.kallsyms])
ffffffff8108292e sys_nanosleep ([kernel.kallsyms])
ffffffff816d616f tracesys ([kernel.kallsyms])
7ffff72978c0     __GI___libc_nanosleep (/lib/x86_64-linux-gnu/
7ffff7b8ac95     pkgDPkgPM::DoTerminalPty(int) (/usr/lib/x86_64-linux-gnu/
7ffff7b93720     pkgDPkgPM::Go(int) (/usr/lib/x86_64-linux-gnu/
7ffff7b26585     pkgPackageManager::DoInstallPostFork(int) (/usr/lib/x86_64-linux-gnu/
415a1b           [unknown] (/usr/bin/apt-get)
418dfc           [unknown] (/usr/bin/apt-get)
7ffff7af96c2     CommandLine::DispatchArg(CommandLine::Dispatch*, bool) (/usr/lib/x86_64-linux-gnu/
40a2ec           [unknown] (/usr/bin/apt-get)

Here is the detail of pkgDPkgPM::DoTerminalPty() function from apt/apt-pkg/deb/

 * read the terminal pty and write log
void pkgDPkgPM::DoTerminalPty(int master)
   unsigned char term_buf[1024] = {0,0, };

   ssize_t len=read(master, term_buf, sizeof(term_buf));
   if(len == -1 && errno == EIO)
      // this happens when the child is about to exit, we
      // give it time to actually exit, otherwise we run
      // into a race so we sleep for half a second.
      struct timespec sleepfor = { 0, 500000000 };
      nanosleep(&sleepfor, NULL);
   if(len <= 0)
   FileFd::Write(1, term_buf, len);
      fwrite(term_buf, len, sizeof(char), d->term_out);

The comment mentions a possible race, but the caller may be fixed to account for this situation. IMHO, this function should not sleep if an error occurs while reading the file descriptor, and should instead return immediately.

This was a small example of what can be achieved with the waiting dependency analysis from a kernel trace. The analysis plug-in will be available soon in Eclipse Linux Tools, and uses standard tracepoints from the Linux kernel, recorded with LTTng. I hope that this information will help the community to improve performance of installing packages with APT. 


The APT licence plate! (thanks to Yannick)