    #!/bin/sh
    tc qdisc add dev lo root netem delay 100ms
    netcat -l localhost 8765 > /dev/null &
    echo "lttng" | netcat localhost 8765
    tc qdisc del dev lo root
Here, the traffic shaping command sets the latency of packets to 100ms on the loopback interface. The first netcat process listens on port 8765; this server exits as soon as a message is received. Next, the netcat client is spawned and a small string is transferred. The last command removes the traffic shaping configuration. The Linux kernel used is 2.6.38 on x86_64.
Tracing this script reveals three main blockings, as reported by the blocking analysis module of flightbox.
    # server process
    Blocking report for task /bin/netcat [7495]
    Start           Duration (ms)   Syscall      Wakeup
    8073461215283   300,873         sys_accept   SOFTIRQ

    # client process
    Blocking report for task /bin/netcat [7497]
    Start           Duration (ms)   Syscall      Wakeup
    8073461820815   200,155         sys_select   SOFTIRQ
    8073662056512   200,135         sys_poll     SOFTIRQ
The next figure shows the blockings occurring in the system, including the messages sent at each step.
The server process creates a new AF_INET socket, binds it and starts to listen on the selected port. It then calls accept to wait for an incoming connection. From the trace, we observe that sys_accept blocks for about 300ms. This delay is very close to three times the network latency and is also almost equal to the process duration. Once the accept returns, the read on the socket doesn't block and the process exits.
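To make the server-side sequence concrete, here is a minimal Python sketch of the same steps: socket, bind, listen, accept, then a single read. It is not netcat's actual code. The helper names `make_listener` and `serve_once` are made up for illustration, and port 0 (let the kernel pick a free port) is my choice rather than the 8765 used in the script. The timings are taken in userspace, so they only approximate what the kernel trace reports.

```python
import socket
import time


def make_listener(host="127.0.0.1", port=0):
    """socket + bind + listen. Port 0 asks the kernel for any free port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_once(listener):
    """Accept one connection, read once, and time both calls (in ms)."""
    t0 = time.monotonic()
    conn, _peer = listener.accept()   # blocks until a connection is established
    t1 = time.monotonic()
    data = conn.recv(4096)            # a single read, like the traced server
    t2 = time.monotonic()
    conn.close()
    return (t1 - t0) * 1000.0, (t2 - t1) * 1000.0, data
```

Run with the netem delay in place, I would expect the first value returned by `serve_once` to dominate, in line with the sys_accept entry of the report, but I have not measured this sketch under the tracer.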
The client creates the socket, then performs a sys_connect to the server. The connect returns EINPROGRESS immediately, without blocking. The next step is a sys_select, in which the client blocks for about twice the network latency. When the select returns, the actual message is sent to the server without blocking and the socket is closed. Finally, the client waits on sys_poll for about twice the network delay.
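The client-side pattern (non-blocking connect, select for writability, send, then poll) can be sketched in Python as follows. This is an illustration of the pattern, not netcat's implementation. The function name `send_message` is hypothetical, and the half-close with `shutdown` before the poll is my assumption about how the final wait could come about. The sketch also assumes a payload small enough to fit in the socket send buffer, and a platform where `select.poll` is available (Linux here).

```python
import errno
import os
import select
import socket


def send_message(host, port, payload, timeout=5.0):
    """Non-blocking connect, wait with select, send, then poll for the peer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # Returns at once; EINPROGRESS means the handshake is still running.
        rc = sock.connect_ex((host, port))
        if rc not in (0, errno.EINPROGRESS):
            raise OSError(rc, os.strerror(rc))

        # The socket becomes writable once the connection attempt has finished.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("connection attempt timed out")
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err:
            raise OSError(err, os.strerror(err))

        sent = sock.send(payload)         # small payload: fits in the buffer
        sock.shutdown(socket.SHUT_WR)     # assumption: half-close, sends our FIN

        # Wait for the peer to answer or close its side.
        poller = select.poll()
        poller.register(sock, select.POLLIN)
        events = poller.poll(int(timeout * 1000))
        return {"connect_rc": rc, "sent": sent, "events": events}
    finally:
        sock.close()
```

Checking `SO_ERROR` after the select matters: a socket also becomes writable when the connection attempt fails, so writability alone does not prove the connect succeeded.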
From these observations, the sys_accept performed by the server blocks until the final handshake ACK is received. Hence, an unfinished handshake is completely hidden from the application and handled at the OS level. By the time the read is done, the data is already buffered, so the read doesn't block in this case. On the client side, the connect system call returns in an optimistic fashion, and the wait for the socket to be ready is deferred to a select system call. This may allow many connects to be performed simultaneously. As for the poll, it may be related to the tear-down procedure of the socket, waiting for the final FIN from the server.
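The interpretation above boils down to counting delayed packets, which a few lines of Python can check against the report. This is a back-of-the-envelope model, not something flightbox produces: it assumes every packet crossing the loopback interface is delayed by the configured 100ms, that the comma in the report is a decimal separator (so 300,873 reads as 300.873 ms), and, for sys_poll, that the wait covers the client FIN going out and the server FIN coming back, which the trace suggests but does not prove. The names `expected_blocking_ms` and `residuals` are made up for this sketch.

```python
# Durations from the blocking report, in ms (comma read as a decimal point).
MEASURED_MS = {
    "sys_accept": 300.873,
    "sys_select": 200.155,
    "sys_poll": 200.135,
}


def expected_blocking_ms(one_way_ms):
    """Expected blocking time when each packet is delayed by one_way_ms."""
    return {
        "sys_accept": 3 * one_way_ms,  # SYN, SYN-ACK, ACK
        "sys_select": 2 * one_way_ms,  # SYN out, SYN-ACK back
        "sys_poll": 2 * one_way_ms,    # FIN out, FIN back (assumed)
    }


def residuals(one_way_ms=100.0):
    """Measured minus expected, per syscall, in ms."""
    expected = expected_blocking_ms(one_way_ms)
    return {name: MEASURED_MS[name] - expected[name] for name in MEASURED_MS}


if __name__ == "__main__":
    for name, diff in residuals().items():
        print(f"{name}: measured - expected = {diff:.3f} ms")
```

All three residuals come out below one millisecond, so the simple packet-count model accounts for almost all of the observed blocking time.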
With the traffic shaper used here, the delay between consecutive packets is preserved. For example, the client sends three packets in a short burst when the sys_select returns. Hence, this experiment may not highlight all possible blockings. An alternative to the traffic shaper is the iptables NFQUEUE target, which allows each packet to be forwarded to userspace for arbitrary processing. More on this in the next blog post.