The Linux Trace Toolkit has a synchronization feature that adjusts trace timestamps to a reference trace. It works by matching sent and received network packets found in the traces. Two parameters are computed: an offset and a drift, which is the rate of clock change.
To try it out, I set up a virtual Ubuntu cluster of 10 nodes with LTTng installed. The virtual machines send each other TCP packets with the small utility pingpong while tracing is enabled. All traces are then copied to the host for analysis. Here are the results:
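As a minimal sketch of what these two parameters mean, assume the correction is a plain linear function of the local timestamp. The function name and the numbers below are illustrative only; they are not taken from LTTV.

```python
def correct_timestamp(local_time, drift, offset):
    """Map a local timestamp onto the reference clock.

    Assumed linear model: drift scales the local clock rate,
    offset shifts the result onto the reference time base.
    """
    return drift * local_time + offset


# Hypothetical values, chosen only to show the effect of each parameter
print(correct_timestamp(100.0, drift=1.000001, offset=2.5))
```

With a drift of exactly 1, the correction reduces to adding a constant offset; a drift slightly different from 1 makes the correction grow or shrink over the length of the trace.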
$ lttv -m sync_chain_batch --sync --sync-stats -t trace1 -t trace2 [...]
[...]
Resulting synchronization factors:
trace 0 drift= 1 offset= 6.80407e+09 (2.551684) start time= 2433.182802117
trace 1 drift= 1 offset= 4.12691e+09 (1.547687) start time= 2433.254700670
trace 2 drift= 0.999999 offset= 0 (0.000000) start time= 2433.229372444
trace 3 drift= 1 offset= 1.53079e+09 (0.574083) start time= 2433.366687088
trace 4 drift= 1 offset= 2.31812e+10 (8.693471) start time= 2433.217954184
trace 5 drift= 1 offset= 1.48957e+10 (5.586227) start time= 2433.280525047
trace 6 drift= 1 offset= 1.76688e+10 (6.626211) start time= 2433.291125350
trace 7 drift= 1 offset= 2.0463e+10 (7.674116) start time= 2433.325424020
trace 8 drift= 0.999999 offset= 1.2206e+10 (4.577534) start time= 2433.204361289
trace 9 drift= 1 offset= 9.55947e+09 (3.585022) start time= 2433.293578996
[...]
Trace 2 was selected as the reference trace. The value in parentheses is the offset between each trace and the reference trace. The offsets show a delay between traces that matches the delay between virtual machine startups. When these traces are loaded into lttv with the --sync option, all trace events align perfectly, which is not the case without it.
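The output does not state the units of the two offset columns. A quick check, sketched here under the assumption that the first value is a raw counter (probably clock cycles) and the parenthesized value is in seconds, is to divide one by the other for every trace and see whether the ratio is constant. The values are copied from the output above; the variable names are mine.

```python
# (raw offset, offset in parentheses) for each trace except the
# reference trace 2, whose offset is zero
offsets = {
    0: (6.80407e+09, 2.551684),
    1: (4.12691e+09, 1.547687),
    3: (1.53079e+09, 0.574083),
    4: (2.31812e+10, 8.693471),
    5: (1.48957e+10, 5.586227),
    6: (1.76688e+10, 6.626211),
    7: (2.0463e+10, 7.674116),
    8: (1.2206e+10, 4.577534),
    9: (9.55947e+09, 3.585022),
}

# Ratio of the two columns: raw units per second if the assumption holds
ratios = {trace: raw / seconds for trace, (raw, seconds) in offsets.items()}
for trace, ratio in ratios.items():
    print(f"trace {trace}: {ratio:.4e} raw units per second")

# Relative spread between the smallest and largest ratio
spread = (max(ratios.values()) - min(ratios.values())) / min(ratios.values())
print(f"relative spread: {spread:.1e}")
```

The ratio comes out at roughly 2.67e9 for every trace, which would be consistent with a cycle counter running at about 2.67 GHz, although that interpretation remains an assumption.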
If you want to know more about trace synchronization, I recommend the paper Accurate Offline Synchronization of Distributed Traces Using Kernel-Level Events by Benjamin Poirier.
One note if you want to try this analysis: you need the extended network trace events. To enable them, arm ltt with:
$ sudo ltt-armall -n
I found this really interesting, because it makes it possible to see the cluster-wide system state.