This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Integration of a NPTL Trace Tool into the glibc
- From: Guillaume Duranceau <guillaume dot duranceau at bull dot net>
- To: libc-alpha at sourceware dot org
- Cc: tony dot reix at bull dot net
- Date: Mon, 29 May 2006 15:42:03 +0200
- Subject: Integration of a NPTL Trace Tool into the glibc
Dear glibc hackers,
We've worked for three years on a tool aimed at tracing NPTL internals and at
measuring the performance and contention of multi-threaded applications. This
tool is now mature and we think it would be a good idea to include it in
glibc. The reasons are explained below.
Version 0.90.0 of the NPTL Trace Tool (http://nptltracetool.sourceforge.net/)
is available for download at http://sourceforge.net/projects/nptltracetool/.
Our work was motivated by industrial needs. Trace tools are used less by
desktop users than by professionals working with valuable data on critical
activities. In an industrial context, people's work is based on methods and
tools, and they need reliable, available and serviceable (RAS) systems.
Trace tools make it easier to diagnose a system when problems arise, thus
improving its serviceability. They also support "First Failure Data Capture",
another important industrial need: understanding a problem the first time it
occurs.
Unfortunately, this kind of trace tool does not seem to be as popular on
Linux systems as it might be.
Companies may be reluctant to migrate to Linux because tools they need are
missing: this is one reason why they prefer more professional systems that
are widely used in industry. That's why we strongly believe that trace tools
are not only useful but absolutely needed in open source systems.
On that point, Linux is behind but improving. Efforts have already been made
at kernel level: some tools (like LTT or SystemTap) make it possible to trace
kernel events, even if they are not as well integrated as similar tools in
other operating systems. However, there is currently no way to trace glibc
events, even though glibc is such a central and critical system component.
Here is an analysis of the main arguments against trace tools in the glibc:
1. "Glibc internals are constantly modified. Any added code might break at the
next update."
Right, but most parts of glibc are now stable: updates mainly consist of
minor bug fixes, and so should not break trace point code.
2. "Glibc is a runtime library. No unnecessary work is done."
OK, but even if glibc is a high-quality library, there will always be
remaining bugs and critical situations. Trace tools are needed precisely
under these circumstances.
3. "Glibc is a high performance library. Trace tools would reduce
performance."
False. Two versions of the libraries can be built: production libraries (free
of trace points) and instrumented libraries.
4. "How to find maintainers for these trace tools?"
Industrial users will be the natural maintainers of the tools they use.
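
To illustrate the answer to point 3, here is a minimal sketch of how a trace
point can vanish from a production build. It is only an illustration under
stated assumptions: the names WITH_TRACE, TRACE_POINT, trace_events and
locked_increment are hypothetical and are not taken from glibc or from the
NPTL Trace Tool, whose actual mechanism may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical names throughout: WITH_TRACE, TRACE_POINT, trace_events and
   locked_increment are illustrative only, not glibc or PTT identifiers. */

unsigned long trace_events;  /* number of trace points hit so far */

#ifdef WITH_TRACE
/* Instrumented build: count the event and log it on stderr. */
# define TRACE_POINT(name) \
    do { trace_events++; fprintf(stderr, "trace: %s\n", (name)); } while (0)
#else
/* Production build: the trace point expands to an empty statement. */
# define TRACE_POINT(name) do { } while (0)
#endif

/* Stand-in for a library routine with entry and exit trace points. */
int locked_increment(int *counter)
{
    assert(counter != NULL);
    TRACE_POINT("locked_increment: enter");
    int value = ++*counter;
    TRACE_POINT("locked_increment: exit");
    return value;
}
```

Compiling the same source with -DWITH_TRACE would give the instrumented
variant, and compiling it without that definition would give the production
variant, in which the macro leaves no code behind.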
Bringing Linux into industrial systems will be easier if Linux provides tools
that answer the specific needs of this kind of customer. Glibc, as a part of
Linux systems, should provide tools such as our NPTL Trace Tool in order to
help industrial users develop multi-threaded applications.
Note that Intel provides a similar tool on Linux: the Intel Threading Tools.
If Intel provides such a product, why shouldn't there be an equivalent open
source tool? A brief comparison of the two tools follows.
Kind Regards,
Guillaume Duranceau
Tony Reix
------------------------------------------------------------------------------
Features comparison between the NPTL Trace Tool and the Intel Threading Tools
------------------------------------------------------------------------------
ITT: Intel Threading Tools
PTT: Posix Thread Trace Toolkit (NPTL Trace Tool)

                                                   - ITT -         - PTT -
Ability to not rebuild the program                 NO              YES
Little modification of the application's dynamics  NO (intrusive)  YES
Search errors in source code                       YES             NO
Trace calls to and exits from thread routines      NO              YES
Handle large volume of traces                      NO              YES
Name traced objects                                YES             YES
Graphical interface                                YES             (yes, Pajé)
Identify performance bottlenecks                   YES             YES
Measure scalability                                YES             NO
Handle bad situations (crash...)                   NO              YES
Supported architectures                            ia32, ia64      ia32, ia64, ppc