I recently diagnosed a problem in Debian's pam-p11 package. This package allegedly permits logging into a computer using a smart card or USB security token containing an ssh key. If you have the token and know its PIN, the key on the token is checked against the ssh authorized keys file to authorize the login. This seems like a great way to permit console logins as root to machines without having a shared password.
Unfortunately, the package didn't work very well for me. It worked once, and then every later attempt to use it segfaulted. I'm familiar with how PAM works, and I understand the basic ideas behind PKCS11 (the API used for this type of smart card), but I was completely unfamiliar with this particular PAM module and the PKCS11 library it used. The segfault was in an area of code I didn't even expect this PAM module to call.
Back in 1994, that would have been a painful slog. Gdb has improved significantly since then, and I'd really like to thank all the people over the years who made that possible. I was able to isolate the problem in just a couple of hours of debugging. Here are some of the cool features I used:
"target record-full," which records the program's execution so you can step backwards through it and bisect where in a running program something goes wrong. It's not perfect (it seems to have trouble with memset and a few other functions), but it's really good.
Hardware watchpoints. Once you know what memory is getting clobbered, have the hardware report every change to it so you can see who's responsible.
Hey, wait, what? I really wish I had placed a breakpoint back there. With "target record-full" and "reverse-continue," you can, as long as that stretch of execution was recorded. Set the breakpoint, then reverse-continue, and time runs backwards until the breakpoint is hit.
I didn't need it for this session, but "set follow-fork-mode" is very handy for certain applications. There's even a way to debug both the parent and the child of a fork at the same time, although I always have to look up the syntax. It seems like it ought to be "set follow-fork-mode both," and there was once a debugger that used that syntax, but Gdb expresses the same concept with "set detach-on-fork off," which keeps both processes under the debugger's control.
Anyway, in just a couple of hours and without instrumenting the code, I tracked down how a bunch of structures were being freed as an unexpected side effect of one of the function calls. Neither I nor the author of the pam-p11 module expected that, although it is documented and does make sense in retrospect. Good tools make life easier.