April 23rd, 2009

The debugger strikes

The last couple of days I've been debugging some things for a client involving Miredo, the Linux Teredo implementation. I spent a frustratingly long time before discovering that, while I wasn't looking, my Teredo address had changed. My packets weren't working out because they were going to the wrong place.

Along the way I was trying to debug the procedure Teredo uses to prevent address spoofing. This proved difficult. Miredo forks off a daemon process to monitor its child (and, I think, to handle some privilege-separation issues). Eventually the actual worker process starts, and that is a complex, multi-threaded process. Many events have timeouts, so if you spend too long in the debugger, peer entries or short-lived authentication checksums may expire. In addition, pthread_cancel is used in some cases when a timeout fires, so you may find that the thread you are debugging has been blasted from on high.

The obvious approach of setting a breakpoint sure didn't work. Somehow, in all the forking around, gdb failed to remove the breakpoint in the child, so I got this amusing message:

Program terminated with signal SIGTRAP, Trace/breakpoint trap.
The program no longer exists.
With that auspicious start, I began my adventure. I'll skip the play-by-play, but I want to pass along some useful observations.
  • The best way to deal with forking is to run the program, find the right process by some other means, and attach to it (for example with gdb -p PID). Life gets very frustrating when this doesn't work: you need an early breakpoint, the process is hard to find, or the like.
  • When that fails, the catch fork command can be used to break after a fork succeeds.
  • Don't forget to set follow-fork-mode (parent or child) to tell the debugger whether to stay with the parent or follow the child. (I really wanted the semi-mythical set follow-fork-mode both.)
  • I found the amazing set scheduler-locking command. With it set to on, only the thread you are debugging runs when you step or continue; the other threads stay stopped.
  • Don't forget to turn off the scheduler lock from time to time: multi-threaded programs get into some fairly unusual deadlocks when only one of the threads is permitted to run.