Session 16: Debugging and Profiling Parallel Programs

The same issues that occur in serial or single-threaded programs can also occur in both shared memory and distributed memory programs. Fortunately, the debugging and profiling techniques discussed in previous sessions still work to a degree. Debugging parallel programs becomes increasingly complicated as the number of parallel threads or processes increases.

An additional issue common to parallel applications is the race condition, in which the result depends on the unpredictable order in which threads or processes access shared data. A related problem is deadlock, where parallel calculations are locked in a circular pattern of waiting on one another for needed information. Both can be difficult to debug as they frequently depend on the run-time dynamics of the program.

Retrieve the files for this session by clicking this link or via the following commands:

$ cp /hpc/examples/workshops/hpc/session16.tgz .
$ tar -xf session16.tgz
$ cd session16

Debugging Shared Memory Programs Using GDB

GDB, the GNU debugger, has good support for threads. First, let’s make sure that we are using the correct GCC for the GDB version we’ll be using.

$ module purge
$ module load gcc-7.3

Then let’s compile and run our broken code.

$ gcc pthread_bug.c -o pthread_bug -lpthread
$ ./pthread_bug

Hmmm… I guess it really does have a bug. Let’s recompile the code with the standard debugging options and run the program again in GDB.

$ gcc -O0 -g pthread_bug.c -o pthread_bug -lpthread
$ gdb pthread_bug

Running our broken program in GDB yields something like the following:

GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /users/rkalescky/hpc/pthread_bug...done.
(gdb) run
Starting program: /users/rkalescky/hpc/pthread_bug
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7fe4700 (LWP 24370)]
[New Thread 0x7ffff2fe3700 (LWP 24371)]
Thread 1: 0
Thread 1: 1
Thread 1: 2
Thread 1: 3
Thread 1: 4
Thread 1: 5
Thread 1: 6
Thread 1: 7
Thread 1: 8
Thread 1: 9
Thread 1: 10
Thread 1: 11
Thread 1: 12

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff2fe3700 (LWP 24371)]
0x0000003700000007 in ?? ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.166.el6_7.7.x86_64
(gdb)

Let’s rerun the program, setting a breakpoint at each function defined in the program.

(gdb) break thread3
Breakpoint 1 at 0x40060c: file pthread_bug.c, line 11.
(gdb) break thread2
Breakpoint 2 at 0x400650: file pthread_bug.c, line 24.
(gdb) break main
Breakpoint 3 at 0x40069e: file pthread_bug.c, line 35.
(gdb) run

We can check the state of the program’s threads with info threads. The program will run until our first breakpoint, which is hit when main starts. At this point only the main thread exists, and thread2 and thread3 have not yet been called.

(gdb) info threads
* 1 Thread 0x7ffff7fe6700 (LWP 28685)  main () at pthread_bug.c:35

Let’s step forward one line with next and check the state of the program’s threads again.

(gdb) next
[New Thread 0x7ffff7fe4700 (LWP 31485)]
36       pthread_create (&thread, NULL, thread3, NULL);
(gdb) info thread
  2 Thread 0x7ffff7fe4700 (LWP 31485)  thread2 (d=0x0) at pthread_bug.c:24
* 1 Thread 0x7ffff7fe6700 (LWP 31091)  main () at pthread_bug.c:36

We can see that the second thread has been created and is running the thread2 function. The first thread is main. We can keep using next: GDB stops first at the breakpoint in thread2 and then, once the third thread has been created, at the breakpoint in thread3. Again, we can check the thread state.

(gdb) next
[Switching to Thread 0x7ffff7fe4700 (LWP 31485)]

Breakpoint 2, thread2 (d=0x0) at pthread_bug.c:24
24       for(i = 0; i < 8; i++) {
(gdb) next
25           sleep(4);
(gdb) next
[New Thread 0x7ffff2fe3700 (LWP 31822)]
Thread 1: 0
Thread 1: 1
Thread 1: 2
Thread 1: 3
Thread 1: 4
Thread 1: 5
Thread 1: 6
Thread 1: 7
[Switching to Thread 0x7ffff2fe3700 (LWP 31822)]

Breakpoint 1, thread3 (d=0x0) at pthread_bug.c:11
11       for(c = 0; c < 8; c++) {
(gdb) info thread
* 3 Thread 0x7ffff2fe3700 (LWP 31822)  thread3 (d=0x0) at pthread_bug.c:11
  2 Thread 0x7ffff7fe4700 (LWP 31485)  0x0000003765aaca7d in nanosleep () from /lib64/libc.so.6
  1 Thread 0x7ffff7fe6700 (LWP 31091)  0x0000003765adb57d in write () from /lib64/libc.so.6

Now that all three threads have been created, we can switch to thread 2 and move forward. In the following case, the next breakpoint hit was in thread 3. We can then watch the values of l and c to see that an access to a nonexistent array index is attempted, which results in the segmentation fault. In the following output, some of the standard output text has been truncated.

(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff7fe4700 (LWP 31485))]#0  0x0000003765aaca7d in nanosleep ()
   from /lib64/libc.so.6
(gdb) next
Single stepping until exit from function nanosleep,
which has no line number information.
Thread 1: 8
.
.
Thread 1: 29
[Switching to Thread 0x7ffff2fe3700 (LWP 31822)]

Breakpoint 1, thread3 (d=0x0) at pthread_bug.c:11
11       for(c = 0; c < 8; c++) {
(gdb) info thread
* 3 Thread 0x7ffff2fe3700 (LWP 31822)  thread3 (d=0x0) at pthread_bug.c:11
  2 Thread 0x7ffff7fe4700 (LWP 31485)  0x0000003765aaca89 in nanosleep () from /lib64/libc.so.6
  1 Thread 0x7ffff7fe6700 (LWP 31091)  0x0000003765adb57d in write () from /lib64/libc.so.6
(gdb) next
12           l = c/2*2;      /* should have been l = c/(2*2); */
(gdb) watch (l >= 2)
Hardware watchpoint 4: (l >= 2)
(gdb) continue
Continuing.
Thread 1: 30
.
.
Thread 1: 104
Hardware watchpoint 4: (l >= 2)

Old value = 0
New value = 1
thread3 (d=0x0) at pthread_bug.c:13
13           w[l] = c;
(gdb) info thread
* 3 Thread 0x7ffff2fe3700 (LWP 31822)  thread3 (d=0x0) at pthread_bug.c:13
  2 Thread 0x7ffff7fe4700 (LWP 31485)  0x0000003765aaca7d in nanosleep () from /lib64/libc.so.6
  1 Thread 0x7ffff7fe6700 (LWP 31091)  0x0000003765adb57d in write () from /lib64/libc.so.6
(gdb) print l
$1 = 2
(gdb) print c
$2 = 2
(gdb) quit

Your analysis of the problem may be slightly different due to the precise timing of the threads. This is a perfect example of the difficulty in debugging parallel applications.

Debugging Distributed Memory Programs Using GDB

GDB does not natively coordinate the separate processes launched as part of an MPI job, but it can be used to debug each process spawned as part of a distributed memory program.

In this example a deliberate error has been introduced that will cause one of the processes to crash. The following technique helps to determine where the error is occurring. The method requires that X11 forwarding be enabled when the SSH session is established.

Load the appropriate modules to set up the MPI environment.

$ module purge
$ module load gcc-7.3 hpcx/2.1.0

Compile the code using the C MPI wrapper script.

$ mpicc -g prime_mpi.c -o prime_mpi

We will run the program with two MPI processes on a single node using srun. The following command requests that two instances of GDB be started, each in its own terminal.

$ srun -p workshop -N 1 -n 2 --x11=first xterm -e gdb ./prime_mpi

Just as with previous uses of GDB, the loaded program needs to be run. Since there are two instances of GDB in this case, we simply need to type run in both. The application will run until the coding mistake is encountered, and then GDB will give information about the specific error.