TotalView

From ACENET

Summary

The TotalView debugger can be used to debug both serial and parallel (MPI, OpenMP) applications. It is particularly useful for parallel programs, because it is designed around debugging multi-processor codes. It provides both a graphical and a command-line interface, and it includes several features specific to MPI and OpenMP debugging.

Availability
Mahone, Glooscap, Placentia, Fundy: fully supported (head node and compute nodes)
Replay Engine only available at Mahone, Glooscap
Modulefile
totalview
Usage
Users are advised to have a .tvdrc file in their home directory containing the recommended settings for debugging Open MPI applications. These settings configure TotalView to skip mpirun and jump straight into your MPI application. Without them, TotalView stops deep in the machine code of mpirun itself, which is not what most users want. The .tvdrc file is generated automatically when you load the totalview modulefile:
 $ module load totalview
If you already have a .tvdrc file in your home directory, it will not be overwritten. To make sure that you have the recommended file, remove the existing one and then reload the modulefile.
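The refresh described above can be sketched as a pair of small shell functions. This is only an illustration: the function names backup_tvdrc and reload_totalview are made up for this example, and the sketch moves the old file aside instead of deleting it, so that any personal settings in it can be recovered.

```shell
# Move an existing .tvdrc aside (instead of deleting it) so that loading
# the totalview modulefile can generate a fresh copy.
# The optional argument (directory holding .tvdrc) exists only for this
# sketch; it defaults to your home directory.
backup_tvdrc() {
    dir="${1:-$HOME}"
    if [ -f "$dir/.tvdrc" ]; then
        mv "$dir/.tvdrc" "$dir/.tvdrc.bak"
        echo "saved old settings as $dir/.tvdrc.bak"
    else
        echo "no existing $dir/.tvdrc"
    fi
}

# Unload and load the modulefile again. Returns an error on machines
# where the module command is not available.
reload_totalview() {
    if command -v module >/dev/null 2>&1; then
        module unload totalview
        module load totalview
    else
        echo "module command not found" >&2
        return 1
    fi
}
```

On a cluster you would run backup_tvdrc followed by reload_totalview, and then compare ~/.tvdrc.bak with the new ~/.tvdrc if you had customized the old file.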
Documentation
TotalView Technologies homepage
TotalView Support, Documentation, Video Tutorials, Tips & Tricks
Getting Started with TotalView (Video)
Printable PDF Documentation
TotalView Tutorial from Lawrence Livermore National Laboratory
TotalView Release Notes

Compiling the code

A debugger needs symbolic debug information, so you need to recompile your code to include it. Usually, this means adding the -g flag to your compile command.

 $ mpif90 -g -o test test.f90
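If you are unsure whether an existing executable was built with -g, one way to check is to look for debug sections in the binary. The sketch below assumes a Linux system with readelf from GNU binutils and a compiler that emits DWARF debug information. The function name has_debug_info is made up for this example.

```shell
# Succeeds if the file given as $1 contains a debug_info section, which
# normally means it was compiled with -g and has not been stripped.
# Assumes readelf (GNU binutils) is installed; errors from readelf are
# hidden, so a non-ELF or missing file simply counts as "no debug info".
has_debug_info() {
    readelf -S "$1" 2>/dev/null | grep -q 'debug_info'
}
```

For example, has_debug_info ./test && echo "debug information present" prints the message only when the section is found.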

To do memory debugging across nodes, you need to link the TotalView library tvheap into your code. Your compile line would then look like this:

 $ mpif77 -g -ltvheap_64 source.f77

Basics

  • Graphical Interface: totalview
  • Command Line Interface: totalviewcli

To use the GUI-based TotalView parallel debugger, you need to connect to the head node of the cluster with X11 forwarding enabled in your SSH client. This allows windows of a remotely started application to be shown on your own desktop. Unix users need to run an X11 server on their desktops (if you are running any window manager, then you already have an X11 server installed) and connect to the head node with the -X option of the SSH client (ssh -X servername.ace-net.ca). Those using PuTTY on Windows need to install Xming and enable X11 forwarding in PuTTY.
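A quick way to see whether X11 forwarding is active after logging in is to check the DISPLAY environment variable, which ssh -X sets on the remote side. The sketch below is a minimal check only: a non-empty DISPLAY does not guarantee that the X server on your desktop is reachable. The function name x11_forwarding_ready is made up for this example.

```shell
# Succeeds if DISPLAY is set and non-empty, which is what ssh -X arranges
# on the head node when X11 forwarding was negotiated.
x11_forwarding_ready() {
    [ -n "${DISPLAY:-}" ]
}

if x11_forwarding_ready; then
    echo "DISPLAY=$DISPLAY"
else
    echo "DISPLAY is not set; reconnect with ssh -X" >&2
fi
```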

Debugging Open MPI programs

You can use the TotalView debugger either on the head node or through the grid engine queues. To debug a job, include --debug on the mpirun command line. Open MPI will then automatically invoke TotalView to run your MPI processes, provided that the totalview module is loaded in your shell profile.

On the head node

You can debug your program on the head node if your application is not computationally intensive, does not use a lot of memory, and your debugging sessions are short (any process run on the head node should not consume more than 15 minutes of CPU time) and use a small number of processes (no more than 2). For example:

 $ mpirun --debug -np 2 my_parallel_application

Debugging a serial or OpenMP job:

 $ totalview my_program

On the compute nodes (through the grid engine queues)

If your debugging sessions do not qualify to run on the head node, then you need to use the dedicated test.q resources, which allow a job to run for less than 1 hour.

 $ qrsh -cwd -pe "ompi*" 4 -l h_rt=00:30:00,test=true mpirun --debug myapplication

If you are debugging large jobs, and require more slots than what test.q can provide, then you can request free slots for an interactive job in the production short.q queue. If free resources are available, they will be granted to you.

 $ qrsh -cwd -pe "ompi*" 20 -l h_rt=00:30:00 mpirun --debug myapplication

Debugging a serial or OpenMP job:

 $ qrsh -cwd -l h_rt=00:30:00,test=true totalview myapplication

Read more about running interactive jobs.
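The choice between the head node and test.q can be sketched as a small helper that prints a suitable launch line without running it. This is only an illustration built from the commands shown above: the function name debug_command is made up, the 30-minute h_rt value is copied from the examples, and the helper looks only at the process count. It does not check the CPU time and memory conditions for head node use, or whether test.q has enough slots.

```shell
# Print (do not run) a launch line for an MPI debugging session.
#   $1 = number of MPI processes
#   $2 = executable to debug
debug_command() {
    np="$1"
    app="$2"
    if [ "$np" -le 2 ]; then
        # At most 2 processes: within the head node limit.
        printf 'mpirun --debug -np %s %s\n' "$np" "$app"
    else
        # More than 2 processes: request test.q slots for 30 minutes.
        printf 'qrsh -cwd -pe "ompi*" %s -l h_rt=00:30:00,test=true mpirun --debug %s\n' "$np" "$app"
    fi
}
```

For example, debug_command 4 ./myapplication prints the qrsh line for a 4-slot test.q session, which you can review and then paste into your shell.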

Issues

Debugging an MPI code

If you are debugging an MPI code, please be aware that the following situation might occur, which may lead you to believe that there is a problem with the debugger. Once you have launched your code in the debugger and have answered "yes" in the dialog window to stop the parallel job, you need to set a break point somewhere below the MPI_Init() call and then click "Go". If you do not set a break point below MPI_Init() and just click "Next", then you will get a "Waiting to reach location" message that does not go away until you cancel it. If you set a break point before MPI_Init(), then the debugger will ignore it.

PGI C++

There is an issue with the pgCC compiler: TotalView does not correctly match the line of code it is executing with the line it displays as executing. Note that MPI wrappers (mpiCC, mpicxx or mpic++) that use the pgCC compiler also suffer from this issue. To debug code built with the pgCC compiler or with these MPI wrappers, use the Portland Group debugger PGDBG.

Sun Studio

There is currently an issue with programs that are compiled using the Sun compiler. When you try to debug such programs, TotalView cannot get the values of variables in the program. This problem does not exist if you use the Portland compiler or the GNU compiler.