DMTCP: Distributed MultiThreaded CheckPointing

About

DMTCP (Distributed MultiThreaded Checkpointing) is a tool that transparently checkpoints the state of an arbitrary group of programs spread across many machines and connected by sockets. It modifies neither the user's program nor the operating system.

Among the applications supported by DMTCP are OpenMPI, MATLAB, Python, Perl, and many other programming languages and shell scripting languages. With the use of TightVNC, it can also checkpoint and restart X-Windows applications, as long as they do not use extensions (e.g., no OpenGL and no video). Among the Linux features supported by DMTCP are open file descriptors, pipes, sockets, signal handlers, process id and thread id virtualization (ensuring that old pids and tids continue to work upon restart), ptys, fifos, process group ids, session ids, terminal attributes, and mmap/mprotect (including mmap-based shared memory). See the QUICK-START file of the distribution for further details.
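To illustrate why pid and tid virtualization matters, here is a minimal Python sketch of a translation table between the pid an application saved before a checkpoint and the pid the kernel assigns after restart. It is a conceptual illustration only: the class `PidTable` and its methods are hypothetical names chosen for this example and are not part of DMTCP's code or interface.

```python
class PidTable:
    """Toy virtual-to-real pid map (hypothetical; not DMTCP's actual data structure)."""

    def __init__(self):
        self._real_by_virtual = {}

    def register(self, real_pid):
        # On first launch, the virtual pid is simply the real pid.
        self._real_by_virtual[real_pid] = real_pid
        return real_pid

    def rebind(self, virtual_pid, new_real_pid):
        # After restart the kernel hands out a fresh real pid. The virtual pid
        # that the application already stored is kept and pointed at the new one.
        if virtual_pid not in self._real_by_virtual:
            raise KeyError(virtual_pid)
        self._real_by_virtual[virtual_pid] = new_real_pid

    def to_real(self, virtual_pid):
        # Translate a pid held by the application into the current kernel pid.
        return self._real_by_virtual[virtual_pid]

    def to_virtual(self, real_pid):
        # Translate a kernel pid back into the pid the application expects.
        for virtual_pid, real in self._real_by_virtual.items():
            if real == real_pid:
                return virtual_pid
        raise KeyError(real_pid)


table = PidTable()
saved_pid = table.register(4242)   # pid the program stored before the checkpoint
table.rebind(saved_pid, 9173)      # after restart, the kernel assigned a new pid
current_real = table.to_real(saved_pid)
```

In this sketch the program keeps using `saved_pid` after restart, and a wrapper around calls that take or return pids would consult the table in both directions. How DMTCP itself performs this translation is not described here.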

DMTCP does not yet support Infiniband or Myrinet for OpenMPI. Support is planned for the near term. Additional developers are welcome.

DMTCP is also the basis for URDB, the Universal Reversible Debugger. URDB is still experimental. Nevertheless, it currently adds reversibility to gdb, MATLAB, Python (pdb), and Perl (perl -d). It also supports reverse expression watchpoints, a form of temporal search within a process's lifetime.

For further information, see the DMTCP Sourceforge project page.

Documentation

The dmtcp manpage contains an overview of commands and usage. Similar information can also be found here (slightly outdated).

Getting DMTCP

Release 1.1.1 (Nov 13, 2009) is now available. The latest stable version of DMTCP can always be obtained through the sourceforge.net downloads page.

To obtain the most recent (possibly unstable) source from subversion, run the following command:

Programming Interface

DMTCP is completely transparent and can checkpoint unmodified Linux binaries. However, if you wish to call DMTCP from within your checkpointed program, we provide an optional programming interface called DMTCP Aware. To use DMTCP Aware:

Publications

See Also

URDB (Universal Reversible Debugger) is a reversible debugger for gdb, MATLAB, Python, Perl, and soon others. It is based on DMTCP. A technical report on URDB is also available.

Authors

DMTCP and its standalone single-process component MTCP (MultiThreaded CheckPointing) are currently maintained by Jason Ansel, Kapil Arya, Gene Cooperman, Artem Polyakov, Mike Rieker, Praveen Solanki, and Ana Maria Visan. The list of active developers continues to evolve.