DMTCP: Distributed MultiThreaded CheckPointing

About DMTCP:

DMTCP (Distributed MultiThreaded Checkpointing) is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications.

Among the applications supported by DMTCP are OpenMPI, MATLAB, Python, Perl, and many programming languages and shell scripting languages. Starting with release 1.2.0, DMTCP also supports GNU screen sessions, including vim/cscope and emacs. With the use of TightVNC, it can also checkpoint and restart X Window applications, as long as they do not use extensions (e.g., no OpenGL and no video). See the QUICK-START file for further details.

DMTCP does not yet support InfiniBand or Myrinet for OpenMPI. Support is planned for the near term. Additional developers are welcome.

DMTCP is also the basis for URDB, the Universal Reversible Debugger. URDB was an experimental project for reversibility for four debuggers: gdb, MATLAB, python (pdb), and perl (perl -d). It is now obsolete, and work is continuing on a newer internal project, which will be released as open source in the future.

Announcement!

We are currently looking for well-qualified applicants who are interested in joining a Ph.D. program to do research on checkpointing and reversible debugging. Interested applicants should write to Gene Cooperman (gene@ccs.neu.edu) at Northeastern University.
[2013-03-13]: DMTCP 1.2.7 released!
This is primarily a bug fix release. Check release notes for more details.
[2012-07-31]: DMTCP 1.2.6 released!
The previous release had compilation issues on older kernels; this release fixes them. It also contains some changes needed for gcc 4.7. Check release notes for more details.
[2012-05-27]: DMTCP 1.2.5 released!
Along with numerous bug fixes, this release adds support for ARM processors and for the epoll, eventfd, and signalfd system calls. Check release notes for more details.
[2012-01-23]: DMTCP 1.2.4 released!
Along with a lot of bug fixes, this release focuses on robust treatment of processes that rapidly create and destroy threads. Users of DMTCP 1.2.3 are highly encouraged to upgrade to this release. Check release notes for more details.
[2011-07-22]: DMTCP 1.2.3 released!
This is primarily a bug-fix release that improves overall stability. Users of DMTCP 1.2.2 are highly encouraged to upgrade to this release. Check release notes for more details.
[2011-06-22]: DMTCP 1.2.2 released!
Along with a lot of bug fixes, this release provides support for a module system that allows users to write their own extensions to DMTCP, and removes the dependency on libc.a for building. Check release notes for more details.
[2011-03-12]: DMTCP 1.2.1 released!
Along with a lot of bug fixes, this release provides support for MPICH2 1.3.x (transparently checkpointing MPICH2 under DMTCP), and for calling the dmtcpaware API (dmtcpCheckpoint(), etc.) directly from inside a Python session. Check release notes for more details.

[2010-11-04]: DMTCP 1.2.0 released!
This is a semi-major release of DMTCP. The biggest change is support for GNU screen sessions. It also fixes some instabilities in checkpointing MATLAB under certain environments, and includes numerous bug fixes that were implemented as part of a review of DMTCP subsystems.
DMTCP and its standalone single-process component MTCP (MultiThreaded CheckPointing) are currently maintained by Jason Ansel, Kapil Arya, Gene Cooperman, Artem Polyakov, Mike Rieker, Ana Maria Visan, and Tyler Denniston. The list of active developers continues to evolve.

The DMTCP project is partially supported by the National Science Foundation under grant OCI-0960978. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
