DMTCP: Distributed MultiThreaded CheckPointing

About DMTCP:

DMTCP (Distributed MultiThreaded Checkpointing) is a tool to transparently checkpoint the state of multiple simultaneous applications, including multi-threaded and distributed applications. It operates directly on the user binary executable, without any Linux kernel modules or other kernel modifications.

Among the applications supported by DMTCP are Open MPI, MATLAB, Python, Perl, and many other programming languages and shell scripting languages. Starting with release 1.2.0, DMTCP also supports GNU screen sessions, including vim/cscope and emacs. With the use of TightVNC, it can also checkpoint and restart X Window applications, as long as they do not use extensions (e.g., no OpenGL, no video). See the QUICK-START file for further details.

DMTCP supports the OFED API for InfiniBand on an experimental basis. For older versions of OFED, the DMTCP 2.1 release should be adequate. For newer versions of OFED, please use the contrib/infiniband plugin from the SVN repository, or from DMTCP 2.2 when it is released.


Announcement!

We are currently looking for well-qualified applicants who are interested in joining a Ph.D. program to do research on checkpointing and reversible debugging. Interested applicants should write to Gene Cooperman (gene@ccs.neu.edu) at Northeastern University.
[2014-07-14]: DMTCP 2.3.1 released!
This is primarily a bug fix release.
[2014-07-03]: DMTCP 2.3 released!
This is primarily a bug fix release. However, if you are using DMTCP for the ARM v7 CPU, or if you are using DMTCP either with the InfiniBand network or with the SLURM batch system, then it is strongly recommended to upgrade. Check release notes for more details.
[2014-03-20]: DMTCP 2.2.1 released!
This is a bug fix release. Users relying on the --enable-unique-checkpoint-filenames configure flag are strongly advised to upgrade to this release. Check release notes for more details.
[2014-03-14]: DMTCP 2.2 released!
In this release, the lowest layers have been re-organized and partially re-written for greater clarity of code and greater maintainability. Also, users relying on the use of DMTCP with MPI, InfiniBand, or the Torque or SLURM batch queues are strongly advised to upgrade. Check release notes for more details.
[2014-01-12]: DMTCP 2.1 released!
This release includes enhancements to the core feature set and some newly stable plugins. Check release notes for more details.
[2013-10-03]: DMTCP 2.0 released!
This version 2.0 release represents the future of DMTCP. DMTCP version 2.0 has been re-designed around the concept of plugins. The older DMTCP version 1.2.x branch will continue to be maintained for bug fixes. Check release notes for more details.
[2013-08-03]: DMTCP 1.2.8 released!
This is primarily a bug fix release. It is particularly recommended to upgrade if you are using DMTCP with the ARM CPU, or if you will compile DMTCP with a C++11 compiler (e.g. GNU flag -std=c++11). Check release notes for more details.
[2013-03-13]: DMTCP 1.2.7 released!
This is primarily a bug fix release. Check release notes for more details.
[2012-07-31]: DMTCP 1.2.6 released!
The previous release had problems compiling on older kernels; this release fixes them. It also contains some changes needed for gcc 4.7. Check release notes for more details.
[2012-05-27]: DMTCP 1.2.5 released!
Along with numerous bug fixes, this release features support for ARM processors along with support for epoll/eventfd/signalfd system calls. Check release notes for more details.
[2012-01-23]: DMTCP 1.2.4 released!
Along with a lot of bug fixes, this release focuses on robust treatment of processes that rapidly create and destroy threads. Users of DMTCP 1.2.3 are highly encouraged to upgrade to this release. Check release notes for more details.
[2011-07-22]: DMTCP 1.2.3 released!
This is primarily a bug fix release, with many fixes that improve overall stability. Users of DMTCP 1.2.2 are highly encouraged to upgrade to this release. Check release notes for more details.
[2011-06-22]: DMTCP 1.2.2 released!
Along with a lot of bug fixes, this release provides support for a module system allowing users to write their own extensions to DMTCP, and removes the dependency on libc.a for building. Check release notes for more details.
[2011-03-12]: DMTCP 1.2.1 released!
Along with a lot of bug fixes, this release provides support for MPICH2 1.3.x (transparently checkpointing MPICH2 under DMTCP), and for calling the dmtcpaware API (dmtcpCheckpoint(), etc.) directly from inside a Python session. Check release notes for more details.

[2010-11-04]: DMTCP 1.2.0 released!
This is a semi-major release of DMTCP. The biggest change is the support for GNU screen sessions. It also fixes some instabilities in checkpointing MATLAB under certain environments, and includes numerous bug fixes that were implemented as part of a review of DMTCP subsystems.

DMTCP and its standalone single-process component MTCP (MultiThreaded CheckPointing) are currently maintained by Jason Ansel, Kapil Arya, Gene Cooperman, Artem Polyakov, Mike Rieker, Ana Maria Visan, and Tyler Denniston. The list of active developers continues to evolve.

The DMTCP project is partially supported by Intel Corporation and by the National Science Foundation under grant OCI-0960978. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Intel Corporation or of the National Science Foundation.
