DMTCP: Distributed MultiThreaded CheckPointing
About
DMTCP (Distributed MultiThreaded CheckPointing) is a tool that transparently
checkpoints the state of an arbitrary group of programs spread across
many machines and connected by sockets. It does not modify the user's
program nor the operating system.
For further information, try the
DMTCP Sourceforge project page
Documentation
The dmtcp manpage contains an overview of commands and usage. Similar, though slightly outdated, information
can also be found here.
Getting DMTCP
The latest stable version of DMTCP can be obtained through the sourceforge.net downloads
page. Including:
To obtain the most recent (possibly unstable) source from subversion,
run the following command:
Programming Interface
DMTCP is completely transparent and can checkpoint unmodified Linux binaries.
However, if you wish to call DMTCP from within your checkpointed program,
we provide an optional programming interface called DMTCP Aware. To use
DMTCP Aware:
- Add dmtcpaware.h and dmtcpaware.c (or libdmtcpaware.a) to
your project. These files can be found in the /usr/lib/dmtcp/ directory
if you are using a binary distribution or in the .../dmtcpaware/
directory of the source distribution.
- Include dmtcpaware.h from your code. You should now have
access to the DMTCP Aware API.
- Documentation of the DMTCP Aware API functions can be found here.
To test whether your program is running under dmtcp_checkpoint, call
dmtcpIsEnabled(). If your program is not running under DMTCP,
dmtcpIsEnabled() returns false, and all other API functions
return -128 or NULL and have no effect.
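As a minimal sketch of the check described above, the following C fragment guards DMTCP-specific code paths behind dmtcpIsEnabled(). The HAVE_DMTCPAWARE build flag and the helper names dmtcp_api_usable() and report_dmtcp_status() are hypothetical, introduced here only for illustration. When the flag is not defined, a stand-in dmtcpIsEnabled() that always returns false is used so the fragment compiles without DMTCP installed; its int return type is an assumption.

```c
#include <stdio.h>

#ifdef HAVE_DMTCPAWARE   /* hypothetical build flag, not defined by DMTCP */
#include "dmtcpaware.h"  /* build with dmtcpaware.c or link libdmtcpaware.a */
#else
/* Stand-in for illustration only: mimics the documented behaviour of
 * dmtcpIsEnabled() returning false when not running under DMTCP. */
int dmtcpIsEnabled(void) { return 0; }
#endif

/* Hypothetical helper: returns 1 if it makes sense to call other
 * DMTCP Aware functions, 0 if they would have no effect. */
int dmtcp_api_usable(void)
{
    return dmtcpIsEnabled() ? 1 : 0;
}

/* Hypothetical helper: writes a one-line status message to `out`. */
void report_dmtcp_status(FILE *out)
{
    if (dmtcp_api_usable()) {
        fprintf(out, "running under DMTCP\n");
    } else {
        fprintf(out, "not running under DMTCP; DMTCP Aware calls have no effect\n");
    }
}
```

A program would typically call a helper like dmtcp_api_usable() once at startup and skip any DMTCP Aware calls when it returns 0, so the same binary behaves sensibly both with and without dmtcp_checkpoint.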
Authors
DMTCP and its standalone single-process component MTCP (MultiThreaded CheckPointing) were created and are maintained by
Jason Ansel,
Kapil Arya,
Gene Cooperman,
Mike Rieker,
Ana Maria Visan,
and
Alex Brick.