DMTCP (Distributed MultiThreaded Checkpointing) is a tool for transparently checkpointing the state of an arbitrary group of programs spread across many machines and connected by sockets. It modifies neither the user's program nor the operating system.
For further information, see the DMTCP SourceForge project page.
The source code can be checked out with Subversion:
svn co https://dmtcp.svn.sourceforge.net/svnroot/dmtcp dmtcp
DMTCP is completely transparent and can checkpoint unmodified Linux binaries. However, if you wish to call DMTCP from within your checkpointed program, we provide an optional programming interface called DMTCP Aware. To use DMTCP Aware:
DMTCP: Transparent Checkpointing for Cluster Computations and the Desktop.
Jason Ansel, Kapil Arya, and Gene Cooperman.
23rd IEEE International Parallel and Distributed Processing Symposium (IPDPS'09).
Rome, Italy. May 2009.
Transparent User-Level Checkpointing for the Native POSIX Thread Library for Linux.
Michael Rieker, Jason Ansel, and Gene Cooperman.
The 2006 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'06).
Las Vegas, NV. June 2006.