DMTCP: Distributed MultiThreaded CheckPointing

DMTCP Supported Apps:

In general, we try to support all mainstream applications and most others. The list below is a sample of commonly used applications for which we have directly verified DMTCP support. If your application does not work with DMTCP, this is a bug in DMTCP! Please notify us of any such application, so that we can work with you on fixing the bug.

The Closed World Assumption: The largest source of incompatibility with DMTCP is applications that violate the "closed world assumption". This means applications that connect with external services. In these cases, we must add heuristics that know about the external world. A classic example is X Window apps (see the list below). Other examples for which DMTCP has added heuristic support are: NSCD (Name Service Caching Daemon), LDAP (Lightweight Directory Access Protocol), re-connecting stdin/stdout/stderr to the current terminal device, and so on.

The Closed World Assumption and End-User Customizations: Finally, DMTCP provides an end-user API that allows users to customize DMTCP to come closer to the closed world assumption. For example, a database client can disconnect from the server before checkpoint, and reconnect after resume/restart. A database server can delay checkpointing until all current transactions have completed.
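
The database-client pattern described above can be sketched as follows. This is a minimal, self-contained illustration in Python and not DMTCP's actual API: the names DatabaseClient, CheckpointHooks, and simulate_checkpoint are hypothetical, and the checkpoint itself is simulated by an ordinary function call.

```python
class DatabaseClient:
    """Hypothetical client; a real one would hold a socket to an external server."""

    def __init__(self):
        self.connected = False
        self.log = []

    def connect(self):
        self.connected = True
        self.log.append("connect")

    def disconnect(self):
        self.connected = False
        self.log.append("disconnect")


class CheckpointHooks:
    """Hypothetical registry of callbacks run around a (simulated) checkpoint."""

    def __init__(self):
        self.pre_checkpoint = []
        self.post_resume = []

    def simulate_checkpoint(self, save_state):
        # Close external connections so the process satisfies the
        # closed world assumption at the moment state is saved.
        for hook in self.pre_checkpoint:
            hook()
        # Stand-in for the checkpointer saving process state.
        saved = save_state()
        # Re-establish external connections after resume/restart.
        for hook in self.post_resume:
            hook()
        return saved


client = DatabaseClient()
client.connect()

hooks = CheckpointHooks()
hooks.pre_checkpoint.append(client.disconnect)
hooks.post_resume.append(client.connect)

# Record whether the client was connected at the moment of the checkpoint.
connected_at_checkpoint = hooks.simulate_checkpoint(lambda: client.connected)
```

In this sketch the saved state never includes a live connection to the external server, which is the point of the customization: the external world is excluded from the checkpoint and re-attached afterward.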

Time (rdtsc) and Sleep: Time and sleep present special cases during restart. Any counter or timer that has been running since the beginning of a process will treat the restarted process as a new process. This can affect the x86/x86_64 assembly instruction rdtsc, which reads the processor's time-stamp counter (a count of clock cycles): the value read after restart need not continue from the value read before checkpoint. Currently, if a process was checkpointed while inside the sleep system call, then on restart DMTCP resumes the sleep for the number of seconds that remained. (This last policy may change in the future.)
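
The current sleep policy amounts to simple arithmetic, sketched below in Python. The function name remaining_sleep and its parameters are hypothetical; they only illustrate the behavior described above and are not taken from DMTCP's source.

```python
def remaining_sleep(requested_seconds, elapsed_before_checkpoint):
    """Seconds still to sleep after restart (hypothetical helper); never negative."""
    return max(0.0, requested_seconds - elapsed_before_checkpoint)


# A process called sleep(60) and was checkpointed 45 seconds into the call.
# After restart, it sleeps only for the time that remained.
print(remaining_sleep(60.0, 45.0))  # prints 15.0
```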

Three newer non-POSIX Linux system calls: DMTCP supports three newer Linux system calls (epoll, eventfd, and signalfd) as of DMTCP release 1.2.6.

Work in Progress: Support for inotify will be provided in a future release. Please contact us for the latest status, or if you have an additional application for which you need support.

Some of the commonly used applications supported by DMTCP:

  • Programming Languages: C/C++, Java, GNU Lisp, ...
  • Parallel Computing: Open MPI, MPICH2, OpenMP, Cilk, InfiniBand, SLURM and Torque resource managers, ... (see Parallel Computing page)
  • Scripting and Shell Languages: Python, Perl, dash/bash/tcsh/zsh, screen, ...
  • Web/Internet: MySQL, PHP, Firefox (see X Window Apps), Apache, ...
  • Text-based Editors: vi/vim/cscope (vim mouse support not available), emacs, ...
  • X Window Apps: Supported with VNC -- disconnect VNC viewer, and then checkpoint VNC server. Reconnect viewer after resume/restart. (This method does not currently work for checkpointing 3D graphics, e.g. OpenGL. We are in the process of adding support for OpenGL.)
  • Not Supported: Statically compiled applications (using libc.a instead of the shared C library). These could be supported through the use of trampolines. Please contact us if you have an important application.
