DMTCP: Distributed MultiThreaded CheckPointing

Citing DMTCP (please cite this publication):


DMTCP Publications (reverse chronological order):


Publications using DMTCP in their work (not simply citing DMTCP) (in reverse chronological order):

  1. Be Kind, Rewind --- Checkpoint & Restore Capability for Improving Reliability of large-scale Semiconductor Design,
    Igor Ljubuncic, Ravi Giri, Avikam Rozenfeld, and Andrew Goldis,
    2014 IEEE High Performance Extreme Computing Conference (HPEC-2014),
    IEEE Press, Sept., 2014, Bibtex.

  2. DMTCP: System-Level Checkpoint-Restart in User-Space,
    Kapil Arya and Gene Cooperman,
    MVAPICH User's Group (MUG'14),
    Columbus, Ohio, Aug. 26, 2014; MUG'14 program, slides, and video; Bibtex.

  3. Metodología para Predecir el Consumo Energético de Checkpoints en Sistemas de HPC,
    Javier Balladini, Marina Morán, Dolores Rexachs, and Emilio Luque,
    XX Congreso Argentino de Ciencias de la Computación (CACIC'14),
    10 pages, Oct., 2014, Bibtex.

  4. Using SAGA and the Open Science Grid to Search for Aptamers,
    Kevin Shieh, Pilib Ó Broin, David Rhee, Matthew Levy, and Aaron Golden,
    Proc. of 2014 Ann. Conf. on Extreme Science and Engineering Discovery Environment (XSEDE'14), Art. No. 27,
    Bibtex.

  5. Simulation Speedup of ns-3 using Checkpoint and Restore,
    Kyle Harrigan and George Riley,
    Proceedings of the 2014 Workshop on ns-3 (WNS3'14), Art. No. 7, 2014,
    Bibtex.

  6. User-Space Process Virtualization in the Context of Checkpoint-Restart and Virtual Machines,
    Kapil Arya, PhD thesis, Northeastern University, August, 2014, Bibtex.

  7. Use of Checkpoint-Restart for Complex HEP Software on Traditional Architectures and Intel MIC,
    Kapil Arya, Gene Cooperman, Andrea Dotti and Peter Elmer,
    J. Physics: Conference Series 523, Conference 1,
    (from Proc. of 15th Int. Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT2013)),
    IOPScience, 8 pages, 2014, Bibtex.

  8. HOL(y)Hammer: Online ATP Service for HOL Light,
    Cezary Kaliszyk and Josef Urban,
    Mathematics in Computer Science, pp. 1--18, Jun 28, 2014, Springer, Bibtex.

  9. GemFI: A Fault Injection Tool for Studying the Behavior of Applications on Unreliable Substrates,
    K. Parasyris, S. Tziantzoulis, C.D. Antonopoulos, and N. Bellas,
    44th Ann. IEEE/IFIP Int. Conf. on Dependable Systems and Networks (DSN), pp. 622--629, IEEE Press, Jun., 2014, Bibtex.

  10. Optimization Tools of Parallel Simulation of Nanostructures with Quantum Dots,
    K. V. Pavskii, M. G. Kurnosov, and A. Yu. Polyakov,
    Optoelectronics, Instrumentation and Data Processing 50(3), pp. 260--265,
    Springer Press, May, 2014,
    Bibtex.
    (Original Russian text: K.V. Pavskii, M.G. Kurnosov, A.Yu. Polyakov, published in Avtometriya, 2014, Vol. 50, No. 3, pp. 56--61.)

  11. Modular Software Model Checking for Distributed Systems,
    Leungwattanakit, W., Artho, C., Hagiya, M., Tanabe, Y., Yamamoto, M., and Takahashi, K.,
    IEEE Trans. on Software Engineering 40(5), pp. 483--501, May, 2014, IEEE Press, Bibtex.

  12. Improving the Efficiency of Fuzz Testing Using Checkpointing,
    Ernst-Friedrich Zachow,
    Master's Thesis, ETH Zürich, April 1, 2014,
    Bibtex.

  13. jmodeltest.org: Selection of Nucleotide Substitution Models on the Cloud,
    Jose Manuel Santorum, Diego Darriba, Guillermo L. Taboada, and David Posada,
    Bioinformatics 30(9),
    pp. 1310--1311, Oxford Journals, Jan. 21, 2014, Bibtex.

  14. Explorations of the viability of ARM and Xeon Phi for physics processing,
    David Abdurachmanov, Kapil Arya, Josh Bendavid, Tommaso Boccali, Gene Cooperman, Andrea Dotti, Peter Elmer, Giulio Eulisse, Francesco Giacomini, Christopher D. Jones, Matteo Manzali and Shahzad Muzaffar,
    J. Physics: Conference Series 513, Track 5,
    (from Proc. of 20th Int. Conf. on Computing in High Energy and Nuclear Physics (CHEP13)),
    IOPScience, 7 pages, 2014, Bibtex.

  15. DMTCP: Bringing Checkpoint-Restart to Python,
    Kapil Arya and Gene Cooperman, Proc. of the 12th Python in Science Conf. (SciPy 2013),
    6 pages, 2013, Bibtex.

  16. A Framework for an In-depth Comparison of Scale-up and Scale-out,
    Michael Sevilla, Ike Nassi, Kleoni Ioannidou, Scott Brandt, and Carlos Maltzahn,
    Proc. of 2013 Int. Workshop on Data-Intensive Scalable Computing Systems (DISCS'13), pp. 13--18, 2013
    Bibtex.

  17. A Tool for Selecting the Right Target Machine for Parallel Scientific Applications,
    Javier Panadero, Alvaro Wong, Dolores Rexachs, and Emilio Luque,
    Procedia Computer Science 18, pp. 1824--1833, Elsevier, 2013,
    Bibtex.

  18. Formal Mathematics on Display: A Wiki for Flyspeck,
    Carst Tankink, Cezary Kaliszyk, Josef Urban, and Herman Geuvers,
    Intelligent Computer Mathematics,
    Lecture Notes in Computer Science, vol. 7961, pp. 152--167, Springer, 2013,
    Bibtex.

  19. Towards Computing as a Utility via Adaptive Middleware: An Experiment in Cross-paradigm Execution,
    Jaroslaw Slawinski and Vaidy Sunderam,
    Parallel Processing Letters 23(2), 18 pages,
    World Scientific,  June, 2013,
    Bibtex.

  20. Calculation of the Subgroups of a Trivial-Fitting Group,
    Alexander J. Hulpke,
    Proc. of 38th International Symposium on Symbolic and Algebraic Computation, pp. 205--210, 2013, ACM Press,
    Bibtex.

  21. Semi-Automated Debugging via Binary Search through a Process Lifetime,
    Kapil Arya, Tyler Denniston, Ana-Maria Visan, and Gene Cooperman,
    Proc. of 7th Workshop on Programming Languages and Operating Systems (PLOS) (part of Proc. of 24th ACM Symp. on Operating System Principles (SOSP)), 2013,
    ACM Press, Oct., 2013, Bibtex.

  22. Shorten Device Boot Time for Automotive IVI and Navigation Systems (slides),
    Jim Huang and Shi-Wu Lo (developers, 0xlab),
    Automotive Linux Summit (ALS2013), May 28, 2013.
    (See "Part II: Userspace solution: Checkpointing"; begins at slide 66)

  23. SweeD: Likelihood-Based Detection of Selective Sweeps in Thousands of Genomes,
    P. Pavlidis, D. Živkovic, A. Stamatakis, and N. Alachiotis,
    Heidelberg Institute for Theoretical Studies, Technical report Exelixis-RRDR-2013-1, February, 2013

  24. A Survey of Fault Tolerance Mechanisms and Checkpoint/Restart Implementations for High Performance Computing Systems,
    I.P. Egwutuoha, D. Levy, B. Selic and S. Chen,
    The Journal of Supercomputing, Feb., 2013, Springer

  25. Proposal of Incremental Software Simulation for Reduction of Evaluation Time,
    Atsushi Shina, Kanemitsu Ootsu, Takeshi Ohkawa, Takashi Yokota and Takanobu Baba,
    Third Int. Conf. on Networking and Computing (ICNC), pp. 311--315, IEEE Press, Dec., 2012, Bibtex.

  26. Implement Checkpointing for Android (to speed up boot time and development process) (slides),
    Jim Huang and Kito Cheng (developers, 0xlab),
    Embedded Linux Conference Europe (ELCE2012),
    Barcelona, Spain; Nov. 5--7, 2012.

  27. Adapting MPI to MapReduce PaaS Clouds: An Experiment in Cross-Paradigm Execution,
    Jaroslaw Slawinski and Vaidy Sunderam,
    Proc. of 2012 IEEE/ACM Fifth Int. Conf. on Utility and Cloud Computing (UCC '12), pp. 199--203, 2012, Bibtex.

  28. Creating and Improving Multi-Threaded Geant4,
    Xin Dong, Gene Cooperman, John Apostolakis, Sverre Jarp, Andrzej Nowak, Makoto Asai and Daniel Brandt,
    Journal of Physics: Conference Series, Volume 396, Part 5, 2012

  29. Temporal Meta-Programming: Treating Time as a Spatial Dimension,
    Ana-Maria Visan, PhD thesis, Northeastern University, April, 2012, Bibtex.

  30. Verification of Embedded Control Systems by Simulation and Program Execution Control,
    Stefan Resmerita and Wolfgang Pree,
    American Control Conference (ACC), pp. 3581--3586, June, 2012, IEEE Press

  31. Checkpointing in Distributed Heterogeneous Environments,
    Michael Schöttner and John Mehnert-Spahn,
    Technical Report, Heinrich Heine University, Duesseldorf, Germany, 26 pages, March, 2012,
    (from Universität Düsseldorf: Publications),
    Bibtex.

  32. Source-Level Transformation of Legacy Sequential Program into Scalable Thread-Parallel Code,
    Xin Dong, PhD thesis, Northeastern University, Dec., 2011, Bibtex.

  33. Model Checking Distributed Systems by Combining Caching and Process Checkpointing,
    Watcharin Leungwattanakit, Cyrille Artho, Masami Hagiya, Yoshinori Tanabe, and Mitsuharu Yamamoto,
    26th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 103--112,
    IEEE Press, Dec., 2011. Bibtex.

  34. Including the Workload Effect in the Parallel Program Signature,
    J.M. Canillas, A. Wong, D. Rexachs, and E. Luque,
    Proc. of 13th Int. Conf. on High Performance Computing and Communications (HPCC), pp. 304--311,
    IEEE Computer Society, Sept., 2011. Bibtex.

  35. Predicting Parallel Applications Performance Using Signatures: the Workload Effect,
    J.M. Canillas, A. Wong, D. Rexachs, and E. Luque,
    9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), pp. 299--300,
    IEEE Computer Society, Dec., 2011. Bibtex.

  36. URDB: A Universal Reversible Debugger Based on Decomposing Debugging Histories,
    Ana-Maria Visan, Kapil Arya, Gene Cooperman, and Tyler Denniston,
    Proc. of 6th Workshop on Programming Languages and Operating Systems (PLOS) (part of Proc. of 23rd ACM Symp. on Operating System Principles (SOSP)), 2011,
    ACM Press, Oct., 2011. Bibtex.

  37. Direct Inference of Protein--DNA Interactions using Compressed Sensing Methods,
    Mohammed AlQuraishi and Harley H. McAdams,
    Proc. of National Academy of Sciences (PNAS) 108(36), pp. 14819--14824,
    Sept. 6, 2011. Full Text (html), Full Text (pdf), Bibtex.

  38. CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications,
    Hiroyuki Takizawa, Kentaro Koyama, Katsuto Sato, Kazuhiko Komatsu, and Hiroaki Kobayashi,
    Proc. of 2011 IEEE International Parallel and Distributed Processing Symposium, pp. 864--876,
    IEEE Computer Society, May, 2011. Bibtex.

  39. Distributed Speculative Parallelization using Checkpoint Restart,
    Devarshi Ghoshal, Sreesudhan R. Ramkumar, and Arun Chauhan,
    Procedia Computer Science 4, pp. 422--431,
    May, 2011, Slides, Bibtex.

  40. Unibus: Aspects of Heterogeneity and Fault Tolerance in Cloud Computing,
    M. Slawińska, J. Slawinski, and V. Sunderam,
    Proc. of IEEE Int. Symp. on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), pp. 1--10,
    Apr., 2010, Bibtex.

