Memory Machine™
Checkpoint Engine

Increases availability of AI workloads

Transparent Checkpointing & Restore

Memory Machine is a suite of powerful and intuitive container orchestration services for running data-intensive pipelines, such as bioinformatics, and interactive computing applications, such as EDA.

Memory Machine Checkpoint Engine is a checkpoint and restore engine designed for easy integration with Kubernetes, popular job schedulers, and AWS Batch environments. Once integrated, it can checkpoint workloads running on CPU and GPU resources, which allows a pipeline or interactive app to be hot-restarted from a specific point in time.

The software is included with Memory Machine Cloud and Memory Machine AI and is available as a stand-alone application.

Delivering High QoS for Unreliable Spot Instances

Memory Machine Checkpoint Engine captures the entire running state of an AWS Batch Job into a consistent image and restores the Job on a new Compute Instance without losing any work progress. This maintains a high quality of service at the Batch level even when Jobs run on low-cost but unreliable Spot-based Compute Instances.

The MMC Batch Engine’s key features include:

  • Full integration into the customer Batch environment
  • Automated checkpoint and restore
  • No change to the customer workflow
  • No change to the Job applications and Workflow Manager scripts
  • Scalable across thousands of Batch Jobs and Compute Instances
  • Secure data processing within the customer VPC
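
The checkpoint-and-restore behavior described in this section can be illustrated with a small, self-contained Python sketch. It is not the Memory Machine implementation or API: the engine checkpoints a running Job transparently, whereas this toy job saves its own state explicitly. The names `run_job`, `run_with_restore`, `SpotInterruption`, and the `job.ckpt` file are hypothetical and exist only for illustration. The sketch shows the property the text claims: after a simulated Spot reclaim, the job resumes from its last consistent checkpoint and finishes without repeating completed work.

```python
import json
import os
import tempfile


class SpotInterruption(Exception):
    """Hypothetical stand-in for a Spot instance being reclaimed mid-job."""


def run_job(total_steps, checkpoint_path, interrupt_at=None):
    """Sum the integers 0..total_steps-1, checkpointing after every step.

    If a checkpoint file already exists, the job resumes from it
    instead of starting over.
    """
    state = {"next_step": 0, "running_sum": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    for step in range(state["next_step"], total_steps):
        if interrupt_at is not None and step == interrupt_at:
            raise SpotInterruption(f"instance reclaimed before step {step}")
        state["running_sum"] += step
        state["next_step"] = step + 1
        # Write to a temporary file, then rename it over the old checkpoint,
        # so the checkpoint on disk is always a complete, consistent image.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["running_sum"]


def run_with_restore(total_steps, interrupt_at):
    """Run the job, lose the 'instance' once, then restore and finish.

    Returns (step the job resumed from, final result).
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "job.ckpt")
        try:
            run_job(total_steps, path, interrupt_at=interrupt_at)
        except SpotInterruption:
            pass  # the first "instance" is gone; only the checkpoint survives

        resumed_from = 0
        if os.path.exists(path):
            with open(path) as f:
                resumed_from = json.load(f)["next_step"]

        # Second run plays the role of the replacement Compute Instance.
        result = run_job(total_steps, path)
        return resumed_from, result


if __name__ == "__main__":
    resumed_from, result = run_with_restore(total_steps=10, interrupt_at=4)
    print(f"resumed from step {resumed_from}, final result {result}")
```

In this sketch the checkpoint lives in a local temporary directory. In a real Spot scenario the checkpoint image would need to be stored somewhere that outlives the reclaimed instance.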

Integrating Memory Machine Checkpoint Engine

Integrating Memory Machine Checkpoint Engine with the job scheduler in your environment provides:

  • Automated checkpoint and restore
  • Configuration by the customer's Platform/DevOps team at the job scheduler level, so individual end users do not need to change their job scripts
  • Scalability in a production environment

Integration deliverables include:

  • Design doc describing architecture, components, and logic
  • MMC Engine installer
  • Sample scripts