Memory Machine Cloud
Case Study

IGIB Processes 6,200+ Metagenomic Samples in Record Time Using AWS Spot Instances and MemVerge Checkpoint Restore
Summary
- 6,200+ metagenomic samples processed successfully
- AWS spot instance quota scaled from an initial 500 to 20,000+
- 9 parallel batch operations across three availability zones
- Cost-effective HPC through spot instance optimization
Challenge
The field of metagenomics is growing rapidly, and its datasets are growing with it, making it increasingly difficult for researchers to process large-scale environmental samples.
IGIB needed to analyze over 6,200 metagenomic samples, requiring massive computational resources that would be prohibitively expensive with traditional infrastructure.
These samples provide critical insights into microbial communities from diverse ecosystems including soil, ocean, and the human gut – data essential for environmental monitoring, disease diagnostics, antibiotic resistance tracking, and biotechnology innovations.
The team faced several critical challenges:
- Processing thousands of samples with varying computational requirements
- Maintaining cost-effectiveness while scaling to meet demand
- Ensuring resilience against potential spot instance interruptions
- Managing complex workflow orchestration across multiple availability zones
Solution
PeriMatrix IT implemented a sophisticated cloud architecture leveraging AWS spot instances, MemVerge’s orchestration platform, Nextflow, the nf-core/mag pipeline, and Amazon FSx for Lustre in a three-zone deployment model within a single region.
The solution featured:
- Strategic workload segmentation into 9 parallel batches (3 per zone, 150 samples per batch)
- Deployment of MemVerge’s software platform that enables streamlined orchestration of containerized applications on AWS spot instances with support for various file systems and workflow management systems
- Implementation of nf-core/mag, a bioinformatics best-practice pipeline for assembly, binning, and annotation of metagenomes
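The segmentation strategy above can be sketched as a simple partitioning step. This is a minimal illustration, not IGIB's actual tooling: the zone names, sample IDs, and round-robin zone assignment are assumptions, and it treats batches beyond the first nine as queued for later waves (the case study ran 9 batches in parallel at a time).

```python
# Sketch: partition a sample manifest into fixed-size batches and
# assign each batch to an availability zone round-robin.
# BATCH_SIZE mirrors the case study (150 samples per batch);
# the AZ names are illustrative assumptions.
from itertools import islice

ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]  # assumed AZ names
BATCH_SIZE = 150

def make_batches(samples, batch_size=BATCH_SIZE, zones=ZONES):
    """Yield (zone, batch) pairs: fixed-size batches of samples,
    with zones assigned round-robin across batches."""
    it = iter(samples)
    # islice pulls batch_size items at a time; the loop stops when
    # the iterator yields an empty list.
    for i, batch in enumerate(iter(lambda: list(islice(it, batch_size)), [])):
        yield zones[i % len(zones)], batch

samples = [f"sample_{n:05d}" for n in range(6200)]
batches = list(make_batches(samples))
# 6,200 samples -> 41 full batches of 150 plus one partial batch,
# spread evenly across the three zones
```

In practice the manifest for each batch would be written out as an nf-core samplesheet and submitted as its own pipeline run, with only a few batches active per zone at any time.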
The team employed a methodical approach to optimization:
- Started with a single test sample to observe costs and resource utilization
- Made configuration changes to match compute requirements with utilization
- Scaled progressively from 1 to 10 to 100 samples before full deployment
- Fine-tuned batch size, number of batches, and time lag between batches to maximize spot instance availability
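The tuning steps above (batch count, batch size, inter-batch lag) can be sketched as a small launch scheduler. This is a hedged illustration only: the 15-minute lag is an assumed value (the case study tuned it empirically), and `nextflow_cmd` is a hypothetical helper, though `--input` and `--outdir` are real nf-core/mag parameters.

```python
# Sketch: stagger batch launches so spot requests don't all hit the
# market at once, smoothing demand against available capacity.

def launch_schedule(num_batches, lag_minutes):
    """Return (batch_index, launch_offset_minutes) pairs so that
    consecutive batches start lag_minutes apart rather than
    simultaneously."""
    return [(i, i * lag_minutes) for i in range(num_batches)]

def nextflow_cmd(samplesheet_csv, outdir):
    """Build an nf-core/mag invocation for one batch.
    File paths are placeholders; -profile and the pipeline
    parameters shown are standard nf-core usage."""
    return ["nextflow", "run", "nf-core/mag",
            "-profile", "docker",
            "--input", samplesheet_csv,
            "--outdir", outdir]

# 9 parallel batches, launched 15 minutes apart (assumed lag)
schedule = launch_schedule(9, 15)
commands = [nextflow_cmd(f"batch_{i}.csv", f"results/batch_{i}")
            for i, _ in schedule]
```

A driver following the case study's progression would first run this with a single sample, then 10, then 100, adjusting per-process resource requests before committing to the full schedule.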
Results
The implementation delivered exceptional results across multiple dimensions:
Massive Scalability: Successfully processed over 6,200 metagenomic samples with compute requirements ranging from 1 vCPU to 32+ vCPUs per task
Resource Optimization: Achieved continuous job execution by carefully balancing batch sizes and introducing time lags between batches, keeping a diverse mix of compute requirements that matched available spot instance capacity
Infrastructure Growth: Scaled AWS spot quota from 500 to over 20,000 instances through methodical utilization and strategic quota increases
Cost Efficiency: Delivered a highly robust solution at significantly reduced costs compared to on-demand instances

“Through methodical pipeline refinement, fault resilient cloud orchestration, and appropriate architecture deployment, we successfully met our compute requirements. Our project demonstrates the ability of cloud-native solutions to power cutting-edge scientific research at scale.”
— Usman Rashid, Data Engineer
