Memory Machine Cloud
Case Study

IGIB Processes 6,200+ Metagenomic Samples in Record Time Using AWS Spot Instances and MemVerge Checkpoint Restore
Summary
- 6,200+ metagenomic samples processed successfully
- AWS spot instance quota scaled from an initial 500 to 20,000+
- 9 parallel batch operations across three availability zones
- Cost-effective HPC through spot instance optimization
Challenge
The field of metagenomics is growing rapidly, and its datasets are growing with it, making it increasingly difficult for researchers to process large-scale environmental samples.
IGIB needed to analyze over 6,200 metagenomic samples, requiring massive computational resources that would be prohibitively expensive with traditional infrastructure.
These samples provide critical insights into microbial communities from diverse ecosystems including soil, ocean, and the human gut – data essential for environmental monitoring, disease diagnostics, antibiotic resistance tracking, and biotechnology innovations.
The team faced several critical challenges:
- Processing thousands of samples with varying computational requirements
- Maintaining cost-effectiveness while scaling to meet demand
- Ensuring resilience against potential spot instance interruptions
- Managing complex workflow orchestration across multiple availability zones
Solution
PeriMatrix IT implemented a sophisticated cloud architecture leveraging AWS spot instances, MemVerge’s orchestration platform, Nextflow, the nf-core/mag pipeline, and Amazon FSx for Lustre in a three-zone deployment model within a single region.
The solution featured:
- Strategic workload segmentation into 9 parallel batches (3 per zone, 150 samples per batch)
- Deployment of MemVerge’s software platform that enables streamlined orchestration of containerized applications on AWS spot instances with support for various file systems and workflow management systems
- Implementation of nf-core/mag, a bioinformatics best-practice pipeline for assembly, binning, and annotation of metagenomes
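The segmentation strategy above can be sketched as a simple partitioning step. This is a minimal illustration, not IGIB's actual tooling: the zone names, sample IDs, and round-robin zone assignment are assumptions, and it treats batches beyond the first nine as queued for later waves (the case study ran 9 batches in parallel at a time).

```python
# Sketch: partition a sample manifest into fixed-size batches and
# assign each batch to an availability zone round-robin.
# BATCH_SIZE mirrors the case study (150 samples per batch);
# the AZ names are illustrative assumptions.
from itertools import islice

ZONES = ["us-east-1a", "us-east-1b", "us-east-1c"]  # assumed AZ names
BATCH_SIZE = 150

def make_batches(samples, batch_size=BATCH_SIZE, zones=ZONES):
    """Yield (zone, batch) pairs: fixed-size batches of samples,
    with zones assigned round-robin across batches."""
    it = iter(samples)
    # islice pulls batch_size items at a time; the loop stops when
    # the iterator yields an empty list.
    for i, batch in enumerate(iter(lambda: list(islice(it, batch_size)), [])):
        yield zones[i % len(zones)], batch

samples = [f"sample_{n:05d}" for n in range(6200)]
batches = list(make_batches(samples))
# 6,200 samples -> 41 full batches of 150 plus one partial batch,
# spread evenly across the three zones
```

In practice the manifest for each batch would be written out as an nf-core samplesheet and submitted as its own pipeline run, with only a few batches active per zone at any time.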
The team employed a methodical approach to optimization:
- Started with a single test sample to observe costs and resource utilization
- Made configuration changes to match compute requirements with utilization
- Scaled progressively from 1 to 10 to 100 samples before full deployment
- Fine-tuned batch size, number of batches, and time lag between batches to maximize spot instance availability
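The tuning steps above (batch count, batch size, inter-batch lag) can be sketched as a small launch scheduler. This is a hedged illustration only: the 15-minute lag is an assumed value (the case study tuned it empirically), and `nextflow_cmd` is a hypothetical helper, though `--input` and `--outdir` are real nf-core/mag parameters.

```python
# Sketch: stagger batch launches so spot requests don't all hit the
# market at once, smoothing demand against available capacity.

def launch_schedule(num_batches, lag_minutes):
    """Return (batch_index, launch_offset_minutes) pairs so that
    consecutive batches start lag_minutes apart rather than
    simultaneously."""
    return [(i, i * lag_minutes) for i in range(num_batches)]

def nextflow_cmd(samplesheet_csv, outdir):
    """Build an nf-core/mag invocation for one batch.
    File paths are placeholders; -profile and the pipeline
    parameters shown are standard nf-core usage."""
    return ["nextflow", "run", "nf-core/mag",
            "-profile", "docker",
            "--input", samplesheet_csv,
            "--outdir", outdir]

# 9 parallel batches, launched 15 minutes apart (assumed lag)
schedule = launch_schedule(9, 15)
commands = [nextflow_cmd(f"batch_{i}.csv", f"results/batch_{i}")
            for i, _ in schedule]
```

A driver following the case study's progression would first run this with a single sample, then 10, then 100, adjusting per-process resource requests before committing to the full schedule.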
Results
The implementation delivered exceptional results across multiple dimensions:
Massive Scalability: Successfully processed over 6,200 metagenomic samples with compute requirements ranging from 1 vCPU to 32+ vCPUs per task
Resource Optimization: Achieved continuous job execution by carefully balancing batch sizes and introducing time lags between batches, keeping a diverse mix of compute requirements that matched available spot instance capacity
Infrastructure Growth: Scaled AWS spot quota from 500 to over 20,000 instances through methodical utilization and strategic quota increases
Cost Efficiency: Delivered a highly robust solution at significantly reduced costs compared to on-demand instances

“Through methodical pipeline refinement, fault resilient cloud orchestration, and appropriate architecture deployment, we successfully met our compute requirements. Our project demonstrates the ability of cloud-native solutions to power cutting-edge scientific research at scale.”
— Usman Rashid, Data Engineer
