Memory Machine Cloud

Case Study

TGen improves the visibility, efficiency, and performance of Nextflow workflows on AWS

Translational Genomics Research Institute (TGen) is a leading nonprofit based in Phoenix that focuses on translating genomic research into life-changing results.

“WaveRider picks the most appropriate VM type for each of the thousands of jobs launched by Nextflow. That is giving me more efficiency than ever. Very cool.”

– Vince Pagano, Senior Scientific Programmer, TGen – Translational Genomics Research Institute

Cloud Cost Optimization is Not Easy at Scale

TGen struggled to run its large Nextflow workflows cost-efficiently on AWS. On-demand EC2 instances were too expensive, especially for larger jobs requiring thousands of nodes. Cheaper Spot EC2 instances sounded good on paper, but in practice failure rates could be as high as 80%. Failed jobs had to be restarted multiple times, which lengthened completion times and made cost savings less predictable.
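To get a rough feel for why high failure rates stretch completion times, the sketch below estimates how many attempts a job needs on average before it finishes. It rests on simplifying assumptions that are not from TGen's data: every attempt fails independently with the same probability, and a failed job restarts from scratch. The function name `expected_attempts` is purely illustrative.

```python
def expected_attempts(failure_rate: float) -> float:
    """Average number of attempts until a job first succeeds.

    Assumes (hypothetically) that each attempt fails independently with
    probability `failure_rate`, so the attempt count follows a geometric
    distribution with mean 1 / (1 - failure_rate).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in the range [0, 1)")
    return 1.0 / (1.0 - failure_rate)


if __name__ == "__main__":
    # Compare the two failure rates quoted in this case study.
    for rate in (0.80, 0.01):
        attempts = expected_attempts(rate)
        print(f"failure rate {rate:.0%}: about {attempts:.2f} attempts per job")
```

Under these assumptions, an 80% failure rate implies about five attempts per job on average, while a 1% failure rate implies about 1.01. Real Spot reclaim behavior is not this uniform, so treat the numbers as an illustration rather than a prediction.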

More Visibility, Efficiency, and Performance = Better Research

TGen leverages MemVerge Memory Machine Cloud to execute its Nextflow workflows on AWS. Memory Machine Cloud features such as FLOAT, SpotSurfer, WaveRider, and WaveWatcher seamlessly integrate with Nextflow as a supported cloud executor.

  • Automated cloud resource management (FLOAT)
  • Deep insights into cloud resource utilization (WaveWatcher)
  • Cost and performance optimization (SpotSurfer + WaveRider)

Improving the Nextflow Runtime Experience in the Cloud

Since deploying Memory Machine Cloud, TGen has been able to run its workflows easily and cost-efficiently on AWS, avoiding both the high cost of on-demand EC2 and the high failure rates that can come from running large workflows entirely on Spot EC2.

  • Organization-wide visibility into job-level resource utilization reports and insights
  • More reliable workflow execution on Spot EC2, with failure rates reduced from as high as 80% to below 1%
  • Automated rightsizing of EC2 instances at runtime

Before: ~80% job failure due to Spot reclaim

After: below 1% job failure due to Spot reclaim

“I was getting up to 80% batch failure rates with Spot EC2. Now, with SpotSurfer, we have already brought failure rates due to Spot reclaims to below 1%, and we are just getting started.”

– Vince Pagano, Senior Scientific Programmer, TGen