Case Study

SPAdes with SpotSurfer

SpotSurfer enables long-running SPAdes jobs to run on low-cost Spot instances for a leading plant genomics company

This SPAdes user is a global market leader in seed potato trading, innovative breeding, and concept development. The company needs to work with its commercial growers, genetic partners, and government organizations to identify and develop potato variants with the desired disease and pest resistance, water consumption, crop yields, soil compatibility, visual characteristics, and flavors. Developing each variant requires many genomic simulations, and each job requires significant investments by researchers and data engineers to develop variant models. Each variant model requires job runs that consume expensive cloud resources and can take days to execute.

The Problem

The company's lead data engineer and his colleagues faced multiple challenges when running SPAdes genomics modeling jobs on Azure. These workloads are massively parallel and involve multiple application instances that process data sets to identify genomic variations and sequence the DNA. The data sets can be terabytes in size, and the jobs consume a volume of CPU and memory resources that is beyond the team's cloud budget.

  • VM type: M32-16ms (16 CPUs, 875 GB memory, 1 TB storage, memory-optimized instance)
  • Run time: 216 hours (9 days)
  • MMCE Cost ($0.61/hr): $131.76
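
The cost figure above is simply run time multiplied by the hourly rate. The short sketch below reproduces that arithmetic. The function names are hypothetical, and the second function is only an illustration: it assumes the 90% reduction quoted under The Bottom Line applies to this same job, which the case study does not state explicitly.

```python
from decimal import Decimal


def run_cost(hours, rate_per_hour):
    """Total cost of one job run: hours multiplied by the hourly rate."""
    total = Decimal(hours) * Decimal(rate_per_hour)
    return total.quantize(Decimal("0.01"))


def implied_baseline_cost(reduced_cost, savings_fraction):
    """Cost before savings, ASSUMING reduced_cost = baseline * (1 - savings).

    This is an inference for illustration, not a figure from the case study.
    """
    baseline = Decimal(reduced_cost) / (Decimal(1) - Decimal(savings_fraction))
    return baseline.quantize(Decimal("0.01"))


if __name__ == "__main__":
    # 216 hours (9 days) at $0.61/hr, as listed above
    mmce_cost = run_cost("216", "0.61")
    print("MMCE cost:", mmce_cost)
    # Hypothetical baseline if the 90% reduction applied to this run
    print("Implied baseline:", implied_baseline_cost(mmce_cost, "0.90"))
```

Decimal is used instead of float so that the currency values are not affected by binary rounding.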

Their Primary Challenges Included:

100% First-Pass Execution

With genomics jobs taking days to run, HZPC needs every job to complete successfully on the first pass, without failures from CPU or memory limitations caused by spikes in demand related to data copies or other resource-hungry applications.

Excessive Cloud Costs

One of the biggest challenges is over-allocating resources to accommodate demand spikes during application runs. This creates excess costs and reduces the number of models that can be run.

Manage Cloud Spot Instances

The team needs the ability to start and stop job runs so that job resources can be matched to available Spot instances without risking first-pass execution.

Mapping CPU vs. Memory Resources

Different resources come under stress depending on the stage of the genomics work. Developing a new variant sequence is more CPU intensive, whereas comparing already assembled variants requires more memory than CPU resources.

Prioritizing Concurrent Job Runs

The team at HZPC needs to manage and reduce the size of jobs.

The Solution

The data engineering team sought a new way to dynamically manage SPAdes jobs in order to improve first-pass job success, prioritize job run order, optimize cloud costs, and use Spot instances on Azure. They chose Memory Machine Cloud Edition (MMCE) for the following reasons:

AppCapsule Snapshots

This feature enabled the data engineering team to pause a running application and take a snapshot of the entire application state (data in memory, storage, cache, registers, etc.). The ability to create AppCapsules opened a new world of SPAdes job mobility and predictability.

SpotSurfer

The end goal was to run their SPAdes jobs on lower-cost Spot instances. SpotSurfer combined AppCapsules with cloud service orchestration to ensure their jobs would recover gracefully from Spot terminations.
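
The general idea of recovering from a Spot termination by resuming from saved state can be illustrated with a small simulation. This is a minimal sketch only, not MemVerge's implementation: AppCapsules capture the whole application state transparently, whereas this example uses simple application-level checkpointing to a JSON file as a stand-in. All names here (SpotPreempted, run_job, run_until_done) are hypothetical.

```python
import json
import os
import tempfile


class SpotPreempted(Exception):
    """Raised by the simulation when the (hypothetical) Spot VM is reclaimed."""


def run_job(total_steps, checkpoint_path, preempt_at=None):
    """Run steps up to total_steps, resuming from the checkpoint if one exists."""
    state = {"next_step": 0, "partial_sum": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)

    while state["next_step"] < total_steps:
        step = state["next_step"]
        if preempt_at is not None and step == preempt_at:
            raise SpotPreempted(f"instance reclaimed before step {step}")
        state["partial_sum"] += step  # stand-in for real work
        state["next_step"] = step + 1
        # Write to a temporary file, then rename, so a checkpoint is never half-written.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, checkpoint_path)
    return state["partial_sum"]


def run_until_done(total_steps, checkpoint_path, preemptions):
    """Restart the job after each simulated preemption; return (result, restarts)."""
    pending = list(preemptions)
    restarts = 0
    while True:
        try:
            preempt_at = pending[0] if pending else None
            return run_job(total_steps, checkpoint_path, preempt_at), restarts
        except SpotPreempted:
            pending.pop(0)
            restarts += 1  # a new instance would pick up the saved state here


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "job.ckpt")
        result, restarts = run_until_done(10, ckpt, preemptions=[3, 7])
        print(f"result={result} restarts={restarts}")
```

Because each restart continues from the last saved step rather than from step 0, the work completed before a termination is not repeated, which is what makes interruptible instances practical for multi-day jobs.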

The Bottom Line

Compute costs reduced by 90% by moving the workload from On-Demand to Spot instances

Request a Demo or Free Trial

We are happy to provide a full demo, a free trial, or access to a sandbox setup in AWS so that you can see the capabilities of the MemVerge solution. Get started for free!