Power the democratization of genomics with the cloud

By Todd Bergeson, Microsoft Sr. Global Marketing Manager, Public Sector

May 3, 2018

We are on the cusp of the democratization of genomics. Cost has long been a key barrier to widespread application of genomics in healthcare. Indeed, as recently as 2007, the price tag for sequencing a single human genome was $2 million.[1]

Fortunately, advances in technology have rapidly reduced the cost of sequencing. Today sequencing a whole human genome has fallen to just above $1000,[2] which has long been considered the benchmark to reach for affordable sequencing.[3] The price of sequencing a genome is now closing in on the price range of other routine medical tests. Given this precipitous drop in price, experts predict that by 2025 researchers will have sequenced over 100 million human genomes.[4]

The democratization of genomics will usher in a new era of precision medicine

The insights from genomic data open up new possibilities for delivering more effective and personalized care. It is expected that genomics will drive profound advances in cancer treatments. Researchers anticipate that by analyzing genomic data from tumor tissue, a physician will be able to select the treatment that will be most effective for the patient. It is expected that genomics will enable scenarios like prediction of inherited disease and treatment and perhaps even prevention of disease through gene therapy.

But genome-powered precision medicine at scale is only possible if healthcare organizations overcome the significant challenges presented by the scale and complexity of genomic data.

The journey from raw biological sample to actionable insight presents big data hurdles

Sequencing a single human genome produces an enormous amount of data. Sequencing starts with a raw biological sample from the patient—generally blood or saliva. In the lab, the sample is fragmented and each fragment is measured, or sequenced, producing digital strings of As, Gs, Cs, and Ts.

This raw data consists of around 1 billion 100-basepair strings—or around 100 gigabytes of data. 100 gigabytes may not seem like such a huge amount of data, but consider it in perspective: As researchers sequence more and more genomes, gigabytes will quickly give way to petabytes and even exabytes. In fact, by 2025, it is estimated that as much as 40 exabytes of storage capacity will be needed for human genomic data.[5]

Given the scale and complexity of genomic data, it’s not surprising that achieving insights from this raw data requires massive compute power. After sequencing, the strings of raw data are reassembled and aligned to a standard human genome. Then, using advanced analytics, researchers call out where the patient’s genome differs from the standard human genome—this is called variant calling.

This process, commonly known as secondary analysis, requires hundreds of core-hours of compute time. To understand what this compute process looks like at scale, consider that it would require over 9 million core-hours to analyze the genomic data of everyone at a sold-out event in New York’s Madison Square Garden.[6]

The cloud puts genomics within reach for more organizations

Given the scale and complexity of the genomic workload, managing genomic data on premises is beyond the reach of many health organizations. Fortunately, the cloud, with its practically unlimited ability to scale, presents a viable alternative. The cloud can automatically provide additional storage as soon as it’s needed. In contrast, the process of requisitioning, purchasing, and setting up additional disk space on premises can take months.

Computing resources are similarly scalable on demand. Translating raw genomic data into useful information requires extensive processing that only supercomputers or cloud computing resources can handle. While only a small number of elite research centers, major medical centers, or large pharmaceutical companies have access to super computers, any organization—from the smallest startup to the largest enterprise—can access the cloud.

Power your genomic analysis with Microsoft Genomics

With over a decade of research in genomic, Microsoft is committed to enabling healthcare organizations to overcome the hurdles to genomic analysis. That’s why we offer Microsoft Genomics, a scalable, secure, compliant cloud solution for genomic analysis.

With Microsoft Genomics service, healthcare organizations can support their most demanding sequencing needs by leveraging the Microsoft cloud to reliably run complex workloads at scale. The Genomics service orchestrates fast, efficient execution of a best-practices genomics pipeline, enabling healthcare organizations to take advantage of state-of-the-art analysis without the complexity of managing and updating their own hardware and software. And healthcare organizations can count on producing results that are highly accurate and consistent from run to run.

Naturally, security and compliance are top of mind for healthcare organizations, especially when it comes to the cloud. The Genomics service complies with the major global and local security, provenance, and privacy standards, including ISO and HIPAA. And healthcare organizations can protect their data with Azure, Microsoft’s trusted cloud platform built around security and privacy.

Last but not least, with the Genomics service, healthcare organizations pay as they go with no upfront commitment and can predict their costs accurately using a simple per-genome billing model that’s based on the number of gigabases in their samples. A single experience for billing and support keeps projects on track and within budget.

Accelerate genomics initiatives on a proven cloud platform

Leverage Microsoft’s expertise to power your genomic analysis and be confident that you will get the most from your data with a growing ecosystem of tools, datasets, and partner solutions. Visit the Microsoft Genomics page on Azure to learn more about the Genomics service and get started with a trial today

[1] Emily Singer, “The $2 Million Genome,” MIT Technology Review (June 1, 2007), https://www.technologyreview.com/s/407992/the-2-million-genome/

[2] National Human Genome Research Institute, “DNA Sequencing Costs: Data,” https://www.genome.gov/sequencingcostsdata/

[3] Leslie Pray, “A Cheap Personal Genome?,” Genome Biology (2002), https://doi.org/10.1186/gb-spotlight-20021007-02

[4] Zachary D. Stephens et al., “Big Data: Astronomical or Genomical,” PLOS Biology 13, no. 7 (2015)

[5] Ibid.

[6] Calculated based on 450 core-hours per genome

Todd Bergeson

Microsoft Sr. Global Marketing Manager, Public Sector

See more articles from this author

PublishedApr 18

4 min read read

Microsoft and partners: Securing the digital subsea environment through innovation
PublishedApr 11

5 min read read

3 ways Microsoft AI capabilities are helping public finance agencies reignite economies
PublishedApr 10

5 min read read

Maximize machine learning and data management in Azure Data Manager for Energy
PublishedApr 10

4 min read read

Beyond HIMSS24: Microsoft partners redefine healthcare with AI solutions

Power the democratization of genomics with the cloud

Related posts

Explore Microsoft industry solutions

Related posts

Microsoft and partners: Securing the digital subsea environment through innovation

3 ways Microsoft AI capabilities are helping public finance agencies reignite economies

Maximize machine learning and data management in Azure Data Manager for Energy

Beyond HIMSS24: Microsoft partners redefine healthcare with AI solutions