Cloud Case Study Precis


This page briefly describes cloud computing and data science case studies at the University of Washington (UW).

We are currently tracking more than 75 projects and groups migrating their research computing to the public cloud. Below is a selected subset, organized roughly by domain. Much of this work goes beyond simple migration to cloud platforms: it extends to data management techniques, reproducibility, data services for collaboration and public distribution, and more. All of this is made possible by the UW eScience Institute.


  • Contact us for updates to this material
  • Entries focus on topics rather than named groups, to preserve a degree of anonymity
  • Projects are in *various stages of completion*

General Systems and Tools

  • SQL Share: A system for managing, sharing and manipulating research data.
  • Myria: A distributed, shared-nothing Big Data management system and Cloud service from the University of Washington
  • IoT: Arduino Yún-based leaf devices connected to cloud IoT endpoint services
  • Geohackweek and Neurohackweek: Hosting intensive workshops for learning and developing cloud-based tools and methods at UW

Medical research

  • Laboratory Medicine: A cloud-based genome analysis system for clinical annotation (oncology and related areas)
  • Crossing the clinical-to-research data barrier
  • Data access and tool access for MRI- and EEG-based research
  • Gut biome metagenomics (Children’s Hospital)
  • Patterns in unexpected in-hospital mortality
  • Deep learning for patient behavior prediction: Relating EEG data to audio/video transcripts of patient behavior
  • Canine longitudinal aging studies
  • Cyberinfrastructure in support of research laboratory groups
  • Biostatistics

Hydrology and Geochemistry

  • GDS: Geometabolomic Data System, a contribution-driven collaborative library for Dissolved Organic Matter (DOM) spectral data from the global hydrosphere.
  • HiMAT (NASA)
  • Dynamic Information Framework (DIF) (World Bank): Converting scientific modeling skill into actionable information in the public domain, in Central and Southeast Asia and in Central and South America

Genomics

  • Epigenome imputation: Inferring, from existing wet-lab experiments, relationships between particular proteins and cell types as a function of location on the human genome
  • Genetic architecture of autism
  • Metagenomics of methane-consuming microbial communities

Library science

  • With Suzzallo Library: A pilot study on providing geospatial LIDAR data as a curated digital resource
  • The curation and provision of cloud-based resources for data analysis in specialized research domain communities

Molecular Engineering and Science

  • Peptide therapeutics research: Cloud computing at scale using the Rosetta molecular folding analysis toolkit

Ocean science

  • LiveOcean: Ocean modeling forecast
  • Marine microbial ecology
  • Mesoscale eddy structure and its correlation with marine life

Computer Science

  • Analysis of code fault detection: Student project
  • IoT: A design pattern and tutorial for using cloud-based support in Internet of Things implementations (NSF: Campus Cyberinfrastructure)
  • Data security on the cloud: A generic data system with automated and human protocols for working with sensitive data, including elements of compliance with oversight regulations (NSF: Campus Cyberinfrastructure)
  • Scale on the cloud: See the protein folding case study under Molecular Engineering and Science (NSF: Campus Cyberinfrastructure)
  • Collaboration on the cloud: See the case studies herein on GeoServer/THREDDS, LIDAR, Dynamic Information Frameworks, and HiMAT; a thematically lightweight geospatial data system built around the theme of ‘access to data through pre-built frameworks, data APIs, and minimal (non-redundant) software engineering’ (NSF: Campus Cyberinfrastructure)

Mechanical and Civil Engineering

  • Computational fluid dynamics of hydrogen and methane combustion

Astronomy

  • Identifying stellar composition through spectral model superposition in nearby galaxies
  • Large Synoptic Survey Telescope (LSST) toolchain development


  • Implementation of GeoServer and a THREDDS server on the public cloud
  • Various data archival projects: Using the cloud for many 9s of reliability