Cloud Case Study Precis

Introduction

The purpose of this page is to concisely describe cloud computing / data science case studies at UW.

We are currently tracking over 75 projects/groups migrating their research computing to the public cloud. Below is a selected subset, organized roughly by domain. We emphasize that much of our work goes beyond simple migration to cloud platforms: It extends to data management techniques, reproducibility, implementing data services for collaboration and public distribution, and much more. All of this is in turn made possible by the UW eScience Institute.

Warnings

  • Contact us for updates to this material
  • Entries are described by topic to preserve a degree of anonymity
  • Projects are in *various stages of completion*

General Systems and Tools

  • SQLShare: A system for managing, sharing and manipulating research data.
  • Myria: A distributed, shared-nothing Big Data management system and Cloud service from the University of Washington
  • IoT based on Arduino Yun leaf technology and cloud IoT endpoint services
  • Geohackweek and Neurohackweek: Hosting intensive workshops for learning and developing cloud-based tools and methods at UW

Medical research

  • Laboratory Medicine: Cloud-based system for genome analysis and clinical annotation (oncology and related)
  • Crossing the clinical-to-research data barrier
  • Data access and tool access for MRI- and EEG-based research
  • Gut biome metagenomics (Children’s Hospital)
  • Patterns in unexpected in-hospital mortality
  • Deep learning for patient behavior prediction: EEG data in relation to A/V transcripts of patient behavior
  • Canine longitudinal aging studies
  • Cyberinfrastructure in support of research laboratory groups
  • Biostatistics

Hydrology and Geochemistry

  • GDS: Geometabolomic Data System, a contribution-driven collaborative library for Dissolved Organic Matter (DOM) spectral data from the global hydrosphere.
  • HiMAT (NASA)
  • Dynamic Information Framework (DIF) (World Bank): Converting scientific modeling skill into actionable information in the public domain: Central and Southeast Asia, Central and South America

Genomics

  • Epigenome imputation: Inferring, from existing wet lab experiments, relationships between particular proteins and cell types as a function of location on the human genome
  • Genetic architecture of autism
  • Metagenomics of methane-consuming microbial communities

Library science

  • With Suzzallo Library: A pilot study for providing geospatial LIDAR data as a curated digital resource
  • The curation and provision of cloud-based resources for data analysis in specialized research domain communities

Molecular Engineering and Science

  • Peptide therapeutics research: Cloud-based scale computing using the Rosetta molecular folding analysis toolkit

Ocean science

  • LiveOcean: Ocean modeling forecast
  • Marine microbial ecology
  • Mesoscale eddy structure and correlation to marine life

Computer Science

  • Analysis of code fault detection: Student project
  • IoT: A design pattern and tutorial for using cloud-based support of Internet of Things implementations (NSF: Campus Cyberinfrastructure)
  • Data security on the cloud: A generic data system with automated and human protocols for working on sensitive data including elements of compliance with oversight regulations (NSF: Campus Cyberinfrastructure)
  • Scale on the cloud: See the protein folding case study under Molecular Engineering and Science (NSF: Campus Cyberinfrastructure)
  • Collaboration on the cloud: See the case studies herein on GeoServer/THREDDS, LIDAR, Dynamic Information Frameworks and HiMAT. These are thematically lightweight geospatial data systems with the underlying theme of ‘access to data through pre-built frameworks, data APIs and minimal (non-redundant) software engineering’ (NSF: Campus Cyberinfrastructure)

Mechanical and Civil Engineering

  • Computational fluid dynamics of hydrogen and methane combustion

Astronomy

  • Identifying stellar composition through spectral model superposition in nearby galaxies
  • Large Synoptic Survey Telescope (LSST) toolchain development

Geospatial

  • Implementation of GeoServer and a THREDDS server on the public cloud
  • Various data archival projects: Using the cloud for many 9s of reliability