Cloud Case Study Precis

Introduction

This page concisely describes public cloud use for research computing and data science at the University of Washington.

We track close to 100 research projects that use the public cloud for computing, following each across access, time, compute power, storage, data management, cost and other issues. The summary below is organized by domain. Our work in consulting on and advocating for use of the public cloud is integrated with the mission and operation of the UW eScience Institute.

CC*IIE Remarks

Objective and Approach

Solution

Results

Admonitions

  • Contact us regarding updates to this material
  • The focus here is on topics; we try to preserve a degree of anonymity

Yun Zhao Guan: A cloud library – digital curation – in cooperation with Suzzallo Library

General Systems and Tools

  • SQL Share: A system for managing, sharing and manipulating research data.
  • Myria: A distributed, shared-nothing Big Data management system and Cloud service from the University of Washington
  • IOT based on Arduino Yun leaf technology and cloud IOT endpoint services
  • Geohackweek and Neurohackweek: Hosting intensive workshops for learning and developing cloud-based tools and methods at UW

Student-driven research

The UW Student High Performance Computing Club has begun making cloud computing available to its members. This includes training and consulting on implementation as well as careful cost management and tracking. The following is a partial list of projects undertaken by students during the pilot phase of this program, spring 2017.

  • Epigenome imputation across a nucleotide-protein-cell tensor (Status: Successful completion)
  • Design of a high-reliability micropump for cooling high-heat semiconductors (In progress)
  • Novel peptide characterization of marine organic matter: insights into carbon cycling (In progress)
  • Characterizing the progression of three pathologies in ER electronic medical records (In progress)
  • Quora question pair intent comparison (In progress)
  • Scheduler development and benchmarking for containerized bioinformatics workflows (UW Tacoma; in progress)
  • Empirical Studies of Docker Orchestration Tools for The Analyses of Big Biomedical Data (UW Tacoma; in progress)
  • Predictive models to optimize cloud computing using genomics data (UW Tacoma; in progress)
  • A Dynamic Scaling Engine in the Cloud (CSE; in progress)
  • LaraDB Experiments for the DARPA Graph Challenge (CSE; in progress)
  • Learning multiple outcomes with predictive coding (CSE; in progress)

Medical

  • Laboratory Medicine: Genome analysis and annotation (clinical oncology & co)
  • Clinical data availability for research
  • Data access and tool access for MRI- and EEG-based research
  • Gut biome metagenomics (Children’s Hospital)
  • Patterns in unexpected in-hospital mortality
  • Studies on post-hospital-admission sepsis (blood infection)
  • Deep learning for patient behavior prediction: EEG data in relation to A/V transcripts of patient behavior
    • See above under Student research
  • Canine longitudinal aging studies
  • Biostatistics
  • Light-sheet microscopy for fast-turnaround biopsy analysis
  • Neuroimaging: Functional MRI
  • Neuroimaging: Visual cortex studies

Genomics and Biochemistry (not included in Medical above)

  • Epigenome imputation: See above under Student Research
  • Genetic architecture of autism
  • Metagenomics of methane-consuming microbial communities
  • Enzyme inhibition molecular structure
  • Peptide scaffolding enumeration and design: Large-scale computing using the Rosetta protein folding toolkit

Hydrology and Geochemistry

  • GDS: Geometabolomics Data System, a community library and reproducible workflow environment for molecular spectral analysis applied to naturally occurring Dissolved Organic Matter (DOM).
  • HiMAT (NASA): Atmosphere-land coupled analysis of the hydrological state and future of high mountain Asia
    • Hydrological studies and human impacts drawing from in situ, remote sensing, model, re-analysis and assimilation data and methods.
  • Dynamic Information Framework (DIF) (World Bank): Scientific hydrological expertise transferred into public information
    • In resource management and public safety domains, the incorporation of scientific modeling is not well developed.
    • This program provides localized information, building from a reproducible model of free and open access.

Ocean science

  • LiveOcean: Ocean modeling forecast
  • Marine microbial ecology
  • Mesoscale eddy structure and correlation to marine life

Computer Science

  • Analysis of code fault detection: Student project
  • IOT: A design pattern and tutorial for using cloud-based support of Internet of Things implementations (NSF: Campus Cyberinfrastructure)
  • Data security on the cloud: A generic data system with automated and human protocols for working on sensitive data including elements of compliance with oversight regulations (NSF: Campus Cyberinfrastructure)
  • Scale on the cloud: See the protein folding case study under Molecular Engineering and Science (NSF: Campus Cyberinfrastructure)
  • Collaboration on the cloud: See the case studies herein on GeoServer/THREDDS, on LIDAR, on Dynamic Information Frameworks and on HiMAT. These share the theme of a lightweight geospatial data system built on ‘access to data through pre-built frameworks, data APIs and minimal (non-redundant) software engineering’ (NSF: Campus Cyberinfrastructure)

Mechanical and Civil Engineering

  • Computational fluid dynamics of hydrogen and methane combustion

Astronomy

  • Identifying stellar composition through spectral model superposition in nearby galaxies
  • Large Synoptic Survey Telescope (LSST) toolchain development

Geospatial

  • Implementation of GeoServer and a THREDDS server on the public cloud
  • Various data archival projects: Using the cloud for many 9s of reliability

Stubs and pending

  • IOT
  • Power consumption