Compute power in the cloud

Introduction

This page describes the processing power options available on the public cloud. The baseline hourly rate quoted here is the on-demand rate, meaning that once you secure the machine you keep it for as long as you like. There are also preemptible instances, which can be taken out of your control on short notice. These are available in smaller quantities but at considerably reduced cost (typically 20% to 40% of the on-demand rate, sometimes less).

[AWS Instance Types](https://aws.amazon.com/ec2/instance-types/) · [Azure Virtual Machine types](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/)

Basic Thesis

On cost: the more powerful the cloud VM, the more it costs per hour, and the more quickly it will complete a given task, so there is potential to benchmark different instance types and optimize. To first order, however, one can take them to be cost-equivalent and simply work empirically by timing your compute tasks.
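
As a minimal sketch of this empirical approach (the benchmark function and the rate shown are hypothetical placeholders, not price quotes), one can time a representative job on each candidate instance type and compare dollars per task rather than dollars per hour:

```python
import time

def dollars_per_task(task, hourly_rate_usd):
    """Run one representative task and return its cost in dollars.

    hourly_rate_usd: the instance's on-demand or preemptible rate.
    """
    start = time.perf_counter()
    task()  # run the representative workload once
    elapsed_hours = (time.perf_counter() - start) / 3600.0
    return hourly_rate_usd * elapsed_hours

# Hypothetical usage: run the same benchmark on each candidate
# instance type and compare the resulting dollars-per-task figures.
# cost = dollars_per_task(my_benchmark, 3.06)  # e.g. a $3.06/hr VM
```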

On limits and runaway cost: if you cannot get as many machines at once as you need, contact your cloud vendor and request a limit increase. The limit is initially in place to prevent you from accidentally running up a huge bill via a typo in a configuration file. If, for example, you intend to request 20 machines but type ‘200’, you could find yourself spending thousands of dollars per hour. Another good way to incur accidental charges is to let your access keys wind up on GitHub. There are pitfalls in using the cloud without knowing what you are doing, and consequently a learning curve. The first rule is ‘always test at small scale before scaling up’. The second rule is ‘know how to operate without putting your account access at risk of theft.’
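
One concrete safeguard against the ‘200 instead of 20’ scenario is a hard-coded sanity cap in the launch script itself, checked before any billable API call is made. Below is a minimal sketch using boto3; the cap value and the AMI ID are illustrative assumptions, not recommendations:

```python
import boto3

MAX_INSTANCES = 25  # project-specific hard cap; change deliberately, not casually

def launch(count, instance_type="p3.2xlarge"):
    # Refuse obviously wrong requests before any money is spent.
    if count > MAX_INSTANCES:
        raise ValueError(f"Refusing to launch {count} instances (cap: {MAX_INSTANCES})")
    ec2 = boto3.client("ec2")
    return ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder; substitute your own AMI
        InstanceType=instance_type,
        MinCount=count,
        MaxCount=count,
    )
```

This complements, rather than replaces, the vendor-side account limit.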

On computing at scale: we enjoy sharing this AWS case study using the Rosetta protein-folding software.

GPU-based cloud instances

A comparison of cost per GPU per hour on V100s gives preemptible rates of $0.93 / $0.61 / $0.84 per GPU per hour for AWS, Azure, and Google respectively. The corresponding on-demand rates are $3.06 / $3.06 / $2.95 per GPU per hour. These instances feature the current GPU generation, the NVIDIA Tesla V100. Prior-generation GPUs (P100, P4, K80, M60) are also available at commensurately lower rates. These figures are subject to change.

Note that data for Tensor Processing Units (TPUs), available only on Google Cloud Platform, are still pending.

Summary for high-end instances, 3 cloud providers

| Vendor | Instance      | $/GPU/hr preemptible | $/GPU/hr on-demand | Description            |
|--------|---------------|----------------------|--------------------|------------------------|
| AWS    | p3.16xlarge   | 0.93                 | 3.06               | 32 cores, 8 × V100 GPU |
| Azure  | NC24 v3       | 0.61                 | 3.06               | 24 cores, 4 × V100 GPU |
| Google | n1-highmem-64 | 0.84                 | 2.95               | 32 cores, 8 × V100 GPU |

Cost per hour (notice these prices are not scaled by number of GPUs)

| Vendor | Instance       | $/hr preemptible | $/hr on-demand | Description |
|--------|----------------|------------------|----------------|-------------|
| AWS    | p3.2xlarge     | 1.36             | 3.06           | 4 cores E5-2686 v4; 1 × V100 GPU (5120 CUDA + 640 Tensor cores) |
|        | p3.8xlarge     | 3.77             | 12.24          | 16 cores E5-2686 v4; 4 × V100 GPU (each: 5120 CUDA + 640 Tensor cores) |
|        | p3.16xlarge    | 7.47             | 24.48          | 32 cores E5-2686 v4; 8 × V100 GPU (each: 5120 CUDA + 640 Tensor cores) |
|        | g3.4xlarge     | 0.35             | 1.14           | 8 cores E5-2686 v4; 1 × M60 GPU (2048 cores, 8 GiB video memory) |
|        | g3.8xlarge     | 0.68             | 2.28           | 16 cores E5-2686 v4; 2 × M60 GPU (each: 2048 cores, 8 GiB video memory) |
|        | g3.16xlarge    | 1.37             | 4.56           | 32 cores E5-2686 v4; 4 × M60 GPU (each: 2048 cores, 8 GiB video memory) |
|        | g3s.xlarge     | 0.23             | 0.75           | 2 cores E5-2686 v4; 1 × M60 GPU (2048 cores, 8 GiB video memory) |
| Azure  | NC6            | 0.18             | 0.90           | 6 cores, 1 × K80 GPU |
|        | NC12           | 0.36             | 1.80           | 12 cores, 2 × K80 GPU |
|        | NC24           | 0.72             | 3.60           | 24 cores, 4 × K80 GPU |
|        | NC24r          | 0.79             | 3.96           | 24 cores, 4 × K80 GPU; low-latency, high-throughput network interface |
|        | NC6 v3         | 0.61             | 3.06           | 6 cores, 1 × V100 GPU |
|        | NC12 v3        | 1.22             | 6.12           | 12 cores, 2 × V100 GPU |
|        | NC24 v3        | 2.45             | 12.24          | 24 cores, 4 × V100 GPU |
|        | NC24r v3       | 2.63             | 13.47          | 24 cores, 4 × V100 GPU; low-latency, high-throughput network interface |
| Google | n1-highmem-8   | 0.84             | 2.95           | 4 cores, 1 × V100 GPU |
|        | n1-highmem-16  | 1.68             | 5.91           | 8 cores, 2 × V100 GPU |
|        | n1-highmem-32  | 3.36             | 11.81          | 16 cores, 4 × V100 GPU |
|        | n1-highmem-64  | 6.72             | 23.63          | 32 cores, 8 × V100 GPU |
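
The two tables are consistent with each other: dividing an instance's hourly rate by its GPU count reproduces the per-GPU summary figures above. A quick sanity check, with rates copied from the tables:

```python
# (preemptible $/hr, on-demand $/hr, GPU count) from the cost-per-hour table
instances = {
    "AWS p3.16xlarge":      (7.47, 24.48, 8),
    "Azure NC24 v3":        (2.45, 12.24, 4),
    "Google n1-highmem-64": (6.72, 23.63, 8),
}

for name, (spot, on_demand, gpus) in instances.items():
    print(f"{name}: {spot / gpus:.2f} / {on_demand / gpus:.2f} $/GPU/hr")
# AWS p3.16xlarge: 0.93 / 3.06 $/GPU/hr  (matches the summary table)
```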

General notes

  • Rates shown are for Linux. Windows users might find Azure pricing advantageous.

Microsoft Azure notes

  • Azure preemptible instances are ‘Low-Priority VMs’, available through the Azure Batch service.

Google Cloud Platform notes

  • GCP pricing is additive: the instance base rate (e.g. $0.10/hr preemptible) plus a per-GPU rate (e.g. $0.74 per Tesla V100 per hour preemptible); see the worked example after this list
  • vCPUs are counted as with AWS: number of cores × 2 hyperthreads each
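
For example, a preemptible n1-highmem-8 with one V100 attached comes to roughly $0.10 + $0.74 = $0.84 per hour, the figure shown in the table above.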

AWS notes

  • AWS uses ‘virtual CPUs’ (vCPUs) as its metric: vCPU count = 2 × the number of physical cores, via hyperthreading; see the example after this list
  • Preemptible prices (AWS Spot instances) are subject to variability
  • p3 instances use V100 GPUs and Xeon E5-2686 v4 (Broadwell) processors; NVLink for GPU-GPU communication
  • p2 instances (not listed in the table above) use K80 GPUs and Xeon E5-2686 v4s; GPUDirect for GPU-GPU communication
  • g3 instances use NVIDIA Tesla M60 GPUs and Xeon E5-2686 v4 (Broadwell) processors
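
For example, the 32-core p3.16xlarge is therefore listed by AWS as 64 vCPUs.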

Topics for further elaboration

  • TPUs available from Google: what they cost, performance comparison, development platform comparison
  • Threads, vCPUs, hyperthreading, blades…
  • Machine characteristics
  • Distinction between metal and various VM configurations
  • Testing guidelines
  • Overview of types and categories by vendor