Overview
Teaching: 0 min
Exercises: 25 min
Questions
What are the moving parts of cfncluster?
Objectives
Take a Fourier transform or other non-trivial task as a basis for running a cluster compute task
Cfncluster uses three machine types: A Launcher, a Master and one or more Workers. (You can also think about shadow Masters that jump in if the Master fails; which in turn points up the notion of a robust cluster that can carry on despite failures… which in turn leads to failure-driven strategies; cf the AWS Spot market.)
The Launcher is, as the name implies, just about getting things set up. It could as easily be your laptop, but for the sake of ‘everything in the cloud’ we will set it up as a small EC2 instance. The Master is just what it sounds like: where the cluster intelligence resides. The cloud elasticity comes in how the Master initiates, manages, and closes out the various jobs and Workers. Jobs are compute tasks that are managed by means of a queue, and Workers are EC2 instances. Each Worker should have any custom software needed for jobs pre-installed so that time is not wasted. We will keep an eye on how this is done as we walk through the following process.
Let’s start with the classic How It Works diagram provided by AWS. We will talk through the various components before doing the easy part of walking through an example.
The guts of this thing are the four sub-boxes in Cluster Management Process; so if we understand how they function together we will be in good shape.
We begin with a cronjob that runs once every minute on your Master instance. A cronjob is a task that the Linux cron scheduler executes periodically. We note that Linux machines also run daemon processes, which are always active but generally quiet/passive, waiting for some condition on the computer to trigger their active behavior.
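For orientation, here is what a once-per-minute crontab entry looks like in general; the path and log redirection below are illustrative, not the exact entry cfncluster installs:

# crontab fields are: minute hour day-of-month month day-of-week command
# five asterisks means 'run every minute'
* * * * * /path/to/publish_pending_jobs >> /tmp/publish_pending_jobs.log 2>&1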
You would think that a scheduler like cron all by itself could do everything we show here; and as with most solution ideas in Linux it is possible, but it would require considerable manual effort. Rather, we use cfncluster working with a different scheduler called SGE to take advantage of an existing solution. This solution includes a lot of detail management that has arisen as cluster computing has evolved. One SGE advocate says ‘powerful, flexible, scalable, useful’. But note that other schedulers are available: SLURM, Torque, OpenLava, … so you have options.
Ok, so the cronjob of interest is publish_pending_jobs, expanded here:
Notice that this queries something called a queue manager so let’s define that next. We first expand on the scheduler we mentioned above: Sun Grid Engine or SGE. (A circa-2009 for dummies tutorial can be found here.)
The SGE Master node runs a qmaster daemon. The SGE Worker nodes run an execution daemon. The qmaster daemon waits for you the User to submit a job. This job then goes through a three state sequence: pending (sitting on the job queue), running, and then done. So we see that the queue manager is actually the SGE qmaster daemon.
How do you submit a job? In our example we will use the qsub command (which is short for queue submit of course):
qsub -V -b y -cwd /home/ec2-user/fourier 37 19
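The flags are standard SGE qsub options; here is the same command annotated for reference:

# -V     export the current environment variables to the job
# -b y   treat the given command as a binary/executable rather than a job script
# -cwd   run the job in, and write its output files to, the current working directory
# 37 19  arguments passed to the program itself (here: N = 37, coefficient index i = 19)
qsub -V -b y -cwd /home/ec2-user/fourier 37 19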
Clusters deployed with CfnCluster use the configuration file to determine:
Two CloudWatch alarms are created. They monitor a custom Amazon CloudWatch metric, called pending, that is published by the Master node of the cluster. These CloudWatch alarms call ScaleUp policies associated with the ComputeFleet ASG. This is what handles the automatic addition of compute nodes when there are pending tasks in the cluster: nodes are added until the alarms stop or the max_queue_size is reached. Auto Scaling also has a CloudWatch alarm to remove instances that are no longer needed; hence there is a corresponding ScaleDown policy.
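CfnCluster creates all of these resources for you through CloudFormation; purely to make the moving parts concrete, here is roughly what equivalent AWS CLI calls would look like. The namespace, names, thresholds, and the policy ARN placeholder are illustrative, not cfncluster's actual values.

# The Master publishes the custom 'pending' metric (cfncluster does this for you):
aws cloudwatch put-metric-data --namespace cfncluster --metric-name pending --value 12

# A ScaleUp policy on the ComputeFleet Auto Scaling group adds one instance per trigger:
aws autoscaling put-scaling-policy --auto-scaling-group-name ComputeFleet \
    --policy-name ScaleUp --adjustment-type ChangeInCapacity --scaling-adjustment 1

# An alarm on 'pending' fires the ScaleUp policy while jobs are waiting on the queue:
aws cloudwatch put-metric-alarm --alarm-name pending-jobs --namespace cfncluster \
    --metric-name pending --statistic Average --period 60 --evaluation-periods 2 \
    --threshold 0 --comparison-operator GreaterThanThreshold \
    --alarm-actions <ARN-of-the-ScaleUp-policy>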
Each Worker instance in the ComputeFleet ASG runs a process called nodewatcher. This process monitors the instance and, if it is idle AND close to the end of the current hour, the Worker is removed from the ComputeFleet ASG using an API call. Again the overall concept here is that the cluster scales itself back as long as it has adequate capacity for the current perceived workload and as long as it stays at or above its configured minimum. To this end there is a third boolean parameter, maintain_initial_queue_size, that can be set to true in the config file.
Here are the corresponding diagrams:
Notice that Simple Queue Service (SQS) is an important part of this flowchart. It is discussed further below as the fourth of our four daemon processes in CfnCluster. Before turning to SQS though let us dispense with the Nodewatcher as follows.
This is a straightforward IF AND statement that causes Workers to be terminated.
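The real nodewatcher is a daemon shipped with cfncluster; the following is only a bash sketch of that decision, with a placeholder for the scheduler query and with wall-clock minutes standing in for minutes since instance launch:

#!/bin/bash
# Sketch of the nodewatcher logic: terminate this Worker when it is idle AND it is
# near the end of the current billed hour. Values below are placeholders.
jobs_running_here=0                                  # placeholder: would come from the scheduler (SGE)
minutes_into_hour=$(( ( $(date +%s) / 60 ) % 60 ))   # wall-clock simplification of the billed hour
if [ "$jobs_running_here" -eq 0 ] && [ "$minutes_into_hour" -ge 55 ]; then
    echo "idle and near the hour boundary: remove this Worker from the ComputeFleet ASG"
    # ...followed by an API call that terminates this instance in its Auto Scaling group
fi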
SQS is used in cfncluster as the online/offline messaging system. The sqswatcher daemon running on the CfnCluster Master monitors for SQS messages produced by Auto Scaling: when an instance comes online it submits an “instance ready” message to SQS, which is picked up by sqswatcher running on the Master. These messages are used to notify the queue manager (in our case the SGE qmaster daemon) that new instances are online. The reverse is also true: when instances are terminated, SQS is used to notify the Master that they are no longer available.
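To make the message flow concrete, here is the general shape of the SQS interaction using the AWS CLI; the queue URL, account number, and message body are illustrative, and cfncluster's actual message format differs:

# A newly launched Worker (conceptually) announces itself on the cluster's queue...
aws sqs send-message \
    --queue-url https://sqs.us-west-2.amazonaws.com/123456789012/cfncluster-n101 \
    --message-body '{"event": "instance ready", "hostname": "ip-172-31-6-55"}'

# ...and sqswatcher on the Master polls for such messages and updates the SGE qmaster:
aws sqs receive-message \
    --queue-url https://sqs.us-west-2.amazonaws.com/123456789012/cfncluster-n101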
These steps depend upon some pre-configuration, already done prior to the class:
In what follows you will want to have a Project Identifier Tag, or PIT, handy. This is just a short ID string that you will use to tag everything you create. My PIT is ‘kilroy’. Herein I use PIT and kilroy interchangeably.
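For example, any resource you create can be tagged with your PIT from the AWS CLI (the instance ID below is a placeholder):

% aws ec2 create-tags --resources i-0123456789abcdef0 --tags Key=PIT,Value=kilroy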
We will…
Some of the key terms involved:
You now log in to your cfncluster Launcher using ssh (or PuTTY on Windows) where your login name is ‘ec2-user’. You do not enter a password as you are using your .pem (or .ppk) file to authenticate.
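For example, from a terminal (the key file name matches your keypair, and the IP address is a placeholder for your Launcher's public IP shown in the EC2 console):

% chmod 400 ~/kilroy101.pem     # ssh requires the key file to be readable only by you
% ssh -i ~/kilroy101.pem ec2-user@<Launcher-public-IP>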
Once logged in you will update your machine, install the cfncluster tools, configure cfncluster and create a new named cluster. This in turn will lead to the last steps to run a large-scale compute task.
On your EC2 Launcher:
% sudo yum update -y
% sudo pip install --upgrade cfncluster
% cfncluster
We run configure as a sub-command of the cfncluster command. This creates a configuration file that governs the behavior of our subsequent cluster computing using this utility. Hence cfncluster configure produces a config file within the .cfncluster directory in your home directory. (Use ‘ls -al’ to see that this exists.) A good way to get the configure steps correct is to follow the details at a web page like this one.
You will need to have a copy-able view of your IAM User credentials file available for these config steps:
% cfncluster configure
<here is a guide to the answers to the ensuing dialog>
Cluster Template [default]: <hit enter>
AWS Access Key ID []: <cut and paste your Access Key Id string from your credentials file>
AWS Secret Access Key ID []: <cut and paste your Secret Access Key>
AWS Region ID []: <type in us-west-2 which is Oregon>
VPC Name [public]: <hit enter>
Key Name []: <enter your keypair name as in 'kilroy101'>
VPC ID []: <enter the cloud101_vpc ID which is 'vpc-34014a53'>
Master Subnet ID []: <enter the only choice, subnet-90382ac8>
At this point you can re-run cfncluster configure and see it allows you to default-enter your way through the same process or change values if you like. You can also examine the two files in the .cfncluster directory: config and cfncluster-cli.log.
The config file contents are interesting and include only a fraction of the complete config vocabulary. For a listing of this vocabulary see the Configuration page published by AWS.
Finally we are ready to create the cfncluster. This will spin up an EC2 instance as our Master node also referred to as the head node. It will be listed by default as Name = Master in the EC2 table.
% cfncluster create n101
n101 is simply the name of the cluster. Creation takes several minutes and will launch the Master node as noted. By default this will be a small EC2 instance (T2).
On the AWS console in your browser you can monitor your progress using the CloudFormation service.
Once the Master is up: Log in using the same procedure as for the Launcher and do an update
sudo yum update -y
You have a running cfncluster including a Master node where you are logged in. Here you configure your processing task. Once that is done you can set up a script file that submits a number of processes to the processing queue using the Linux command qsub. These will be placed on a queue that we can view using the command qstat.
Let’s begin by writing a slightly non-trivial C program and compiling it using the gcc compiler. The program computes one coefficient of the Fourier transform of a simple generated signal; the idea would be to bring these per-coefficient results together afterwards. To simulate a more extended task there is a built-in 3.5-minute ‘sleep’ at the beginning of the program execution.
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <unistd.h>

#define MAX_N 32768

int main(int argc, char **argv)
{
    sleep(210);                                       // sleep for 210 seconds (3.5 minutes)
    if (argc < 3) { printf("\nfourier N i: i'th coeff of an N-element signal\n\n\n"); exit(0); }
    int N = atoi(argv[1]); int i = atoi(argv[2]);
    if (N < 0 || N > MAX_N || i < 0 || i >= N) { exit(0); }
    int n;
    double s[MAX_N], ar = 0.0, ai = 0.0, pi = acos(-1.0), dN = (double)N, di = (double)i;
    double dShft = (double)(N/2), dScl = (2.0*pi*5.0)/dShft, dGScl = 2.0/dShft;
    for (n = 0; n < N; n++) {                         // signal generator block
        double x  = ((double)n - dShft) * dScl;       // ... map n to a number on [-5 * 2pi, 5 * 2pi]
        double xg = ((double)n - dShft) * dGScl;      // ... and also to a number on [-2, 2]
        s[n] = exp(-xg*xg)*sin(x);                    // ... Gaussian envelope times sine: the signal vector s[]
    }
    for (n = 0; n < N; n++) {                         // FT block
        double exp_arg = -2.0*pi*((double)n/dN)*di;   // argument of the exponential
        double real_n  = cos(exp_arg);                // real component of the exponential
        double imag_n  = sin(exp_arg);                // imag component of the exponential
        ar += s[n]*real_n;                            // accumulate real part
        ai += s[n]*imag_n;                            // accumulate imaginary part
    }
    printf("\ncoefficient %d = (%lf, %lf).\n\n\n", i, ar, ai);
    return 0;
}
Compile this program:
gcc -o fourier fourier.c -lm
The executable resides in the same location, by default /home/ec2-user.
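You can give it a quick sanity check on the Master before involving the queue (expect to wait about 3.5 minutes because of the built-in sleep):

% ./fourier 37 19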
You might think we need to save an image of the Master so that our Workers (Compute nodes) will know how to run fourier without compiling it. The good news is that the Compute nodes are built as images of the Master node, so they will already know how to run fourier.
Here is a script that submits multiple fourier jobs to the job queue; I named it ‘go.sh’:
qsub -V -b y -cwd /home/ec2-user/fourier 37 0
qsub -V -b y -cwd /home/ec2-user/fourier 37 1
qsub -V -b y -cwd /home/ec2-user/fourier 37 2
qsub -V -b y -cwd /home/ec2-user/fourier 37 3
qsub -V -b y -cwd /home/ec2-user/fourier 37 4
qsub -V -b y -cwd /home/ec2-user/fourier 37 5
qsub -V -b y -cwd /home/ec2-user/fourier 37 6
qsub -V -b y -cwd /home/ec2-user/fourier 37 7
qsub -V -b y -cwd /home/ec2-user/fourier 37 8
qsub -V -b y -cwd /home/ec2-user/fourier 37 9
qsub -V -b y -cwd /home/ec2-user/fourier 37 10
qsub -V -b y -cwd /home/ec2-user/fourier 37 11
qsub -V -b y -cwd /home/ec2-user/fourier 37 12
qsub -V -b y -cwd /home/ec2-user/fourier 37 13
qsub -V -b y -cwd /home/ec2-user/fourier 37 14
qsub -V -b y -cwd /home/ec2-user/fourier 37 15
qsub -V -b y -cwd /home/ec2-user/fourier 37 16
qsub -V -b y -cwd /home/ec2-user/fourier 37 17
qsub -V -b y -cwd /home/ec2-user/fourier 37 18
qsub -V -b y -cwd /home/ec2-user/fourier 37 19
qsub -V -b y -cwd /home/ec2-user/fourier 37 20
qsub -V -b y -cwd /home/ec2-user/fourier 37 21
qsub -V -b y -cwd /home/ec2-user/fourier 37 22
qsub -V -b y -cwd /home/ec2-user/fourier 37 23
qsub -V -b y -cwd /home/ec2-user/fourier 37 24
qsub -V -b y -cwd /home/ec2-user/fourier 37 25
qsub -V -b y -cwd /home/ec2-user/fourier 37 26
qsub -V -b y -cwd /home/ec2-user/fourier 37 27
qsub -V -b y -cwd /home/ec2-user/fourier 37 28
qsub -V -b y -cwd /home/ec2-user/fourier 37 29
qsub -V -b y -cwd /home/ec2-user/fourier 37 30
qsub -V -b y -cwd /home/ec2-user/fourier 37 31
qsub -V -b y -cwd /home/ec2-user/fourier 37 32
qsub -V -b y -cwd /home/ec2-user/fourier 37 33
qsub -V -b y -cwd /home/ec2-user/fourier 37 34
qsub -V -b y -cwd /home/ec2-user/fourier 37 35
qsub -V -b y -cwd /home/ec2-user/fourier 37 36
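An equivalent, more compact form of go.sh uses a loop over the 37 coefficient indices:

for i in $(seq 0 36); do
    qsub -V -b y -cwd /home/ec2-user/fourier 37 "$i"
done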
Start the jobs (place them on the queue) on the Master node by issuing this command:
% source go.sh
Notice that ‘qhost’ produces an essentially empty listing at the outset: just the ‘global’ entry, an SGE pseudo-host that summarizes the cluster as a whole, since no Workers have come online yet.
% qhost
HOSTNAME ARCH NCPU NSOC NCOR NTHR LOAD MEMTOT etcetera
------------------------------------------------------------------------------------------------
global - - - - - - -
After the queue has run for a few minutes this qhost command should expand, as in:
% qhost
HOSTNAME ARCH NCPU NSOC NCOR NTHR LOAD MEMTOT MEMUSE SWAPTO SWAPUS
----------------------------------------------------------------------------------------------
global - - - - - - - - - -
ip-172-31-13-176 lx-amd64 1 1 1 1 0.01 995.4M 137.1M 0.0 0.0
ip-172-31-3-231 lx-amd64 1 1 1 1 0.04 995.4M 135.9M 0.0 0.0
ip-172-31-6-55 lx-amd64 1 1 1 1 0.01 995.4M 136.3M 0.0 0.0
The qstat command will provide a view of the queue:
% qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
82 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
83 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
84 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
85 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
86 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
87 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
88 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
89 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
90 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
91 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
92 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
93 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
94 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
95 0.55500 fourier ec2-user qw 04/04/2017 18:53:53 1
...
(and so on)
This queue listing will gradually get shorter as the tasks complete. As they complete they write both standard output and standard error files into the /home/ec2-user directory. The standard output is the print statement from the C program giving the Fourier coefficient value.
As an alternative ‘hello world’ approach to running cfncluster tasks you can edit a simple bash script file ‘hello.sh’ to read as follows:
#!/bin/bash
sleep 4
echo "hello world shout out from $(hostname)"
Make this script executable and submit it to the queue
% chmod a+x hello.sh
% qsub hello.sh
All of these example jobs run quickly (in minutes). When the script runs to completion qstat will again show an empty queue. The Compute (Worker) node will stay available for the balance of an hour since that is the billing time unit. I paid for it so I will hang on to it. It then evaporates back into the instance pool if it is not doing anything.
As these tasks run: Where is the output going? We have a shared filesystem and the Workers are writing to the home directory on the Master node.
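By default SGE names these files <jobname>.o<jobid> and <jobname>.e<jobid>, so on the Master you can inspect them with commands like the following (the job IDs will differ in your run):

% ls /home/ec2-user/fourier.o* /home/ec2-user/fourier.e*
% cat /home/ec2-user/fourier.o82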
To connect this back to the Launcher: the cluster behavior is controlled by the cfncluster configure process, which is described extensively in the CfnCluster config documentation. The directory on the Launcher where the configuration file resides is .cfncluster, where we find two files: config and cfncluster-cli.log. The latter is the log file produced by prior cluster activity.
Notice that the Launcher has nothing whatever to do with the cluster compute task. It just builds the cluster entity; and we have to further configure that cluster to do the compute task.
Here is a .cfncluster/config file with an Elastic Block Store (EBS) element of interest. Recall this file sits on the Launcher instance.
[aws]
aws_region_name = us-west-2
[cluster default]
vpc_settings = public
key_name = MPIPlay
ebs_settings = helloebs
shared_dir = /hello
ec2_iam_role = hello_world
initial_queue_size = 5
max_queue_size = 5
[vpc public]
master_subnet_id = subnet-37513940
vpc_id = vpc-a60a4ec3
[global]
update_check = true
sanity_check = true
cluster_template = default
[ebs helloebs]
ebs_snapshot_id = snap-f67581a4
This opens up the prospect of attaching fast-access storage for the Compute instances to use as they come online. This and many other details make cloud computing a mix of Wild West, art, and computational results.
Key Points
Each cloud has its own approach to cluster computing; this is a common example.