Designing Compute Solutions in AWS – EC2 Auto Scaling Options

Imagine that you have a web application designed and operating on EC2 instances; you have followed best practices and have the application load spread across multiple instances in multiple AZs. The application works well, with adequate performance, for about 4,000 users. But last week the application was made available to 1,000 additional users, and its performance started to slow down. Developers added some additional instances, and everyone was happy once again.

One of the National Institute of Standards and Technology (NIST) characteristics of the public cloud is rapid elasticity, defined as follows in NIST SP 800-145: “capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand.” Amazon.com, which is powered by the AWS cloud, relies on auto-scaling.

Auto Scaling is a mechanism that automatically increases or decreases your EC2 resources to meet demand, based on custom-defined metrics and thresholds. Auto Scaling has three major components:

  • The Auto Scaling group
  • The configuration templates
  • The scaling options

The Auto Scaling service works by organizing your EC2 instances into groups. An Auto Scaling group is treated as a logical unit for scaling and management purposes. A group must have a minimum, maximum, and desired number of EC2 instances.

You should also be aware that launching or terminating instances does not happen in an instant. There is a certain lead time when you launch brand-new EC2 instances, since AWS has to fetch the AMI, perform the required configuration, run the user data you included, and install the custom applications you specify. All of this must be completed before the instance can accept live incoming requests. This duration is also called the “instance warm-up”.
An Auto Scaling group also has a “cooldown” setting, which is the interval between two scaling actions: the number of seconds that must pass before another scaling activity can start. This prevents conflicts in your Auto Scaling group, where one activity is adding an instance while another is terminating your resources. You can also set up a termination policy to control which EC2 instances will be terminated first when a scale-in event occurs.
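
Both the cooldown and the time new instances are given before health checks count against them can be tuned on the group itself. A minimal boto3 sketch, assuming an existing group (the group name and values below are placeholders, not from the text):

  import boto3

  autoscaling = boto3.client("autoscaling")

  # Tune the group-level cooldown and give new instances time to warm up
  # before health checks are applied to them.
  autoscaling.update_auto_scaling_group(
      AutoScalingGroupName="web-asg",   # placeholder group name
      DefaultCooldown=300,              # seconds between scaling activities
      HealthCheckGracePeriod=180,       # seconds before health checks start
  )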

The automatic scaling of compute resources at AWS depends on monitoring those resources with CloudWatch metrics and alarms. Amazon EC2 Auto Scaling uses scaling policies to determine when to scale your Amazon EC2 resources out or in. You can create simple scaling policies based on a single metric, such as CPU utilization, or more complex policies that use multiple metrics based on the size of your fleet of Amazon EC2 instances. When a scaling policy is triggered, Amazon EC2 Auto Scaling takes a scaling action to either launch or terminate Amazon EC2 instances.
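
For simple and step scaling policies, the trigger is a CloudWatch alarm whose action is the scaling policy; target tracking policies create and manage their alarms for you. A minimal boto3 sketch of the alarm side, assuming a policy ARN returned by an earlier put_scaling_policy call (names, threshold, and the ARN are placeholders):

  import boto3

  cloudwatch = boto3.client("cloudwatch")
  policy_arn = "arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:..."  # placeholder

  # Alarm on the group's average CPU; when it fires, the scaling policy runs.
  cloudwatch.put_metric_alarm(
      AlarmName="web-asg-high-cpu",
      Namespace="AWS/EC2",
      MetricName="CPUUtilization",
      Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
      Statistic="Average",
      Period=300,
      EvaluationPeriods=2,
      Threshold=70.0,
      ComparisonOperator="GreaterThanThreshold",
      AlarmActions=[policy_arn],
  )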

Amazon EC2 Auto Scaling groups are logical collections of Amazon EC2 instances that are managed as a single entity. Multiple scaling groups can be created with their own scaling policies and configurations.

Amazon EC2 Auto Scaling uses health checks to ensure that only healthy Amazon EC2 instances are used to serve traffic.

Keep in mind that Auto Scaling groups are regional services and do not span multiple AWS Regions. You can, however, configure them to span multiple Availability Zones within a Region, since they were designed in the first place to help you achieve high availability and fault tolerance.

EC2 Auto Scaling works with three main components: a launch template or launch configuration, an Auto Scaling group, and a defined scaling policy.

Launch Configuration

A launch configuration is a simple template used by an ASG to launch EC2 instances. The process of creating a launch configuration is much like the process you would follow to manually launch an EC2 instance from the AWS Management Console.
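
A minimal boto3 sketch of creating one (the AMI ID, security group, and names are placeholders); note that a launch configuration cannot be edited after it is created, only replaced:

  import boto3

  autoscaling = boto3.client("autoscaling")

  # Immutable template describing the instances the ASG should launch.
  autoscaling.create_launch_configuration(
      LaunchConfigurationName="web-launch-config-v1",
      ImageId="ami-0123456789abcdef0",          # placeholder AMI
      InstanceType="t3.micro",
      SecurityGroups=["sg-0123456789abcdef0"],  # placeholder security group
  )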

Launch Templates

A launch template is similar to a launch configuration but adds versioning, so you can modify an existing template over time. In addition, launch templates support all new AWS features related to EC2 instances and Auto Scaling, whereas launch configurations do not.
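
A minimal boto3 sketch showing the versioning difference (IDs and names are placeholders; user data in a launch template must be base64-encoded):

  import base64
  import boto3

  ec2 = boto3.client("ec2")
  user_data = base64.b64encode(b"#!/bin/bash\nyum install -y httpd\n").decode()

  # Version 1 of the template.
  ec2.create_launch_template(
      LaunchTemplateName="web-launch-template",
      LaunchTemplateData={
          "ImageId": "ami-0123456789abcdef0",   # placeholder AMI
          "InstanceType": "t3.micro",
          "UserData": user_data,
      },
  )

  # Version 2, derived from version 1, changing only the instance type.
  ec2.create_launch_template_version(
      LaunchTemplateName="web-launch-template",
      SourceVersion="1",
      LaunchTemplateData={"InstanceType": "t3.small"},
  )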

Auto Scaling Groups

An Auto Scaling group (ASG) is built from a collection of EC2 instances that have been generated from the associated launch configuration or launch template. Each ASG launches instances, following the parameters of the launch template, to meet the defined scaling policy. An ASG can function independently or be associated with a load balancer.
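
A minimal boto3 sketch, assuming the launch template above and a load balancer target group (subnet IDs and the ARN are placeholders); the two subnets sit in different Availability Zones:

  import boto3

  autoscaling = boto3.client("autoscaling")

  autoscaling.create_auto_scaling_group(
      AutoScalingGroupName="web-asg",
      LaunchTemplate={"LaunchTemplateName": "web-launch-template", "Version": "$Latest"},
      MinSize=2,                         # never fewer than 2 instances
      MaxSize=10,                        # never more than 10 instances
      DesiredCapacity=4,                 # start with 4 instances
      VPCZoneIdentifier="subnet-0aaa0000000000000,subnet-0bbb0000000000000",  # subnets in two AZs
      TargetGroupARNs=["arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/0123456789abcdef"],
      HealthCheckType="ELB",             # use the load balancer health check
      HealthCheckGracePeriod=180,
  )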

Scaling Options for Auto Scaling Groups

Target tracking scaling policy

With a target tracking scaling policy, you select a metric type and a target value; Auto Scaling then adjusts capacity to keep the metric near that value and automatically replaces any instances that are determined to be unhealthy (by the Auto Scaling health check or the load-balancing health check).
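
A minimal boto3 sketch of a target tracking policy that keeps the group's average CPU near 50% (the group name and target value are placeholders):

  import boto3

  autoscaling = boto3.client("autoscaling")

  # Auto Scaling adds or removes capacity to hold average CPU near the target.
  autoscaling.put_scaling_policy(
      AutoScalingGroupName="web-asg",
      PolicyName="cpu-target-50",
      PolicyType="TargetTrackingScaling",
      TargetTrackingConfiguration={
          "PredefinedMetricSpecification": {"PredefinedMetricType": "ASGAverageCPUUtilization"},
          "TargetValue": 50.0,
      },
  )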

  • Simple scaling: You can increase or decrease the size of an ASG based on a single metric, automatically managing the number of EC2 instances in the Auto Scaling group (see the sketch after this list).
  • Minimum size: The minimum size is the minimum number of Amazon EC2 instances that you want to have running in your Auto Scaling group at any given time.
  • Maximum size: The maximum size is the maximum number of Amazon EC2 instances that you want to have running in your Auto Scaling group at any given time.
  • Desired capacity: The desired capacity is the number of Amazon EC2 instances that you want to have running in your Auto Scaling group at a given time.
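
A minimal boto3 sketch of a simple scaling policy (names and values are placeholders); the returned policy ARN is what you attach to a CloudWatch alarm, as in the earlier alarm sketch:

  import boto3

  autoscaling = boto3.client("autoscaling")

  # Add one instance per scale-out event, with a policy-level cooldown.
  response = autoscaling.put_scaling_policy(
      AutoScalingGroupName="web-asg",
      PolicyName="simple-scale-out",
      PolicyType="SimpleScaling",
      AdjustmentType="ChangeInCapacity",
      ScalingAdjustment=1,
      Cooldown=300,
  )
  policy_arn = response["PolicyARN"]    # attach this ARN to a CloudWatch alarm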

Step Scaling

A step scaling policy enables you to define a series of steps, or thresholds, for a metric and to specify a different number of Amazon EC2 instances or capacity units to launch or terminate at each step (see the sketch after this list). For example,

  1. A first instance is added when CPU utilization is between 40% and 50%.
  2. The next step adds two instances when CPU utilization is between 50% and 70%.
  3. In the third step, three instances are added when CPU utilization is between 70% and 90%.
  4. When CPU utilization is greater than 90%, a further four instances are added.
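
A minimal boto3 sketch of such a policy (names and values are placeholders). Step bounds are offsets from the threshold of the CloudWatch alarm that triggers the policy, so with an alarm threshold of 40% CPU the steps below mirror the four ranges above:

  import boto3

  autoscaling = boto3.client("autoscaling")

  autoscaling.put_scaling_policy(
      AutoScalingGroupName="web-asg",
      PolicyName="cpu-step-scale-out",
      PolicyType="StepScaling",
      AdjustmentType="ChangeInCapacity",
      MetricAggregationType="Average",
      EstimatedInstanceWarmup=180,      # seconds before a new instance's metrics count
      StepAdjustments=[
          {"MetricIntervalLowerBound": 0.0,  "MetricIntervalUpperBound": 10.0, "ScalingAdjustment": 1},  # 40-50%
          {"MetricIntervalLowerBound": 10.0, "MetricIntervalUpperBound": 30.0, "ScalingAdjustment": 2},  # 50-70%
          {"MetricIntervalLowerBound": 30.0, "MetricIntervalUpperBound": 50.0, "ScalingAdjustment": 3},  # 70-90%
          {"MetricIntervalLowerBound": 50.0, "ScalingAdjustment": 4},                                    # above 90%
      ],
  )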

Simple scaling policies are bound to a cooldown period. While the cooldown period is in force, even after the ASG has been directed to launch an EC2 instance in response to a scale-out request, all further scaling requests are ignored until the cooldown period finishes. The default cooldown period is 300 seconds; you can change this value when creating an ASG or modify it later.

Termination Policy

When a scale-in event occurs, the termination policy (the default policy, unless you define your own) controls which EC2 instances are terminated first. Unhealthy instances are terminated first; Auto Scaling then attempts to launch new instances to replace the terminated instances.
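
A minimal boto3 sketch of setting a termination policy on an existing group (the group name is a placeholder); here instances from the oldest launch template are terminated first, then the oldest instances:

  import boto3

  autoscaling = boto3.client("autoscaling")

  autoscaling.update_auto_scaling_group(
      AutoScalingGroupName="web-asg",
      TerminationPolicies=["OldestLaunchTemplate", "OldestInstance"],
  )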


Resources:
CloudAcademy – Designing Compute solutions in AWS
Mark Wilkins – AWS Certified Solutions Architect – Associate (SAA-C03) Cert Guide (Certification Guide)
Jon Bonso – AWS Certified Solutions Architect Associate SAA-C03-Tutorials Dojo