Automated infrastructure deployments with CloudFormation

If you are like me (and many other people in the industry), I am sure you can relate to this: you get your hands on a new toy (like CloudFormation) and just want to get going. So you ignore all the online help material that is kindly written and published by a myriad of technical writers. Instead you just ‘borrow’ a number of existing scripts (or templates in this case) from the web and tweak them for your purpose. Until you hit a dead end, where something doesn’t work as expected and you need to start debugging the work you have borrowed.

This is likely the time when you wish you had a better understanding of the inner workings of the tool. I have used CloudFormation for quite some time now. Looking back at my path of enlightenment, I remember a number of items I wish I had understood better or paid more attention to at the time. For that reason I would like to share a few nuggets that will give new starters on the topic a somewhat flattened learning curve and provide an outlook on the opportunities and challenges that follow your first discoveries with CloudFormation templates.

Open your mind to automation

From the standpoint of a new adopter of AWS cloud services you may be tempted to disregard CloudFormation templates as a time waster: automating the deployment of a vanilla EC2 instance with CloudFormation doesn’t save you any time over manually provisioning the resources through the AWS Console.
However, we need to be careful not to cheat ourselves here. In the ‘old’ days, when did you ever have to provision a bare metal server containing only the core operating system? Never! One always had to touch the machine to install or at least configure the application stack, database environment, and so on. You name it. At that time it was also commonly accepted that the process of ‘crafting’ a new server instance took days if not weeks. And this doesn’t even include the time for ordering and delivery.

Also, when was the last time you only ever needed one of each type? That must have been on one of our side projects where development, test and production were all based on a single code base running on a server hidden somewhere in the attic. Today you tend to require at least four environments to facilitate the software development life cycle. For good measure you probably also want to add one more for a Blue-Green Deployment.
This is the time when you (should) start to manage your infrastructure as code. And at the same time automation becomes the king of the kingdom.

AWS provides a good variety of helper scripts that come pre-installed on all Amazon-provided machine images, or as executables for installation on your own images.
In combination with the instructions you provide within the stack templates, those automation scripts enable you to deploy an entire infrastructure stack with the click of a few buttons – unless of course you automate even this bit through the use of the CloudFormation API.
Understanding the interdependencies and differences between the various sections of the template and the automation scripts helps you with the successful development of your stack.
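As a hypothetical illustration of that last bit of automation, an entire stack can be created with a single call to the CloudFormation API – for example via the AWS CLI, where the stack name, template file and parameter below are placeholders:

aws cloudformation create-stack --stack-name example-web-stack --template-body file://my-stack.template --parameters ParameterKey=KeyName,ParameterValue=example-key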

CloudFormation init (cfn-init)

The most commonly used script, aside from cloud-init (but more on this in a later post), would unarguably be cfn-init (CloudFormation init).
CloudFormation init (cfn-init) reads and processes the instructions provided inside the instance metadata as part of the CloudFormation template.
To run cfn-init you need to call it from inside the user data instructions or as part of any of the start-up processes of your own image.

You might like to know that the user data instructions are ‘magically’ executed by cloud-init. Cloud-init is an open source package that is widely used for the bootstrapping of cloud instances and we can dive deeper into this tool set in another post.

Cfn-init accepts a number of command line options. As a minimum you need to provide the name of the CloudFormation stack and the logical name of the resource that contains the instance metadata instructions.

/opt/aws/bin/cfn-init -v --stack YourStackName --resource YourResourceName

This is either the launch configuration or an EC2 instance definition inside the CloudFormation template.
It is important to understand that the instance itself does not get ‘seeded’ with the template instructions as part of the launch. In fact, the instance has no appreciation of the fact that its launch was initiated by CloudFormation. Instead, the cfn-init script reaches out to the public CloudFormation API endpoint to retrieve the template instructions. This is important to realise if you are launching your instance inside a VPC that has no connectivity to the internet, or provides that connectivity via a proxy server that requires explicit configuration.
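To make this concrete, here is a minimal sketch of how the pieces fit together: the instance metadata carries the cfn-init instructions and the user data bootstraps cfn-init on first boot. The AMI ID and file content are placeholders, and the fragment is shown in YAML syntax for brevity.

Resources:
  WebServer:
    Type: AWS::EC2::Instance
    Metadata:
      AWS::CloudFormation::Init:
        config:
          files:
            /etc/motd:
              content: "Provisioned by cfn-init\n"
              mode: "000644"
    Properties:
      ImageId: ami-12345678          # placeholder AMI
      InstanceType: t2.micro
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -xe
          /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource WebServer --region ${AWS::Region}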

Configuration sets

CloudFormation init instructions can be grouped into multiple configuration sets.
I strongly suggest you take advantage of this functionality to support the separation of concerns and enable reuse of template fragments.
CloudFormation template instructions don’t necessarily support DRY coding practices – nor are they supposed to.
However, if your set-up requires a common set of applications or configurations to be installed and configured on each instance (think anti-virus, compliance or log forwarding agents, etc.), you are well placed to keep those parts separated in their own configuration set. In combination with a centralised source control management system or an advanced text editor like Sublime Text or Notepad++, you can then fairly easily maintain and quickly re-use those common parts of your stack.
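As a sketch of that separation, a shared ‘common’ configuration set (here a hypothetical log-forwarding agent) can live alongside an application-specific one; cfn-init is then pointed at the desired sets with the -c/--configsets option:

Metadata:
  AWS::CloudFormation::Init:
    configSets:
      default:
        - common
        - application
    common:
      packages:
        yum:
          awslogs: []                # shared agent installed on every instance (example)
      services:
        sysvinit:
          awslogs:
            enabled: true
            ensureRunning: true
    application:
      packages:
        yum:
          httpd: []                  # application-specific packages

The matching call would then be along the lines of /opt/aws/bin/cfn-init -v --stack YourStackName --resource YourResourceName -c default.
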
Bear in mind though that this isn’t the only solution to ensure common components are always rolled into the stack. In a previous post I have written about the advantages and trade-offs for scripted launches of instances vs the use of pre-baked, customised machine images.
The above solution doesn’t scale well for larger environments, though. If you want to automate your infrastructure across tens or hundreds of templates, you will soon hit the limits: as your environment requires patching and you start refactoring your code fragments, you need to ensure that every stack in your environment is kept up to date.
Once you have reached that point, you will start to investigate the use of continuous integration solutions that hook into AWS for a more automated management of stacks across multiple environments.

Be mindful of alternatives

Which leads me nicely to my closing words. I am sure everyone is aware of the popular saying that goes along the lines of:

‘if all you have is a hammer,
everything looks like a nail’.

Rest assured that your infrastructure and deployment solution is subject to the same paradigm. When I started with scripted deployments in AWS I made good use of the user data script. I partitioned the various steps within my script and made it re-usable. I split it up into individual bash or PowerShell scripts, deployed them to the instances and called them from within the user data or cascaded them amongst each other. And felt very clever! Until my fleet of instances started growing.

That is when I discovered that a lot of the effort of managing the fleet could be saved by using CloudFormation. As part of this, the instance definitions moved to CloudFormation Init metadata, which provided me with additional flexibility. CloudFormation Init allowed me to define in a declarative way what actions I wanted to perform on an instance and in which order – much like the YAML-based cloud-init configuration, but at the scale of a whole stack, not just a single instance. No longer did I have to navigate to a specific directory, download an RPM package using wget or curl, install it using the package manager, ensure the application is started at boot time, and so on. Instead I can just provide declarative instructions inside one or more of the seven supported configuration keys.
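As an illustration, the manual ‘navigate, download, install, enable’ routine collapses into a handful of those declarative keys. The package location, file and service names below are hypothetical:

config:
  packages:
    rpm:
      myagent: https://example.com/packages/myagent.rpm   # hypothetical package source
  files:
    /etc/myagent/agent.conf:
      content: |
        log_level = info
      mode: "000644"
  services:
    sysvinit:
      myagent:
        enabled: true
        ensureRunning: true
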
As discussed earlier, I yet again felt very smart: I started to organise my individual declarative instructions in configuration sets, managed them in a central repository for re-use, and so on. Until, well, you can probably already guess it by now: until I discovered that it is worth considering the use of AWS OpsWorks and Elastic Beanstalk resources inside a CloudFormation stack. AWS OpsWorks abstracts your configuration instructions further away from the declarative configuration in the init metadata. Using a managed Chef service you have access to a large variety of pre-defined recipes for the installation and configuration of additional components of your system.
Since those recipes are continuously maintained and updated by the wider community you don’t need to re-invent the wheel over and over again.
While I do have to admit that the re-invention of the wheel has served humankind quite well to date (imagine if our cars still used stone wheels), it is quite obvious that the wisdom and throughput of a whole community can be much higher than the capability of an individual.
The same can be said about Elastic Beanstalk. Where OpsWorks helps you accelerate the deployment of common components, Elastic Beanstalk allows you to automate the resilient and scalable deployment of your application into the stack without you even having to describe or configure the details of load balancing and scaling.

In summary

The point I would like to make is that in a world where “the fast eat the slow”, we can never settle on a given solution at any given time. Our whole community, including AWS, is constantly evolving to allow organisations to innovate, develop and ship features at an ever increasing rate. This is achieved through the continuous abstraction away from the core underlying infrastructure and services, and the combination of traditional features with new functionality and innovation.
To stay on top of the game as an IT professional there is the need to constantly challenge the status quo and, where applicable, make the leap of faith to investigate and learn new ways of doing our business.

Architecting on AWS: Optimising the application design

In our practice we hear a variety of misconceptions and misinterpretations in relation to the benefits of moving workloads ‘into the cloud’. You should be very wary if someone wants to make you believe that the pure migration of a traditional application to a cloud services vendor will make it any more scalable or reliable. Of course, you can scale vertically by increasing the size of your compute nodes. However, this still restricts you to the maximum size of the instances available. Scaling horizontally on the other hand, by distributing your workload over multiple instances, requires special considerations with regard to optimizing the application design.

As part of your migration strategy you should also critically review your existing application components and consider whether any of the higher level capabilities provided by your cloud vendor can deliver the same functionality at reduced cost, increased reliability, higher flexibility, et cetera. This story from marketing software provider Moz also serves as a timely reminder that your mileage may vary depending on the type of your workload and your non-functional requirements. Therefore we can only provide a general guide, as each option needs to be weighed on its advantages and disadvantages against the overall solution.

Please note that we tend to angle our posts along the service catalogue of Amazon Web Services (AWS). However, most of the strategies and patterns we describe will also apply to other cloud vendors that provide capabilities with similar characteristics. With that in mind, let’s review our two tier example application from the first post in this series. To support the elasticity and reliability of our solution, we should consider a few concerns from the application perspective.

Optimizing the application design: session management

In the design we are using the Elastic Load Balancer (ELB) to distribute load between instances. By default, the ELB routes each request to the instance with the smallest load. This is also referred to as stateless load balancing and needs to be considered in the design of your application. Unless you have strong reasons to store your session state in memory on the web server, you should migrate to an external session provider. A common approach is the use of a session table within a relational database system. However, this doesn’t come without risks, as the database system may be unnecessarily flooded with session requests, potentially impacting the overall performance of the application and causing scalability issues in the database tier.

On AWS you are better off using DynamoDB. Amazon DynamoDB is a key-value data store (with recently added support for a document data model) which delivers configurable, predictable performance. For that reason DynamoDB is an ideal candidate for the management of session state. As a fully managed service, it also relieves you of most of the operational and administrative overhead.
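If you manage the session table as part of your CloudFormation stack, a minimal sketch could look like the following; the table name and capacity figures are illustrative only:

Resources:
  SessionTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: app-sessions                 # hypothetical table name
      AttributeDefinitions:
        - AttributeName: SessionId
          AttributeType: S
      KeySchema:
        - AttributeName: SessionId
          KeyType: HASH
      ProvisionedThroughput:
        ReadCapacityUnits: 10                 # tune to your measured session traffic
        WriteCapacityUnits: 5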


A word on RDS and scaling

If you are an observant reader you may have already spotted a snag: if you really want your overall solution to be truly scalable, you need to ensure that this applies to each tier of the application. This poses a couple of issues in the traditional relational database space. While you have the ability to ‘crank up’ the amount of provisioned IOPS for your RDS instances, there is no ability to auto-scale database instances in the same fashion as EC2.

A common design pattern is the separation of write operations from read operations. By the general nature of storage systems, read operations tend to be proportionally higher than write operations. For that reason you can offload read requests to a dedicated (set of) read replicas to provide some level of scalability. Due to the underlying design limitations within the storage engines, though, this cannot be fully automated. Amongst other limitations we also need to highlight that support for read replicas in RDS is limited to the MySQL, PostgreSQL and the new Aurora storage engines.
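In CloudFormation terms a read replica is simply another DB instance that references its source. A minimal sketch, assuming a master instance named MasterDatabase is defined elsewhere in the same template:

Resources:
  ReadReplica:
    Type: AWS::RDS::DBInstance
    Properties:
      SourceDBInstanceIdentifier: !Ref MasterDatabase   # assumed master defined elsewhere
      DBInstanceClass: db.m3.medium                     # size to suit the read workload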

Alternative database technologies

All this is obviously not going to be an issue for you if you are dealing with very steady and predictable workloads. If you do have the need to deliver a scalable solution though, you will eventually get to the stage where you need to consider alternative mechanisms to reduce the load on your relational database environment. An obvious choice would be the consideration of alternative database technologies like Amazon DynamoDB or Amazon SimpleDB for the data that doesn’t require a relational structure.

Content distribution

You can also reduce the strain on your application and database services by employing caching services like Amazon CloudFront. As described earlier, CloudFront provides a large number of edge locations across the globe that act like a massive cache for web and streaming content. The cache behaviour settings allow you to optimise the caching for the unique needs of your application. As an added bonus this will also improve the overall user experience for your customers.
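A minimal sketch of a CloudFront distribution fronting the web tier is shown below; the origin domain name and the default TTL are placeholders you would tune to your own cache behaviour needs:

Resources:
  ContentDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Enabled: true
        Origins:
          - Id: web-origin
            DomainName: www.example.com            # hypothetical origin, e.g. the ELB DNS name
            CustomOriginConfig:
              OriginProtocolPolicy: match-viewer
        DefaultCacheBehavior:
          TargetOriginId: web-origin
          ViewerProtocolPolicy: allow-all
          ForwardedValues:
            QueryString: false
          DefaultTTL: 300                          # cache objects for five minutes by default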


Object storage

Finally we briefly want to touch on the Amazon S3 storage service. Again, many traditional application designs make use of relational or file system resources for the storage of BLOB objects. While you can certainly continue on that path, we recommend rethinking that approach for a number of reasons. For one, you obviously need to continue to provide and manage your own file system or relational database environment. You also have to ensure that the systems are always up to date, that you have enough disk space available and that the systems are actually available to meet your service level agreements.

If those operational reasons haven’t put you off yet, you may want to consider the actual costs of storage. Based on the figures for my ‘home’ region Sydney, the cost of storing 100 GByte of data on S3 is approximately 30% of the cost of Elastic Block Store or RDS storage. And the pure cost of storage doesn’t even include the cost of the utility compute to power the relational database or file system environments. So unless there is a specific need for a directly attached, high-performing local disk, e.g. for the hosting of a COTS solution like SAP, we strongly recommend considering the use of the S3 object store where applicable.

With those initial teasers in mind you should start exploring the AWS service catalogue and our rich training content on CloudAcademy to consider what other services you could utilise to address the unique concerns of your solution.

DISCLOSURE: This post has originally been created for and sponsored by CloudAcademy.com.

Architecting on AWS: utilising elastic compute

Our last post in this series provided you with an overview of our example architecture on AWS. In this post we go into some more detail, focusing on elasticity using AWS EC2 (Elastic Compute Cloud); in particular we will see how to use AutoScaling to make your computing infrastructure elastic and highly available.

But what is that elasticity thing that people keep going on about? According to Wikipedia, elasticity is defined as “the degree to which a system is able to adapt to workload changes by provisioning and deprovisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible.”
This is different from scalability, or, if you like, a specialisation of scalability. Scalability provides the ability to increase (or decrease) the amount of resources by scaling up (more powerful instances) or out (additional instances), which is usually done through manual intervention. Elasticity does the same, but in an autonomic manner, independent of human interaction.

But what does that mean for EC2? Sometimes EC2 instances tend to be considered merely as virtual machines that are hosted in the cloud. However, this doesn’t take into account the auxiliary services that come as part of EC2, and it misses one key enabler of elasticity as defined above: AutoScaling.
AutoScaling is the ‘magic’ ingredient that allows a system hosted on EC2 to dynamically adapt to changes in demand. But how does it actually work?

How AutoScaling works

AutoScaling has two components: Launch Configurations and Auto Scaling Groups.

  • Launch Configurations hold the instructions for the creation of new instances. The instructions describe what type of instance AutoScaling needs to launch (e.g. t2.medium, m3.large), what Amazon Machine Image (AMI) the new instance is going to be based on, what roles or what storage is going to be associated with the instance, and so on.
  • Auto Scaling Groups on the other hand manage the scaling rules and logic, which are defined in policies. Those can be based on a schedule or on CloudWatch metrics. The CloudWatch service allows you to monitor all resources and applications that you have deployed on AWS. CloudWatch allows you to define alarms on metrics, to which the AutoScaling policies subscribe. Through the use of metrics you can, for example, implement rules that elastically scale your environment based on the performance of your deployed instances or on traffic volumes on the network (see the sketch after this list).
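The sketch below shows how the two components and a simple CPU-based scaling rule hang together in a CloudFormation template; the AMI ID, instance type, group sizes and alarm threshold are placeholders:

Resources:
  WebLaunchConfig:
    Type: AWS::AutoScaling::LaunchConfiguration
    Properties:
      ImageId: ami-12345678                  # placeholder AMI
      InstanceType: t2.medium
  WebScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      LaunchConfigurationName: !Ref WebLaunchConfig
      MinSize: 2
      MaxSize: 6
      AvailabilityZones: !GetAZs ''
  ScaleOutPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref WebScalingGroup
      AdjustmentType: ChangeInCapacity
      ScalingAdjustment: 1                   # add one instance per scaling event
  HighCpuAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/EC2
      MetricName: CPUUtilization
      Dimensions:
        - Name: AutoScalingGroupName
          Value: !Ref WebScalingGroup
      Statistic: Average
      Period: 300
      EvaluationPeriods: 2
      Threshold: 70                          # scale out above 70% average CPU
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref ScaleOutPolicy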

This doesn’t have to be the limit though. Since CloudWatch collects metrics from each and every resource deployed within your environment, you can choose a variety of different sources as inputs to your scaling events. Assume you have deployed an application on EC2 that processes requests from a queue like the Simple Queue Service (SQS). With CloudWatch you can monitor the length of the queue and scale your computing environment in or out based on the number of items in the queue at the time. And since CloudWatch also supports the creation of custom metrics through the API, you can actually use any of your application logging outputs as a trigger for utility compute scaling events.
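As a sketch, such a queue-driven rule is just another CloudWatch alarm feeding the same scaling policy; the queue resource (WorkQueue), the policy (ScaleOutPolicy) and the threshold below are assumptions:

  QueueDepthAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/SQS
      MetricName: ApproximateNumberOfMessagesVisible
      Dimensions:
        - Name: QueueName
          Value: !GetAtt WorkQueue.QueueName   # assumed AWS::SQS::Queue defined elsewhere
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 1
      Threshold: 100                           # scale out once the backlog exceeds 100 messages
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref ScaleOutPolicy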

How to use AutoScaling to achieve elastic computing

Independently of CloudWatch, you can also use the AutoScaling APIs to amend your scaling configuration, trigger scaling events or set the health of an instance. Setting the health status of your instances allows you to go beyond the internal health checking that is done by AutoScaling, which basically just confirms whether an instance is still running or not. As part of your internal application logic, you could set the health status as a result of certain error conditions. Once an instance is set to unhealthy, AutoScaling will take it out of service and spin up a fresh new instance instead.
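For example, an application that detects a fatal error condition could mark its own instance as unhealthy through the CLI; the instance ID below is a placeholder:

aws autoscaling set-instance-health --instance-id i-12345678 --health-status Unhealthy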

Auto Scaling can also be useful outside of traditional elasticity needs. It is commonly used in smaller environments to ensure that no fewer than a certain number of instances are running at any point in time. So if you are just starting up with that flash new application that no one knows about just yet, or you are deploying an internal-facing business application, it is still good practice to make those instances part of an Auto Scaling group. This brings a number of advantages with it.

Firstly and most importantly: you are forcing yourself to design your application in a way that lends itself to the paradigm of disposable infrastructure. In doing so you ensure that no state or data is ever stored on the instance.

Secondly, you ensure that the launch of a new instance is fully automated. While you may not yet use configuration management tools like Chef, Puppet or PowerShell DSC, you will set yourself on the right path by either maintaining a ‘master’ AMI image or making use of the default AMIs in combination with bootstrapping through the instance’s user data.

Finally, with the first two strategies implemented, you are ready to scale your environment in case your idea becomes the hype of the month.

Summary

In summary, we have provided you with a variety of examples to help you understand the use of elasticity and scalability in relation to EC2, along with a summary of the services involved.

For scaling, particularly elastic scaling, you need to be conscious of the other services in your environment that form part of your solution. For example, you may need to consider whether your relational database can continue to respond to the increase in demand from the additional web or application servers. If you are utilising the Elastic Load Balancer (ELB) to distribute the load between your instances, you need to be aware that the ELB is itself designed as an elastic service based on EC2. For huge spikes in demand, unfortunately, you don’t quite get the elasticity you would wish for. If you are ‘warming up’ your own environment by spinning up new instances in anticipation of an expected increase in demand (e.g. through the launch of a marketing campaign), you are best to also contact AWS Support in advance of the expected spike to ensure that the ELB is ready to respond to the demand immediately.

You can learn more about how to design a scalable and elastic infrastructure on AWS using the courses that are available from CloudAcademy. In particular, you might benefit from watching our course “How to Architect with a Design for Failure Approach”, where AutoScaling is used to help achieve high availability and fault tolerance in a common architecture.

DISCLOSURE: This post has originally been created for and sponsored by CloudAcademy.com.

Architecting on AWS: the best services to build a two-tier application

The notion of a scalable, on-demand, pay-as-you-go cloud infrastructure tends to be easily understood by the majority of today’s IT specialists. However, in order to fully reap the benefits of hosting solutions in the cloud you will have to rethink traditional ‘on-premises’ design approaches. This should happen for a variety of reasons, the most prominent ones being design-for-cost and the adoption of a design-for-failure approach.

This is the first of a series of posts in which we will introduce you to a variety of entry-level AWS services, using the example of architecting a common two-tier application deployment on AWS (e.g. a mod_php LAMP stack). We will use the architecture to explain common infrastructure and application design patterns pertaining to cloud infrastructure.
To start things off we provide you with a high level overview of the system and a brief description of the utilised services.


Virtual Private Cloud (VPC)

The VPC allows you to deploy services into segmented networks to reduce the vulnerability of your services to malicious attacks from the internet. Separating the network into public and private subnets allows you to safeguard the data tier behind a firewall and to connect only the web tier directly to the public internet. The VPC service provides flexible configuration options for routing and traffic management rules. An Internet Gateway enables connectivity to the internet for resources that are deployed within public subnets.
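A minimal sketch of that segmentation in a CloudFormation template; the CIDR ranges are placeholders, and the route tables that actually make the public subnet public are omitted for brevity:

Resources:
  AppVpc:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref AppVpc
      CidrBlock: 10.0.0.0/24          # intended for the web tier
  PrivateSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref AppVpc
      CidrBlock: 10.0.1.0/24          # intended for the data tier
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref AppVpc
      InternetGatewayId: !Ref InternetGateway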

Redundancy

In our reference design we have spread all resources across two Availability Zones (AZs) to provide redundancy and resilience to cater for unexpected outages or scheduled system maintenance. As such, each Availability Zone hosts at least one instance per service, except for services that are redundant by design (e.g. Simple Storage Service, Elastic Load Balancer, Route 53, etc.).

Web tier

Our web tier consists of two web servers (one in each availability zone) that are deployed on Elastic Compute Cloud (EC2) instances. We balance external traffic to the servers using Elastic Load Balancers (ELB). Dynamic scaling policies allow us to elastically scale the environment by adding or removing web instances to and from the Auto Scaling group. Amazon CloudWatch allows us to monitor demand on our environment and trigger scaling events using CloudWatch alarms.

Database tier

Amazon’s managed Relational Database Service (RDS) provides the relational (MySQL, MS SQL or Oracle) environment for this solution. In this reference design it is established as a Multi-AZ deployment. The Multi-AZ deployment includes a standby RDS instance in the second availability zone, which provides increased availability and durability for the database service by synchronously replicating all data to the standby instance.
Optionally we can also provision read replicas to reduce the demand on the master database. To optimise costs, our initial deployment may only include the master and standby RDS instances, with additional read replicas created in each AZ as dictated by demand.
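A minimal sketch of the Multi-AZ primary in CloudFormation; the engine, instance class and credentials are placeholders, and in practice the password would be passed in as a NoEcho parameter rather than written into the template:

Resources:
  AppDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: MySQL
      DBInstanceClass: db.m3.medium
      AllocatedStorage: 100
      MultiAZ: true                     # provisions the synchronous standby in a second AZ
      MasterUsername: admin             # placeholder credentials
      MasterUserPassword: change-me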

Object store

Our file objects are stored in Amazon’s Simple Storage Service (S3). Objects within S3 are managed in buckets, which provide virtually unlimited storage capacity. Object Lifecycle Management within an S3 bucket allows us to archive (transition) data to the more cost-effective Amazon Glacier service and/or to remove (expire) objects from the storage service based on policies.
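A sketch of such a lifecycle policy on a bucket; the transition and expiration periods are examples only:

Resources:
  ObjectStoreBucket:
    Type: AWS::S3::Bucket
    Properties:
      LifecycleConfiguration:
        Rules:
          - Id: ArchiveThenExpire
            Status: Enabled
            Transitions:
              - StorageClass: GLACIER
                TransitionInDays: 90    # archive (transition) objects after 90 days
            ExpirationInDays: 365       # remove (expire) objects after one year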

Latency and user experience

For minimised latency and an enhanced user experience for our world-wide user base, we utilise Amazon’s CloudFront content distribution network. CloudFront maintains a large number of edge locations across the globe. An edge location acts like a massive cache for web and streaming content.

Infrastructure management, monitoring and access control

Any AWS account should be secured using Amazon’s Identity and Access Management (IAM). IAM allows for the creation of users, groups and permissions to provide granular, role based access control over all resources hosted within AWS.
The provisioning of the above solution to the regions is achieved using AWS CloudFormation. CloudFormation supports the provisioning and management of AWS services and resources using scriptable templates. Once created, CloudFormation also updates the provisioned environment based on changes made to the ‘scripted infrastructure definition’.
We use the Route 53 domain name service for the registration and management of our Internet domain.

In summary, we have introduced you to a variety of AWS services, each of which has been chosen to address one or more specific concerns in regard to the functional and non-functional requirements of the overall system. In our upcoming posts we’ll investigate a number of the above services in more detail, discussing major design considerations and trade-offs in selecting the right services for your solution. In the meantime you can start to learn more about the individual AWS services using the courses that are available from CloudAcademy.

DISCLOSURE: This post has originally been created for and sponsored by CloudAcademy.com.