Amazon S3 vs. Amazon Glacier: a simple backup strategy in the cloud

When you design your first application for hosting on AWS (Amazon Web Services), you will eventually need to consider your options for protecting your own and your customers’ data against accidental loss.
While you may have designed a highly resilient and durable solution, this does not necessarily protect you from administrative mishaps, data corruption or malicious attacks against your system. These risks can only be mitigated with an effective backup strategy.
Thanks to Amazon’s Simple Storage Service (S3) and its younger sibling Amazon Glacier, you have the right services at hand to establish a cost-effective, yet practical backup solution.

Within Amazon S3, data is managed as individual objects. This is in contrast to Amazon’s Elastic Block Store (EBS) or the local file system of your traditional PC, where data is managed in a directory hierarchy.
The abstraction away from the lower layers of storage, and the separation of data from its metadata, come with a number of benefits. For one, Amazon can provide a highly durable storage service for a fraction of the cost of block storage. You also only pay for the amount of storage you actually use, so you don’t need to second-guess your requirements and pre-allocate disk space upfront.

Hierarchical storage with AWS Glacier

Lifecycle rules within S3 allow you to manage the life cycle of the objects stored in S3. After a set period of time you can have your objects automatically deleted or archived off to Amazon Glacier.

AWS S3 LifeCycle
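
If you prefer to script this rather than click through the console, the same kind of rule can be created through the API. The following is a minimal sketch using boto3, the AWS SDK for Python; the bucket name, key prefix and time periods are placeholder values for illustration only.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",  # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": "backups/"},  # only affects keys under this prefix
                "Status": "Enabled",
                # transition objects to the Glacier storage class after 30 days ...
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                # ... and delete them entirely after one year
                "Expiration": {"Days": 365},
            }
        ]
    },
)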

Amazon Glacier is marketed by AWS as “extremely low cost storage”. The cost per terabyte of storage per month is, again, only a fraction of the cost of S3. Amazon Glacier is essentially designed as a write once, retrieve never (or rather rarely) service. This is reflected in the pricing, where extensive restores come at an additional cost and the restore of objects requires lead times of up to five hours.

Amazon S3 with Glacier vs. Amazon Glacier

At this stage we need to highlight the difference between the ‘pure’ Amazon Glacier service and the Glacier storage class within Amazon S3. S3 objects that have been moved to Glacier storage using S3 lifecycle policies can only be accessed (or shall I say restored) using the S3 API endpoints. As such, they are still managed as objects within S3 buckets, rather than as archives within vaults, which is the Glacier terminology.

This differentiation is important when you look at the costs of the services. While Amazon Glacier is much cheaper than S3 for storage, charges are approximately ten times higher for archive and restore requests. This reiterates the store once, retrieve seldom pattern. Amazon also reserves 32 KB of metadata per archive within Glacier, instead of 8 KB per object in S3, both of which are charged back to the user. This is important to keep in mind for your backup strategy, particularly if you are storing a large number of small files. If those files are unlikely to require restoring in the short term, it may be more cost effective to combine them into a single archive and store it directly within Amazon Glacier.
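
To illustrate the point, the following sketch bundles a directory of small files into a single compressed archive and uploads it straight into a Glacier vault using boto3. The vault name and file paths are hypothetical, and the vault is assumed to already exist.

import tarfile
import boto3

# Combine many small files into one archive to avoid paying the 32 KB
# metadata overhead and a request charge for every individual file.
with tarfile.open("backup-2015-01.tar.gz", "w:gz") as tar:
    tar.add("data/", arcname="data")

glacier = boto3.client("glacier")

with open("backup-2015-01.tar.gz", "rb") as body:
    response = glacier.upload_archive(
        vaultName="my-backup-vault",
        archiveDescription="Monthly backup of data/",
        body=body,
    )

# Keep the archive ID somewhere safe -- you will need it to initiate a retrieval.
print(response["archiveId"])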

Tooling

Fortunately, there is a large variety of tools available on the web that allow you to consume the AWS S3 and Glacier services to create backups of your data. They range from stand-alone tools for your local PC to enterprise storage solutions.

Just bear in mind that whatever third party tool you are using, you will need to grant it access to your AWS account. You should ensure that the backup tool only gets the minimum amount of access required to perform its duties. For this reason it is best to issue a separate set of access keys for this purpose. You may also want to consider backing up your data to an entirely independent AWS account. Depending on your individual risk profile, and considering that your backups tend to provide the last-resort recovery option after a major disaster, it may be wise to keep those concerns separated. This particularly protects you against cases like Code Spaces, where all services and data within the account were wiped out entirely.
For reference, we have included instructions below for the configuration of dedicated backup credentials, using my backup tool of choice, CloudBerry, as an example.

AWS Identity and Access Management

Identity and Access Management (IAM) allows you to manage users and groups for your AWS account and to define fine-grained policies for access to the various services and resources. To get started, log in to the AWS Management Console and open the link for IAM.

This opens the IAM Dashboard. From the Dashboard you can navigate to Users and select the Create New Users option. Selecting the “Generate an access key for each User” option ensures that an access key is issued for each user at creation time. If you miss that step, an access key can also be issued at a later time.

After confirming the dialogue you will be given the opportunity to download the security credentials, consisting of a unique Access Key ID and the Secret Access Key itself. Naturally, these credentials should be stored in a secure place.

By default, new users do not have access to any of the resources within the account. Access is granted by attaching an IAM policy directly to a user account or by adding the user to a group with an IAM policy attached. To attach a user policy to an account, select the user and open the Permissions tab.

IAM policies allow for very granular access to AWS resources, so I am not going into too much detail here. Policies can be defined using pre-defined templates or the policy generation tool. For the purpose of allowing your backup tool write access to your AWS S3 bucket, just select the Custom Policy option.

The policy below grants three different sets of rights:

  • Access to AWS S3 to list all buckets for the account,
  • access to the bucket MyBucketName, and
  • the ability to read, write and delete objects within the MyBucketName bucket.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation","s3:ListAllMyBuckets"],
      "Resource": "arn:aws:s3:::*"
    },
    {
      "Effect": "Allow",
      "Action": [ "s3:ListBucket" ],
      "Resource": [ "arn:aws:s3:::MyBucketName" ]
    },
    {
      "Effect": "Allow",
      "Action": [ "s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": [ "arn:aws:s3:::MyBucketName/*"]
    }
  ]
}

If you don’t want to give access to list all available buckets within your account, just omit the first object (shown below) from the JSON statement. In this case, though, the bucket name cannot be selected from within the application.

    {
      "Effect": "Allow",
      "Action": ["s3:GetBucketLocation","s3:ListAllMyBuckets"],
      "Resource": "arn:aws:s3:::*"
    },
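
If you would rather script the whole setup than use the console, the steps above can also be performed with boto3. The sketch below is illustrative; the user name and policy name are placeholders, and the policy document mirrors the custom policy shown earlier.

import json
import boto3

iam = boto3.client("iam")

# A dedicated user for the backup tool, separate from your own credentials.
iam.create_user(UserName="backup-tool")

# Inline policy equivalent to the custom policy above.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetBucketLocation", "s3:ListAllMyBuckets"],
            "Resource": "arn:aws:s3:::*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::MyBucketName"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::MyBucketName/*"],
        },
    ],
}

iam.put_user_policy(
    UserName="backup-tool",
    PolicyName="backup-s3-access",
    PolicyDocument=json.dumps(policy),
)

# Generate the access key pair to configure in the backup tool.
key = iam.create_access_key(UserName="backup-tool")["AccessKey"]
print(key["AccessKeyId"])      # safe to display
print(key["SecretAccessKey"])  # shown only once -- store it securely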

Finally

While this post has primarily focussed on backup options for your hosted environments, it is not limited to those. Amazon S3 and Glacier are available worldwide through public API endpoints.
Additionally, enterprises can make use of the AWS Storage Gateway to back up their on-premises data to AWS. Well-known enterprise backup software from vendors like Commvault, EMC or Symantec also provides options to utilise Amazon’s cloud storage as an additional storage tier within your backup strategy.

Architecting on AWS – design for graceful service degradation

Throughout our series of posts we have already seen a variety of architectural patterns that allow us to design scalable and resilient solutions using the capabilities provided by Amazon Web Services (AWS). However, even the best design can have flaws and may show signs of bottlenecks over time or as the demand on your application increases. This could be caused by additional load created by an influx of new customers using your application, or by an increasing amount of data that needs indexing in your relational storage tier, to name just a couple of examples.

As the saying goes, the devil is in the detail, and your service quality can degrade for a large variety of reasons. While you may not be able to predict and detect each and every potential issue through load testing, you can use a number of architectural patterns to ensure that you continue to interact with your customers or users and therefore have a higher chance of keeping them satisfied.

No matter how well you plan your solution, it is unavoidable that some dependencies or processes will live beyond the control of the calling process. A typical response to this was originally described by Michael Nygard as the Circuit Breaker pattern. Many sources already discuss the use of this pattern in the application development space.
The same concept can also be implemented in the AWS infrastructure layer.

Your key ingredients for this are the Route 53 managed DNS service in combination with Route 53 health checks. Route 53 allows you to create primary and secondary DNS record sets for a given record. This is best explained with an example.

Primary DNS record set

Imagine your web site is hosted on a number of web servers that are load balanced using an AWS Elastic Load Balancer (ELB). So in Route 53 you would create an alias record set that points to the ELB endpoint.

AWS Route 53 primary

We then set the routing policy to failover with the record type set to Primary. This advises Route 53 to only respond with the IP address of the configured endpoint if the associated resource status is healthy.
For this to work you also need to create a Route 53 health check and associate it with the record set.
In its most basic configuration you would point the health check at the same target as the DNS entry. Most of today’s modern web applications, though, depend on a variety of service tiers. Therefore you may want to consider deploying a custom health service, as mentioned in my earlier post on Auto Scaling. This way, the status of all sub-services contributing to the overall user experience can be included in the calculation of your web site’s overall health.
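
For those who prefer to automate the DNS configuration, the health check and the primary failover record can also be created through the API. Below is a minimal sketch with boto3; the hosted zone ID, domain name, health check path and ELB details are all placeholder values.

import uuid
import boto3

route53 = boto3.client("route53")

# Health check against the web site (ideally a custom health page that
# aggregates the status of all contributing sub-services).
health_check_id = route53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        "Type": "HTTP",
        "FullyQualifiedDomainName": "www.example.com",
        "ResourcePath": "/health",
        "RequestInterval": 30,
        "FailureThreshold": 3,
    },
)["HealthCheck"]["Id"]

# Primary failover record: an alias pointing at the ELB endpoint.
route53.change_resource_record_sets(
    HostedZoneId="ZEXAMPLE12345",  # your hosted zone (placeholder)
    ChangeBatch={
        "Changes": [{
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": "www.example.com.",
                "Type": "A",
                "SetIdentifier": "www-primary",
                "Failover": "PRIMARY",
                "HealthCheckId": health_check_id,
                "AliasTarget": {
                    "HostedZoneId": "ZELBEXAMPLE678",  # hosted zone ID of the ELB
                    "DNSName": "my-elb-123456.ap-southeast-2.elb.amazonaws.com.",
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)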

Secondary DNS record set

Next we need to configure the secondary record set with the target of your failover solution. Route 53 will respond with this target when the primary is considered unhealthy. Again, in its most basic form this could just be a public S3 bucket with a static web page, enabled for website hosting.

AWS S3 Website

When setting up the static site, you need to ensure that the bucket has the same name as your domain, as described above. When you have finished configuring the static web site, you can jump back to Route 53 to associate the secondary DNS alias record for your domain. This time we are selecting the S3 bucket as the target.

AWS Route 53 secondary
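
The matching secondary record can be created in the same way. In the sketch below the hosted zone ID is again a placeholder, and the alias hosted zone ID must be the fixed, region-specific zone ID that AWS publishes for S3 website endpoints (the Sydney values are shown purely as an example).

import boto3

route53 = boto3.client("route53")

route53.change_resource_record_sets(
    HostedZoneId="ZEXAMPLE12345",  # your hosted zone (placeholder)
    ChangeBatch={
        "Changes": [{
            "Action": "CREATE",
            "ResourceRecordSet": {
                "Name": "www.example.com.",
                "Type": "A",
                "SetIdentifier": "www-secondary",
                "Failover": "SECONDARY",
                "AliasTarget": {
                    # region-specific hosted zone ID and endpoint for S3 website
                    # hosting -- look up the values for your bucket's region
                    "HostedZoneId": "Z1WCIGYICN2BYD",
                    "DNSName": "s3-website-ap-southeast-2.amazonaws.com.",
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)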

In summary

We tend to stumble across a large variety of different needs in our daily work, each of which demands its own unique solution. For this reason this post can, yet again, only serve as an appetiser.
Route 53 allows for far more complex scenarios and cascading DNS configurations that let you combine regional, weighted and failover records to cater for a wide variety of use cases.

Your solution can also be more sophisticated than a basic static web page hosted on S3. Instead you could fail over to a secondary data centre in a different region, or to a secondary environment that provides a limited set of features to the users of your site.
This again may be controlled by the logic in your health reporting service in combination with your application logic. You may, for example, still be able to take orders when the warehouse service is unavailable, though you may not be able to display real-time availability information.
However, you would want to switch over to an alternative web site that informs your customers when your main site is overloaded. The combination of an intelligent application and infrastructure design ensures that existing customers with an active transaction (e.g. a full shopping basket) can continue to check out, while new visitors to the site are asked for a bit of patience.

As mentioned before, every solution is different. Therefore it is important for you to understand the capabilities that are provided as part of today’s Cloud offerings. This way you can start considering solutions that are beyond the limitations of your traditional infrastructure services.

DISCLOSURE: This post has originally been created for and sponsored by CloudAcademy.com.

Architecting on AWS: Optimising the application design

In our practice we hear a variety of misconceptions and misinterpretations about the benefits of moving workloads ‘into the cloud’. You should be very wary if someone wants to make you believe that merely migrating a traditional application to a cloud services vendor will make it any more scalable or reliable. Of course, you can scale vertically by increasing the size of your compute nodes. However, this still restricts you to the maximum instance size available. Scaling horizontally, on the other hand, by distributing your workload over multiple instances, requires special considerations with regard to optimizing the application design.

As part of your migration strategy you should also critically review your existing application components and consider whether any of the higher-level capabilities provided by your cloud vendor can deliver the same functionality at reduced cost, with increased reliability, higher flexibility, et cetera. This story from marketing software provider Moz also serves as a timely reminder that your mileage may vary depending on the type of workload and your non-functional requirements. Therefore we can only provide a general guide, as the advantages and disadvantages of each option need to be weighed in the context of the overall solution.

Please note that we tend to angle our posts along the service catalogue of Amazon Web Services (AWS). However, most of the strategies and patterns we describe will also apply to other cloud vendors that provide capabilities with similar characteristics. With that in mind, let’s review our two tier example application from the first post in this series. To support the elasticity and reliability of our solution, we should consider a few concerns from the application perspective.

Optimizing the application design: session management

In the design we are using the Elastic Load Balancer (ELB) to distribute load between instances. By default, the ELB routes each request to the instance with the smallest load. This is also referred to as stateless load balancing and needs to be considered in the design of your application. Unless you have strong reasons to store your session state in memory on the web server, you should migrate to an external session provider. A common approach is the use of a session table within a relational database system. However, this doesn’t come without risks, as the database system may be unnecessarily flooded with session requests, potentially impacting the overall performance of the application and causing scalability issues in the database tier.

On AWS you are better off using DynamoDB. Amazon DynamoDB is a key-value data store (with recently added support for a document data model) which delivers configurable, predictable performance. For that reason DynamoDB is an ideal candidate for the management of session state. As a fully managed service, it also frees you from most operational and administrative overhead.
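
As a rough illustration, session reads and writes against DynamoDB could look like the sketch below, using boto3. The table name and attribute names are made up for this example, and the table is assumed to already exist with 'session_id' as its partition key.

import time
import boto3

dynamodb = boto3.resource("dynamodb")
sessions = dynamodb.Table("app-sessions")

def save_session(session_id, data, ttl_seconds=3600):
    """Persist session state outside of the individual web server."""
    sessions.put_item(
        Item={
            "session_id": session_id,
            "data": data,
            # expiry timestamp the application can check (or purge) on read
            "expires_at": int(time.time()) + ttl_seconds,
        }
    )

def load_session(session_id):
    """Any web server behind the load balancer can retrieve the session."""
    return sessions.get_item(Key={"session_id": session_id}).get("Item")

save_session("abc123", {"cart": ["sku-1", "sku-2"]})
print(load_session("abc123"))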


A word on RDS and scaling

If you are an observant reader you may already have spotted a snag. If you really want your overall solution to be truly scalable, you need to ensure that this applies to each tier of the application. This poses a couple of issues in the traditional relational database space. While you have the ability to ‘crank up’ the amount of provisioned IOPS for your RDS instances, there is no ability to auto-scale database instances in the same fashion as EC2.

A common design pattern is the separation of write operations from read operations. In most systems, read operations tend to significantly outnumber write operations. For that reason you can offload read requests to a dedicated read replica (or set of replicas) to provide some level of scalability. Due to design limitations within the underlying storage engines, though, this cannot be fully automated. Amongst other limitations, we also need to highlight that support for read replicas in RDS is limited to the MySQL, PostgreSQL and the new Aurora storage engines.
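
In practice the read/write split has to be implemented in the application itself. The sketch below uses SQLAlchemy against hypothetical RDS endpoints purely to illustrate the idea: writes go to the master instance, reads go to a replica, and the application has to tolerate the replica's slight replication lag.

from sqlalchemy import create_engine, text

# Writes always go to the master RDS instance (placeholder endpoint).
writer = create_engine(
    "mysql+pymysql://app:secret@mydb.abc123.ap-southeast-2.rds.amazonaws.com/shop"
)

# Reads are offloaded to a read replica (placeholder endpoint).
reader = create_engine(
    "mysql+pymysql://app:secret@mydb-replica.abc123.ap-southeast-2.rds.amazonaws.com/shop"
)

with writer.begin() as conn:
    conn.execute(
        text("INSERT INTO orders (customer_id, total) VALUES (:c, :t)"),
        {"c": 42, "t": 99.95},
    )

with reader.connect() as conn:
    # replicas are updated asynchronously, so results may lag slightly
    recent = conn.execute(text("SELECT customer_id, total FROM orders")).fetchall()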

Alternative database technologies

All this is obviously not going to be an issue for you if you are dealing with very steady and predictable workloads. If you do need to deliver a scalable solution, though, you will eventually get to the stage where you need to consider alternative mechanisms to reduce the load on your relational database environment. An obvious choice would be to consider alternative database technologies like Amazon DynamoDB or Amazon SimpleDB for the data that doesn’t require a relational structure.

Content distribution

You can also reduce the strain on your application and database services by employing caching services like Amazon CloudFront. As described earlier, CloudFront provides a large number of edge locations across the globe that act as a massive cache for web and streaming content. The cache behaviour settings allow you to tune caching to the unique needs of your application. As an added bonus, this will also improve the overall user experience for your customers.


Object storage

Finally, we briefly want to touch on the Amazon S3 storage service. Again, many traditional application designs make use of relational or file system resources for the storage of BLOB objects. While you can certainly continue on that path, we recommend rethinking that approach for a number of reasons. For one, you obviously need to continue to provide and manage your own file system or relational database environment. You also have to ensure that those systems are always up to date, that you have enough disk space available, and that the systems are actually available to meet your service level agreements.

If those operational reasons haven’t put you off yet, you may want to consider the actual costs of storage. Based on the figures for my ‘home’ region Sydney, the cost of storing 100 GB of data on S3 is approximately 30% of the cost of Elastic Block Store or RDS. And the pure cost of storage doesn’t even include the cost of the utility compute needed to power the relational database or file system environments. So unless there is a specific need for directly attached, high-performing local disk, e.g. for hosting a COTS solution like SAP, we strongly recommend considering the use of the S3 object store where applicable.
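
As a simple illustration of the pattern, the sketch below stores an uploaded document in S3 and hands out a time-limited download link, so only the object key (and any metadata) needs to live in your relational database. The bucket and key names are placeholders.

import boto3

s3 = boto3.client("s3")

# Write the binary object to S3 instead of a BLOB column or shared file system.
with open("invoice-42.pdf", "rb") as f:
    s3.put_object(
        Bucket="my-application-assets",
        Key="invoices/2015/invoice-42.pdf",
        Body=f,
        ContentType="application/pdf",
    )

# Hand out a pre-signed URL instead of streaming the file through your web tier.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-application-assets", "Key": "invoices/2015/invoice-42.pdf"},
    ExpiresIn=3600,  # link valid for one hour
)
print(url)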

With those initial teasers in mind you should start exploring the AWS service catalogue and our rich training content on CloudAcademy to consider what other services you could utilise to address the unique concerns of your solution.

DISCLOSURE: This post has originally been created for and sponsored by CloudAcademy.com.