Architecting on AWS: Dynamic configuration vs master AMIs

Infrastructure as a Service holds the promise of reduced costs and increased flexibility, enabled through ease of operation and management. To seize that opportunity when architecting on AWS, we as IT professionals need to adapt how we view, manage and operate today’s technology.

The desire to respond with more agility to changing business needs and the ever increasing pace of innovation has helped form the DevOps service delivery model, in which the development and operations domains have moved closer together.
Modern cloud service providers support and continuously advance the model of coded infrastructure. As part of that they keep abstracting further away from our traditional understanding of IT infrastructure. This is reaching a point where compute services become a true commodity, similar to power or tap water.

If you are new to cloud computing and coded infrastructure, it is important to understand these underlying basics, as we are going to build on them at a later stage, as indicated in the ‘Outlook’ below.


When you create a new AWS EC2 instance from one of the (admittedly large) variety of Amazon Machine Images (AMIs), you will eventually need to customise certain configuration settings or deploy additional applications to tailor it to your solution. The base Microsoft Windows Server AMI, for example, doesn’t have a web server pre-installed. This provides you with the flexibility to configure the web server of your choice.
While we could log on to the machine after launch and manually deploy and configure our web server, this is obviously not going to be good enough in the long term, particularly not if we eventually want to scale our environment dynamically. Even if you just want to make a single instance more fault tolerant, as described in our previous post in this series, you need to employ a basic level of automation.

Architecting on AWS: the options continuum

As with any good IT problem, there is more than one solution. The available options naturally sit at opposing ends of a scale, and your task is to weigh the advantages and disadvantages of each to find the optimal solution for your needs.

Dynamic configuration

The standard AWS AMIs can be instructed to perform automated tasks or configuration actions at launch time. This is enabled by the EC2Config service for Windows and cloud-init scripts under Linux. You provide those instructions as “user data” as part of the advanced launch configuration of your instances as shown in the example below.
(Screenshot: the EC2 launch wizard’s Advanced Details step with the user data field.)

The user data instructions can contain either batch commands and PowerShell scripts on Windows, or shell scripts and cloud-init directives on Linux-based AMIs. The actions performed are limited only by your imagination and the total user data size limit of 16 kilobytes (a minor but important detail).
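
As a minimal sketch of what such user data could look like for the Windows web server example above (assuming a Windows Server 2012 R2 or later base AMI, where the Install-WindowsFeature cmdlet is available, and EC2Config processing the <powershell> block at first boot):

<powershell>
# Install the IIS web server role including its management tools
Install-WindowsFeature -Name Web-Server -IncludeManagementTools

# Drop a simple placeholder page so the instance can be verified from a browser
Set-Content -Path "C:\inetpub\wwwroot\index.html" -Value "<h1>Hello from $env:COMPUTERNAME</h1>"
</powershell>

On a Linux AMI the equivalent would be a shell script or cloud-init directives achieving the same result.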

Pre-baked AMIs

Instead of configuring your instances dynamically at launch time, you can also create your own version of an Amazon Machine Image. Just launch a new instance, ensure that all your ‘static’ applications and settings have been applied, and finally create a new image from that instance. This is done in the AWS console using the Create Image option from the instance Actions menu, or using the create-image command from the Command Line Interface.
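
The CLI variant could look like the following sketch, where the instance ID and image name are placeholders you would substitute with your own:

# Create an AMI from a running instance; --no-reboot keeps the instance running,
# at the risk of capturing an inconsistent file system state
aws ec2 create-image --instance-id i-0123456789abcdef0 --name "web-baseline-v1" --description "Pre-baked web server image" --no-reboot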


Trade-offs

Whether to decide for a dynamic configuration or a master image approach depends on your individual use case. Each option has its advantages and disadvantages, which you need to understand and assess against each other in order to find the best solution for your scenario.

One advantage of using pre-baked AMIs is the reduced time to get a new instance from ‘launch’ to ‘ready’. With all components pre-configured and applications installed, you just need to wait for the instance to launch.
This obviously comes at a cost, as the image requires constant maintenance. Even if your application code is fairly static, you still need to patch your images regularly so that the resulting instances are not exposed to any new security threats.

On the other hand, dynamic configuration provides you with a lot of flexibility. Every instance you launch can have an ever so slightly different configuration.
Since you always start from an AWS-managed AMI, your security patches are reasonably up to date (i.e. usually within five business days after Microsoft’s Patch Tuesday for Windows AMIs).
You ‘pay’ for this additional service through the time it takes for your instance to get itself ‘ready’ while executing all launch scripts. You also need to be aware that the AMI ID changes whenever AWS releases a new version of the patched image. This is particularly important for your scripted launches or Auto Scaling configurations, as described in our previous post on this topic.
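
One way to deal with the changing AMI ID is to resolve the current ID at launch time instead of hard-coding it. The following is a sketch only, assuming the AWS CLI is configured and using the Windows Server 2012 R2 base image name pattern purely as an example filter:

# Look up the most recently published Amazon Windows base AMI in the current region
aws ec2 describe-images --owners amazon --filters "Name=name,Values=Windows_Server-2012-R2_RTM-English-64Bit-Base-*" --query "sort_by(Images, &CreationDate)[-1].ImageId" --output text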

Fortunately we are able to combine the two options to get the best of both worlds. In this scenario you would create an AMI that contains the applications and configuration items that change infrequently (e.g. Internet Information Services, Windows Update configuration, etc.). Items that change frequently (e.g. your own application) are then injected as part of the dynamic launch configuration.
This approach minimises the time it takes to get a new instance to the ‘ready’ state, yet still provides you with a level of flexibility to influence the final result through the user data instructions.
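
In an Auto Scaling context the combined approach could be registered like the sketch below, where the AMI ID, key pair, security group and the deploy-app.ps1 user data file (assumed to contain a <powershell>-wrapped deployment script) are all placeholders of my own invention:

# Launch configuration that starts from the pre-baked image and injects the
# frequently changing application pieces through user data at boot time
aws autoscaling create-launch-configuration --launch-configuration-name web-lc-v2 --image-id ami-12345678 --instance-type t2.medium --key-name my-keypair --security-groups web-sg --user-data file://deploy-app.ps1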

Outlook

While this post provided you with an introduction to the entry-level functionality offered by AWS, it is really just the tip of the iceberg, intended to get your head into the right space for the concept of a coded infrastructure.
Auxiliary configuration management solutions like Chef, Puppet and PowerShell DSC provide you with additional flexibility and control over your larger deployments.
Based on Chef, AWS OpsWorks also provides you with an application management solution, which is currently limited to Linux based AMIs.

At re:Invent 2014 AWS also released AWS CodeDeploy, supporting the automated deployment of code updates to your Linux and Windows environments, currently available in the Northern Virginia and Oregon regions. Knowing AWS, though, this is only going to be a short-term limitation, and we’ll be looking at this service, probably also in combination with Elastic Beanstalk and CloudFormation, at a later stage. In the interim you can start to learn more about the individual AWS services using the courses available from CloudAcademy.

DISCLOSURE: This post has originally been created for and sponsored by CloudAcademy.com.

Hadoop streaming R on Hortonworks Windows distribution

The ability to combine executables with the Hadoop map/reduce engine is a very powerful concept known as streaming. Thanks to a very active community, there are ample examples available on the web covering a large variety of languages and implementations. Sources like this and this have also provided some R language examples that are very easy to follow.
In my case I only needed to look at the integration between the two tools. Unfortunately, though, I had so far been unable to find any detail on how to implement streaming of R on the Hortonworks Windows distribution.

Why Windows, you ask? Well, I guess it comes down to the same reason for which Hortonworks decided to ship a Windows distribution in the first place: sometimes it is just easier to reach a market in a perceived familiar environment. But that may become a topic for another post someday.

In hindsight everything always appears quite straightforward. Still, I would like to briefly share my findings here to reduce the research time for anyone else who is presented with a similar challenge.

As a first requirement R obviously needs to be installed on every data node. This also applies to all additional R packages that are used within your application.

Next, you create two files containing the instructions for the map and reduce tasks in R. In the example below the files are named map.R and reduce.R.
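
Before involving Hadoop at all, you can sanity-check the two scripts on the local command line. This is a sketch only, assuming a hypothetical sample.txt input file and that both scripts read from standard input and write tab-separated key/value pairs to standard output, as streaming expects; the local sort simply stands in for the shuffle phase:

REM emulate map -> shuffle -> reduce locally for a quick sanity check
type sample.txt | "C:\R\R-3.1.0\bin\x64\Rscript.exe" map.R | sort | "C:\R\R-3.1.0\bin\x64\Rscript.exe" reduce.R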

Assuming that your data is already loaded into HDFS, you issue the following command on the Hadoop command line:

hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.4.0.2.1.1.0-1621.jar -files "file:///c:/Apps/map.R,file:///c:/Apps/reduce.R" -mapper "C:\R\R-3.1.0\bin\x64\Rscript.exe map.R" -reducer "C:\R\R-3.1.0\bin\x64\Rscript.exe reduce.R" -input /Logs/input -output /Logs/output

A couple of comments regarding the streaming jar options used in this command under Windows:

-files: in order to submit multiple files to the streaming process, the comma separated list needs to be encapsulated in double quotes. Access to the local file system is provided using the file:/// scheme.
-mapper and -reducer: since the R script can’t be prefaced with a hashbang on Windows, we need to provide the execution environment as part of the option. As above, the path to the Rscript executable and the name of the R file need to be encapsulated in double quotes.

Fixing crashes in Business Intelligence Development Studio (BIDS)

I recently started to look at SQL Server Analysis Services to get a better understanding of the design and usage of OLAP cubes.

Unfortunately the development environment (Business Intelligence Development Studio), which is still based on Visual Studio 2008, kept crashing on me for no obvious reason. Being relatively new to the topic, I couldn’t judge whether the crashes were caused by my own wrongdoing or by something happening in the application itself. In my defence, though, I shall say that apps should terminate gracefully after a crash instead of just disappearing from the screen. Just saying!
Whenever I opened Translations, Calculations or Perspectives in the Cube Designer, the UI just disappeared from the screen, rendering those features unusable.
At first the wide world of the web unfortunately didn’t yield any results, until I tried to work with Cube Actions, where I was finally presented with an error message that pointed me in the right direction:

Could not load file or assembly 'msmgdsrv, Version=9.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. The system cannot find the file specified.

Researching the error message I got a few pointers that explained how the issue would be solved for MDX queries affecting SSMS. However, it did not fix my issue with Business Intelligence Development Studio, which is built on top of Visual Studio 2008.

Looking at the syntax of the fix, though, I decided that if the above changes work when applied to ssms.exe.config, they should work equally well for devenv.exe by customising devenv.exe.config.

So, editing devenv.exe.config in C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE (your path may be different), I added the snippet below as an additional <dependentAssembly> element in the XML file (screenshot).

<!-- CP 24/09/2012 added to fix missing msmgdsrv error -->
<dependentAssembly>
  <assemblyIdentity name="msmgdsrv" publicKeyToken="89845dcd8080cc91" />
  <codeBase version="9.0.0.0" href="C:\Program Files (x86)\Microsoft Analysis Services\AS OLEDB\10\msmgdsrv.dll" />
</dependentAssembly>
<!-- end of customisation -->

Needless to say, this solved all the issues I had encountered so far with the development environment, as otherwise I wouldn’t have documented them here!

Changing time format for Domino service

For a special purpose described later I was asked to change the time format on my Domino server running as a service on Windows 2003 Server. The setting in question was the a.m./p.m. symbol that had to be altered to display AM/PM instead.

It is important to note that changing the time format in the Regional and Language Options in the Control Panel does not change it for the Local System account, which in this case is the account that Lotus Domino runs under when configured as a service. Unfortunately there is no Control Panel applet to change the settings for the Local System account in Windows.

There are two possible options to overcome this issue. First and foremost, a dedicated user account could be created that is solely used to run the Domino service. This is certainly the preferred option for a professional Windows systems administrator.

Alternatively, the setting can be changed by tweaking the user settings in the HKEY_USERS\.DEFAULT registry hive. In Windows 2003 the values in question are s1159 (a.m.) and s2359 (p.m.). Both are found under HKEY_USERS\.DEFAULT\Control Panel\International.
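
As a sketch of the second option, and assuming you accept the risk outlined below, the change could be applied from an elevated command prompt and followed by a restart of the Domino service:

REM set the AM/PM symbols for the Local System (.DEFAULT) profile
reg add "HKEY_USERS\.DEFAULT\Control Panel\International" /v s1159 /t REG_SZ /d AM /f
reg add "HKEY_USERS\.DEFAULT\Control Panel\International" /v s2359 /t REG_SZ /d PM /f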

Please note that, as with all changes to the Windows registry, special care should be taken. I would like to take the same approach as all other vendors and state that no changes to the registry should be made without consulting Microsoft support first, and that all changes are made at your own risk.