Hadoop streaming R on Hortonworks Windows distribution

The ability to combine executables with the Hadoop map/reduce engine is a very powerful concept known as streaming. Thanks to a very active community, ample examples are available on the web that cater for a large variety of languages and implementations. Sources like this and this have also provided some R language examples that are very easy to follow.
In my case I was only required to look at the integration between the two tools. Unfortunately, though, I hadn’t so far been able to find any detail on how to implement streaming of R on the Hortonworks Windows distribution.

Why Windows, you ask? Well, I guess the reason is the same one that led Hortonworks to consider shipping a Windows distribution in the first place. Sometimes it is just easier to reach a market in an environment that is perceived as familiar. But this may become a topic for another post someday.

In hindsight everything always appears quite straightforward. Still, I would like to briefly share my findings here to reduce the research time for anyone else who is presented with a similar challenge.

As a first requirement R obviously needs to be installed on every data node. This also applies to all additional R packages that are used within your application.

Next, create two files containing the instructions for the map and reduce tasks in R. In the example below the files are named map.R and reduce.R.

Assuming that your data is already loaded into HDFS, issue the following command on the Hadoop command line:

hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.4.0.2.1.1.0-1621.jar -files "file:///c:/Apps/map.R,file:///c:/Apps/reduce.R" -mapper "C:\R\R-3.1.0\bin\x64\Rscript.exe map.R" -reducer "C:\R\R-3.1.0\bin\x64\Rscript.exe reduce.R" -input /Logs/input -output /Logs/output

A couple of comments regarding the streaming jar options used in this command under Windows:

-files: in order to submit multiple files to the streaming process, the comma-separated list needs to be enclosed in double quotes. Access to the local file system is provided using the file:/// scheme.
-mapper and -reducer: since an R script can’t be prefaced with a hashbang on Windows, we need to provide the execution environment as part of the option. As above, the path to the Rscript executable and the name of the R file together need to be enclosed in double quotes.

Using IAM credentials to grant access to AWS services

Having used Amazon Web Services (AWS) for quite some time now, I realised that it should be time to start sharing some of my experiences on this blog, particularly considering that I haven’t contributed to the World Wide Web community for quite some time.
Today’s post is about Amazon’s Identity and Access Management (IAM) service and why it is a good idea to use it.
I am using the great backup solution from CloudBerry to back up important files from my laptop to Amazon S3. While I am absolutely excited about the capabilities of the application, I still did not feel comfortable providing my AWS root account access keys (as described in CloudBerry’s help file) to the application.

Why is that?
To fully understand the issue we need to be aware of the difference between the AWS root credentials and an IAM identity. The AWS root account is provisioned for every AWS user and has full access to all resources and services in the account. Sharing the secret access key for this identity with a third party can get you into big trouble: a malicious piece of software may use the credentials to wipe out all your data, terminate instances or, potentially worse, subscribe to a new raft of additional services that you will have to pay for at the end of the month. Scary stuff, right? So please read on.

Fixing crashes in Business Intelligence Development Studio (BIDS)

I recently started to look at SQL Server Analysis Services to get a better understanding of the design and usage of OLAP cubes.

Unfortunately the development environment (Business Intelligence Development Studio), which is still based on Visual Studio 2008, kept crashing on me for no obvious reason. Being relatively new to the topic, I obviously couldn’t judge whether the crashes were caused by my wrongdoing or by something happening in the application itself. In my defence, though, I shall say that apps should terminate gracefully after a crash instead of just disappearing from the screen. Just saying!
Whenever I opened Translations, Calculations or Perspectives in the Cube Designer, my UI just disappeared from the screen, rendering those few features unusable.
At first, the big wide world of the web unfortunately didn’t yield any results. That changed when I tried to work with Cube Actions, where I was finally presented with an error message that helped me move in the right direction:

Could not load file or assembly 'msmgdsrv, Version=9.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91' or one of its dependencies. The system cannot find the file specified.

Researching the error message I got a few pointers that explained how the issue would be solved for MDX queries affecting SSMS. However, it did not fix my issue with Business Intelligence Development Studio, which is built on top of Visual Studio 2008.

Looking at the syntax of the fix, though, I decided that if the fixes above work when applied to ssms.exe.config, they should work equally well for devenv.exe when applied to devenv.exe.config.

So, editing devenv.exe.config in C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE (your path may be different), I added the snippet below as an additional <dependentAssembly> element in the XML file.

<!-- CP 24/09/2012 added to fix missing msmgdsrv error -->
<dependentAssembly>
  <assemblyIdentity name="msmgdsrv" publicKeyToken="89845dcd8080cc91" />
  <codeBase version="9.0.0.0" href="C:\Program Files (x86)\Microsoft Analysis Services\AS OLEDB\10\msmgdsrv.dll" />
</dependentAssembly>
<!-- end of customisation -->

Needless to say, this solved all the issues I had encountered so far with the development environment; otherwise I wouldn’t have documented them here!

Using regular expressions with Connections Profiles population

Today I was challenged to make use of regular expressions in Tivoli Directory Integrator to further filter LDAP distinguished names for the import into IBM Connections Profiles.

According to the Connections 3.0.1 documentation this can be achieved by specifying your regular expression in the source_ldap_required_dn_regex property of the profiles_tdi.properties file.

Looking at the underlying TDI Assembly Line code, though, I discovered that source_ldap_required_dn_regex is never read. Instead, a property named source_ldap_required_dn_regex_pattern is evaluated.

 

Using the right property unfortunately still doesn’t make the regular expression filter work properly. Instead I received the following error in the ibmdi.log file when running the sync_all_dns Assembly Line:

ERROR [AssemblyLine.AssemblyLines/_internal_ldap_iterate.5] - [ldap_iterate]
CTGDIS181E Error while evaluating Hook 'After GetNext' in the Component
'ldap_iterate' (ldap_iterate.after_getnext).
com.ibm.jscript.InterpretException: Script interpreter error, line=26, col=74:
Unknown member 'test' in Java class 'java.lang.String'

Hunting down the offending line of code, the cause of the error becomes quite obvious. According to the JavaScript introduction from the TDI Users online community, TDI-specific objects are Java objects rather than JavaScript objects. In our example the source_ldap_required_dn_regex_pattern variable is a java.lang.String object, which does not have a method named test. The test method belongs to the JavaScript RegExp object, which is apparently what the code expected in this case.

After a brief moment of wondering whether IBM ever tested that bit of code, or whether I am the only one with that issue, I replaced:

if(!lcConf.source_ldap_required_dn_regex_pattern.test(sdn)) {

with:

var re = new RegExp(lcConf.source_ldap_required_dn_regex_pattern);
if(!re.test(sdn)) {

This results in my ‘re’ variable being properly instantiated as a JavaScript RegExp object with the value of the source_ldap_required_dn_regex_pattern string variable. I can then successfully use the JavaScript RegExp.test function to test the value of sdn. As another way of solving the issue, I could also have rewritten the above line of code to use the matches function of the java.lang.String class, passing in the regular expression itself.
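
The difference between a string and a RegExp object can be reproduced outside TDI. Below is a minimal sketch in plain JavaScript, assuming it is run under Node.js rather than TDI's own script engine, which bridges to Java objects and will differ in detail. The pattern value, the sample distinguished names and the function names callTestOnString and matchesDn are all hypothetical and only serve as illustration.

```javascript
// Hypothetical pattern, standing in for the value of
// source_ldap_required_dn_regex_pattern from the properties file.
const pattern = "ou=staff,dc=example,dc=com$";

// A string has no test() method, so the call fails. This mirrors the
// "Unknown member 'test'" error TDI reported for java.lang.String.
function callTestOnString(dn) {
  try {
    return pattern.test(dn);
  } catch (err) {
    return err.name; // name of the error raised by the failed call
  }
}

// Wrapping the string in a RegExp first yields an object that has test().
function matchesDn(dn) {
  const re = new RegExp(pattern);
  return re.test(dn);
}

console.log(callTestOnString("cn=jane,ou=staff,dc=example,dc=com"));
console.log(matchesDn("cn=jane,ou=staff,dc=example,dc=com"));
console.log(matchesDn("cn=bob,ou=contractors,dc=example,dc=com"));
```

One thing to keep in mind if you go for the java.lang.String alternative instead: RegExp.test succeeds as soon as the pattern matches anywhere in the string, whereas String.matches in Java only succeeds if the entire string matches, so a pattern like the one above would need a leading .* to behave the same way.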

Misleading error message at ST Meeting Server installation

This week I was challenged with a completely deceptive error message during the installation of the Sametime Meeting Server. The error message claimed that

System Clocks are not synchronized within 5 minutes of one another, Please synchronize for federation.

Since I was working in a somewhat unreliable test environment, I believed the message and dutifully compared the system times of the Sametime System Console server and the Sametime Meeting Server. Unsurprisingly, the times were synchronised and the time zone settings didn’t cause any trouble either. This didn’t help me in any shape or form, so I started to investigate the various log files created by the system.

Looking at the Deployment Manager’s System log I discovered an error, which was logged every time I tried to confirm the deployment plan for the Sametime Meeting Server:

[20/04/12 16:21:13:150 NZST] 00000065 exception     W com.ibm.ws.wim.adapter.file.was.FileAdapter login CWWIM4512E The password match failed.
[20/04/12 16:21:13:151 NZST] 00000065 exception     W com.ibm.ws.wim.adapter.file.was.FileAdapter login
                                com.ibm.websphere.wim.exception.PasswordCheckFailedException: CWWIM4512E The password match failed.
at com.ibm.ws.wim.adapter.file.was.FileAdapter.login(FileAdapter.java:2025)
at com.ibm.ws.wim.ProfileManager.loginImpl(ProfileManager.java:3519)

[20/04/12 16:21:13:153 NZST] 00000065 LTPAServerObj E   SECJ0369E: Authentication failed when using LTPA. The exception is <null>.

This triggered my memory. We recently had to change the password for the local WebSphere administrative user. Researching the above error message, I learned from Dave (Thanks!) that the local file registry could still hold the wrong password. The link in Dave’s post pointed me to the instructions on how to reset the administrator’s password in the file registry. While those instructions are aimed at WebSphere Portal, they still helped me solve my issues with the Sametime System Console deployment manager.

After synchronising all the nodes and restarting the application servers, node agents and the deployment manager, I managed to successfully deploy the Sametime Meeting Server.

 
