May 29, 2014

After working with the raw materials to install Hortonworks Data Platform (HDP) on Windows in an Azure IaaS VM, I came across the automated install tool from Hortonworks.

http://hortonworks.com/blog/automated-install-hdp-2-1-hadoop-windows/

You get it from GitHub and run the included exe.  It uses a config file to describe where all the install files are (Python, Java, etc.), then goes through and installs them all.  Finally, it runs smoke tests to check that everything is working.

An easy way to get a cluster up and running!


May 14, 2014

I was working with Power Query for Excel and needed to parse a tab-separated file.  Unfortunately, there isn’t a TSV option directly in Power Query.  I couldn’t find much information on how to tackle this, so I’ll post what I found here.

The first thing I ended up doing was importing the tab-separated file as a CSV.  This added three steps to Power Query.  The first step looked like this:

= Csv.Document(File.Contents("<FILE LOCATION>"),null,{0,42,53},null,1252)

If you look at the documentation for Csv.Document, you’ll see that the third parameter is the delimiter.  Unfortunately, the usual escape sequence "\t" doesn’t work in PQ, so I had to figure out how a tab is written.

The answer is #(tab), which is how Power Query “escapes” a tab character so the value can be sent to the function.  It needs quotes around it, like this:

= Csv.Document(File.Contents("<FILE LOCATION>"),null,"#(tab)",null,1252)

I found that this messed up the next steps, so I ended up creating a blank query and adding this as the source step.  That set up the FirstRowAsHeader step and the ChangedType step.
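
For reference, here is a rough sketch of what the whole query might look like in the Advanced Editor once the three steps are in place.  The column names and types in the ChangedType step ("Name", "Amount") are made up for illustration, and <FILE LOCATION> is still a placeholder, so swap in whatever your file actually has.

```powerquery
let
    // Source step: read the file as tab-delimited text (path is a placeholder)
    Source = Csv.Document(File.Contents("<FILE LOCATION>"), null, "#(tab)", null, 1252),
    // Promote the first row to column headers
    FirstRowAsHeader = Table.PromoteHeaders(Source),
    // Hypothetical columns: replace with the real column names and types
    ChangedType = Table.TransformColumnTypes(
        FirstRowAsHeader,
        {{"Name", type text}, {"Amount", type number}}
    )
in
    ChangedType
```

If the file has no header row, the FirstRowAsHeader step can be dropped and ChangedType pointed at Source instead, using the default column names.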