Automating SSIS Deployment

August 31, 2006

After we had the data warehouse built in our automated build, we needed a way to automatically deploy our SSIS packages and then call them.  You can use a tool called dtutil to manage Integration Services packages on the server.  You can create and remove directories in msdb storage for SSIS packages, sign packages – and lots of other things.

I didn’t find many examples of using dtutil, so I’ll post one here.

Here is the command to deploy a package to the server.

dtutil /DestS [servername] /FILE [filename].dtsx /COPY SQL;[foldername]\[filename] /QUIET

Notice that you need to include the .dtsx extension in the /FILE argument but not in the /COPY argument.  Also, the arguments can appear in any order, so don’t worry about that.  In the /COPY argument, the SQL means that I’m copying the package to msdb storage on a SQL Server.  The /QUIET means that it won’t prompt if the package is already on the server – it will just overwrite it.

We were able to implement this right inside our NAnt script for deploying the SSIS packages.  Next we need to call the package right after deploying it.  This might involve xp_cmdshell or calling a job from a stored procedure.  We’ll see what works best.
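As a rough sketch, the dtutil call can be wrapped in a NAnt target something like this (the target name, property names, and paths are made up for illustration):

```xml
<!-- Hypothetical NAnt target wrapping the dtutil call above -->
<target name="deploy-ssis">
  <exec program="dtutil">
    <arg value="/DestS" />
    <arg value="${ssis.server}" />
    <arg value="/FILE" />
    <arg value="${ssis.package}.dtsx" />
    <arg value="/COPY" />
    <arg value="SQL;${ssis.folder}\${ssis.package}" />
    <arg value="/QUIET" />
  </exec>
</target>
```

The `<exec>` task fails the build if dtutil returns a non-zero exit code, which is exactly what you want in an automated build.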


Automating PowerDesigner

August 30, 2006

One more step of our automated build is solved.  I was able to automate PowerDesigner using Windows Script Host.  It isn’t the most elegant solution for sure, but it will do for now.

Here is a snippet of code:

Set WshShell = CreateObject("WScript.Shell")
WshShell.Run "pdshell11.exe"          ' launch PowerDesigner
WScript.Sleep 2000                    ' give it time to start up
WshShell.SendKeys "^o"                ' Ctrl+O opens the File Open dialog
WshShell.SendKeys "MyModel.pdm"      ' type the model file name
WshShell.SendKeys "{ENTER}"

You can see that with this method, you could automate just about any piece of software – given that you can manipulate it with keystrokes instead of mouse clicks.  I bet that eventually we will find a way to automate PD directly so that we don’t have to use WSH.  Right now – while a build is in progress – you have to be very careful that you don’t touch the machine where the build is running.  If you change the active window while WSH is running, it will perform the .SendKeys on that window instead of PD.

This is something we would like to improve in the future, but for now it works.

Automated Builds for Data Warehouses

August 28, 2006

A colleague of mine named Greg Graham and I have been spending a lot of time thinking about and trying out automated builds for data warehouses.  I attended a lunch and learn on Agile/Scrum methods for software and wondered if this approach would translate to benefiting a warehouse.  Tight development cycles of 2 or 3 weeks mean we have to automate the build of the warehouse so that we can push one button and the warehouse is completely (or partially) rebuilt from the create scripts and then loaded through our SSIS packages.

So far so good – but there are some difficulties that we are coming across.

We chose NAnt as our build manager.  We tried MSBuild but it didn’t integrate well with VSS 6.  NAnt worked right off the bat.  I suppose we might have been able to get MSBuild working, but we gave up because NAnt worked so well so quickly.  We may revisit MSBuild at a later time – especially if we upgrade to VSS2005.

We have also abstracted out the server location and username and password so that we can go into the NAnt script and deploy to a different server without any major changes.  We tested this just the other day and it works great.  Within a matter of seconds, the whole project was deployed on a brand new server with all the tables created and the default insert scripts completed.

First of all, we keep our data warehouse models in PowerDesigner.  This tool does about as good a job as any for modeling, but it isn’t well designed for automating the generation of the table create scripts.  Evidently there is a VBScript interface to it and we’ll be looking into that eventually.  We’re pretty sure we can get this part done, but we are putting it off for now since it will take some learning.

The place where we decided to start was automating the deployment of the create table scripts from PD.  We go in and manually run the Generate Database commands in PD and then check the scripts into VSS.  After that is done, we use NAnt to get a local copy of the scripts and then execute them.  The scripts are broken into 2 parts – the create table scripts from PD and the default inserts for the tables.  These default inserts are any rows in any of these tables that we know at build time.  These could be default members in dimension tables or even rows in dimension tables that we know already because they hardly ever change.  We also turn on IDENTITY_INSERT so that the keys never change.  We use this nifty little stored procedure called sp_generate_inserts to dump out all of the records in the dimension table into INSERT statements that we can later call.
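As a hypothetical example, one of those generated default insert scripts ends up looking something like this (the table and its rows are made up – the real scripts come out of sp_generate_inserts):

```sql
-- Re-insert known dimension rows with their original surrogate keys.
-- Table and values are hypothetical examples.
SET IDENTITY_INSERT dbo.DimStatus ON;

INSERT INTO dbo.DimStatus (StatusKey, StatusName)
VALUES (-1, 'Unknown');   -- default member for unmatched fact rows
INSERT INTO dbo.DimStatus (StatusKey, StatusName)
VALUES (1, 'Open');
INSERT INTO dbo.DimStatus (StatusKey, StatusName)
VALUES (2, 'Closed');

SET IDENTITY_INSERT dbo.DimStatus OFF;
```

Because IDENTITY_INSERT lets us specify the surrogate key values explicitly, every rebuild of the warehouse gets the same keys, so fact rows loaded later always line up.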

The next few tasks include digging into PD VBScript so that we can automate the creation of the create table scripts, and also automating the deployment of SSIS packages and then calling those packages from NAnt.  Calling them won’t be hard – we’ll use sp_start_job.  The problem with this is that sp_start_job returns as soon as the job is started – we won’t know when the package is complete.  We’ll figure out a way around this by having the SSIS package alert the NAnt process that it is complete and can finish up.
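One possible way around the fire-and-forget behavior of sp_start_job is to poll msdb until the job stops executing.  This is just a sketch – the job name is hypothetical, and we haven’t settled on this approach:

```sql
-- Start the SQL Agent job that runs the SSIS package, then poll
-- msdb until it is no longer executing.  Job name is hypothetical.
EXEC msdb.dbo.sp_start_job @job_name = N'LoadWarehouse';

DECLARE @running INT;
SET @running = 1;

WHILE @running = 1
BEGIN
    WAITFOR DELAY '00:00:10';   -- check every 10 seconds

    -- A job is running if it has started but not yet stopped
    SELECT @running = COUNT(*)
    FROM msdb.dbo.sysjobactivity ja
    JOIN msdb.dbo.sysjobs j ON j.job_id = ja.job_id
    WHERE j.name = N'LoadWarehouse'
      AND ja.start_execution_date IS NOT NULL
      AND ja.stop_execution_date IS NULL;
END
```

The downside is that polling tells you the job finished, not whether it succeeded – you would still need to check the job history for the outcome.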

Just to mention our goal process for the automated build, it would be this:

  1. Create create table scripts in PD
  2. Check them into VSS
  3. Get the most recent scripts from VSS
  4. Run the scripts
  5. Get the most recent SSIS package from VSS
  6. Deploy the SSIS Package
  7. Call the SSIS Package

This should enable us to create and populate the warehouse from scratch at the push of a button.  Hopefully we can get there – I bet we can.

Return a NULL DT_STR in a conditional statement

August 2, 2006

I have a requirement to convert all blank strings in a data flow into NULLs.  Here is the expression I tried first:

TRIM( [Some Column] ) == "" ? NULL( DT_STR, [length], 1252 ) : [Some Column]

This threw an error:

Error at Master DataFlow [Derived Column Task]: For operands of the conditional operator, the data type DT_STR is supported only for input columns and cast operations.  The expression "TRIM( [Some Column] ) == "" ? NULL( DT_STR, [length], 1252 ) : [Some Column]" has a DT_STR operand that is not an input column or the result of a cast, and cannot be used with the conditional operation.  To perform this operation, the operand needs to be explicitly cast with a cast operator.

After much messing around, I found that if I cast the NULL function as a DT_STR, the expression compiler was happy:

TRIM( [Some Column] ) == "" ? (DT_STR, 8, 1252) NULL( DT_STR, [length], 1252 ) : [Some Column]

Is this an expected behavior of Integration Services?

I guess I would have expected that, since I specified the type in the NULL function, I wouldn’t have to cast it as a DT_STR afterwards.

Any thoughts?