A colleague of mine named Greg Graham and I have been spending a lot of time thinking about and trying out automated builds for data warehouses. I attended a lunch and learn on Agile/Scrum methods for software and wondered if this approach would translate to benefiting a warehouse. Tight development cycles of 2 or 3 weeks means we have to automate the build of the warehouse so that we can push one button and the warehouse is completely (or partially) rebuild from the create scripts and then loaded through our SSIS packages.
So far so good – but there are some difficulties that we are coming across.
We chose NAnt as our build manager. We tried MSBuild but it didn’t integrate well with VSS 6. NAnt worked right off the bat. I suppose we might have been able to get MSBuild working, but we gave up because NAnt worked so well so quickly. We may revisit MSBuild at a later time – especially if we upgrade to VSS2005.
We have also abstracted out the server location and username and password so that we can go into the NAnt script and deploy to a different server without any major changes. We tested this just the other day and it works great. Within a matter of seconds, the whole project was deployed on a brand new server with all the tables created and the default insert scripts completed.
First of all, we keep our data warehouse models in PowerDesigner. This tool seems to just about as good of a job as any for modeling, but it isn’t well designed for automating the process of creating the table create scripts. Evidently there is a VBScript interface to it and we’ll be looking into that eventually. We’re pretty sure we can get this part done, but we are putting it off for now since it will take some learning.
The place where we decided to start was the automation of the deployment of the create table scripts from PD. We go in and manually run the Generate Database commands in PD and then check in the scripts into VSS. After that is done, we are using NAnt to get a local copy of the scripts and then execute them. The scripts are broken into 2 parts – the create table scripts from PD and the default inserts for the tables. These default inserts are any rows in any of these tables that we know a build time. These could be default members in dimension tables or even rows in dimension tables that we know already because they do not change much ever. We also turn on IDENTITY INSERT so that the keys never change. We use this nifty little stored procedure called sp_generate_inserts to dump out all of the records in the dimension table into INSERT statements that we can later call.
The next few tasks include digging into PD VBScript so that we can automate the creation of the create table scripts and also automating the deployment of SSIS packages and then calling those packages from NAnt. Calling them won’t be hard – we’ll use sp_start_job. The problem with this is that it completes right after the job is called – we won’t know when the package is complete. We’ll figure out a way around this by having the SSIS package alert the NAnt process that it is complete and can finish up.
Just to mention our goal process for the automated build, it would be this:
- Create create table scripts in PD
- Check them into VSS
- Get the most recent scripts from VSS
- Run the scripts
- Get the most recent SSIS package from VSS
- Deploy the SSIS Package
- Call the SSIS Package
This should enable us to create and populate the warehouse from scratch at a push of a button. Hopefully we can get there – I bet we can.