Another tenet of Agile development is automated testing of code. We can apply this idea to our datamart as well.
What kinds of things need to be tested in a data warehouse? We don’t need to test transactions as this is the responsibility of the ETL system. What we do need to test is the quality of the data in the mart. This includes both measures in the fact table and data in the dimension tables.
There are two different times that we need to test our datamart. We need to test before our ETL load and also after. This allows us to first test the veracity of the structure of the datamart. We can then run the regular or standard ETL process into the fact or dimension table and then re-run the test with the new expected results. These two sets of tests should be run on known and static data. One set scripted out as insert statements for the fact table or dimension table and another set in an ODS or flat file for use in the ETL process run that is tested.
Why would we want to all the trouble of creating runnig all these tests? Having automated tests allows us to refactor and expand the datamart with more confidense. As we move through more and more sprints or development iterations, our test set grows giving us more confidense that new development doesn’t have adverse effects on already deployed functionality.
Some of the obvious tests we might run include sums of measures in the fact table sliced by many different dimensions, row counts in the fact table according to dimension slices and row counts of dimension tables by different dimension attributes.
The most important thing about testing the datamart is that the number of tests that are run during the testing phase continues to grow. As time passes, we should have more tests testing more functionality.
The tool that I have found that comes the closest to being able to perform these types of tests on SQL Server is called TSQLUnit. This tool places about 3 or 4 tables and 5 or 6 stored procedures in your database. You then call one of the stored procedures to run the tests. It loops through the database catalog and calls stored procedures with the prefix of ‘tsu_’. These are stored procedures that are written to perform the tests. The test results are then stored in one of the tables.
While this to me isn’t ideal, it is a really good start to testing a datamart. What would be better, in my opinion, would be a console application that does all the testing out of an XML file that defines queries and answers and runs all the tests and then reports answers.
Let me know what you think about TSQLUnit or comments about tests for a datamart.