Avoiding the Union All

April 9, 2008

LineageIDs are assigned to any column being passed through a dataflow.  This is the id that SSIS uses to track where a column is sourced.  When you add an OLE DB Source to your data flow, every column that enters the data flow there is assigned a LineageID there.  New columns can be created and LineageIDs assigned at other places as well.

For instance, the Union all.  SSIS treats the output columns of a Union All as new columns in a dataflow and assigns new LineageIDs.  This is good because any colunm going through a Union All is technically not any of the input columns – it is a new column of the combined inputs.

A common use of the Union All is to bring rows back together after a Lookup task to lookup dimension keys where rows that didn’t find a lookup match are directed to the error output.  These error output rows are sent to a derived colunm task to have the dimension key set to the ‘NA’ member or whatever the ‘Unknown’ dimension member is.

This is all fine and dandy until you have quite a few Union All tasks in the data flow.  I found that once I had 20 or so in the flow, the performance of BIDS hit the floor.  It seemed as if BIDS couldn’t deal with all of the LineageIDs – or maybe more secifically all of the columns derived in the package.  Now instead of haveing 200 columns, we had 2000 to 3000 unique LineageIDs.

With suggestion of a colleague of mine, Yuri Bykov, we took out all of the Union All statements and instead set the error row output to ignore on the Lookup.  That flows right to a derived column task that check to see if the column that the lookup task added to the data flow is null.  If it is null and the column flowing through that was the column used to look up into the dimension, we know that the lookup failed and we can set the dimension key to ‘Unknown’ or whichever member it needs to be.

This really improved the performance of BIDS and I think even the performance of the package.

Advertisements

Yea for the scroll bar!

February 25, 2008

One of the things that irked me about Integration Services (BIDS) is that the Execute SQL Task didn’t have a scroll bar for the SQL Text.  If you had a long piece of SQL, the only way to scroll was to use your cursor and arrow keys.  How annoying.

This has been fixed in 2008 (Katmai).

Yea!

SQL Task Scroll Bar


SQL 2008 in Q3 ’08

January 25, 2008

Teo Lachev has brought up that a clarification has been made that SQL 2008 will be released in Q3 2008.  The original blog entry can be found here.

I guess this is good news – but I have to say that many of the places I’m working are just getting through the 2000 to 2005 migration.  I know that Microsoft has been hard at work to deliver SQL within the 24-36 month range.  I have to ask myself – is that because they think it is good for their customers or because it is good for their revenue stream?

Can customers keep up with the upgrade cycle? Can technologists keep up with all the new features?

I also think that Microsoft is undermining their certifications program because it is quite an investment to achieve one and before you know it, you have to start all over with the next version.

Have a good weekend.


Consuming the Reporting Services web service inside SSIS

June 6, 2007

I had a request to go into more detail as to an implementation of the ReportService web service inside SSIS.  I decided I’d take a crack at it.

Taking a snapshot of a report after a successful ETL load is a common task.  You could automate this task into the ETL load by consuming the Reporting Services web service and calling the UpdateReportExecutionSnapshot method and invoking Reporting Service to take a snapshot.  This could then be a catalyst for subsequent subscriptions for users who have asked to be notified when a new version of a report is ready.

First of all, we need to create a proxy object for the service by extracting the WSDL document from the service and using it as an input into a tool that Microsoft provides with Visual Studio called wsdl.exe.  The most straight forward execution for wsdl.exe would look something like this:

wsdl.exe /language:VB /out:ReportService.vb http://%5BReportServerName]/ReportServer/ReportService.asmx?WSDL

OR

wsdl.exe /language:CS /out:ReportService.cs http://%5BReportServerName]/ReportServer/ReportService.asmx?WSDL

You can choose your output language according to your liking, VB or C#.  Complete documentation of all the options for invoking wsdl.exe can be found here.

Once the proxy class’ code has been generated, it is time to get that code imported into an SSIS package.  Go open an SSIS package and place a Script Task into the Control Flow area.  Once that is completed, double click on the new Script Task to open it up.  Next go to the Script “tab” over on the left and then click Design Script… on the right side.  We are now in the code editor for the Script Task.

Next we need to add a new code file into the task.  To do this, go to the top level of the tree view in the Project Explorer (mine is on the left side of the Microsoft Visual Studio for Applications window) and right click.  This will bring up a context sensitive menu.  Choose Add, and Add Class….  When the next dialog box comes up, name the file ReportService.  After clicking the Add button, there should be a new item in your Project Explorer tree view called ReportService.

There will be a little bit of code auto generated by SSIS inside the new file.  We’ll go into the file by double clicking on it and delete out the auto generated code.  Next, go to the file that was created by wsdl.exe and open it up in notepad.  We just need to select all the code in it and paste it into the new file inside the SSIS task we just created.  The code editor should now look something like this:

We need to add a couple of references inside the script task as well.  Right click on References in the Project Explorer and click Add Reference….  Then choose System.XML and System.Web.Services and click OK.  Now we are ready to program against our proxy object.  To begin using the new class we created we need to instantiate it into an object.

You can see here that after I instantiate the object, I’m able to view its methods (known as web methods) using IntelliSense.  For every operation that ReportService.asmx exposes, there are two versions.  One is an asynchronous version and a synchronous version.  The asynchronous version allows you to make the call out to begin the web method call and be alerted when the call is finished by using a callback.  This allows the calling process to keep from blocking its main thread during the call.  These methods work in a pair and have Begin[MethodName] and an End[MethodName] procedure.  The synchronous method blocks the calling thread during execution and is just named [MethodName].  We’ll use the synchronous method here.  You can find more information on using asynchronous web method calls here.

The method we will concentrate on here is the UpdateReportExecutionSnapshot method.  Reports in Reporting Services can be rendered from a snapshot that is taken at an earlier time so that no database query is needed and report response is increased.  One way to take a new snapshot is to declare a schedule and have the snapshot taken at a particular time.  This is sometimes not practical since ETL completion time can vary from day to day.  We can change the properties of a report to render from snapshot and then create a new snapshot after ETL processes are complete.

Before the UpdateReportExecutionSnapshot method will work, security must be set up properly.  You can either change the ReportServer directory to allow anonymous access (not recommended) or apply the credentials of the caller to the web service call.  This is done by using the System.Net.CredentialCache namespace and assigning the Credentials property of our proxy object to be equal to System.Net.CredentialCache.DefaultCredentials.

To make the complete call and create a new snapshot, here is the code:

Dim rs As New ReportingService()
rs.Credentials = System.Net.CredentialCache.DefaultCredentials
rs.UpdateReportExecutionSnapshot(“[ReportName]”)

ReportName is a string that describes the full path to the report.  If a report called “Test Report” were in the root or home directory, you would just need a forward slash then the report name – “/Test Report”.  If the same report were in a directory called “foo” then the ReportName string would look like this: “/foo/”Test Report”.

Hopefully this example will help everyone use this method for interacting with Reporting Services at ETL time.

Good Luck!


BI Studio Bug?

February 22, 2007

I have just come across a bug in BI Studio that has shook me to the core.  It involves the copy and paste of packages within a project.  Let me describe how to reproduce:

Here is the setup:

First, open a brand new IS project.  Then go into the package called “Package.dtsx” that is created by default.  Add a dataflow task.  Inside of the dataflow task, create a flat file source and change it to be fixed length.  Then go and add a column – any column…you can just click the add button and then OK.  We just need to have some column definition in the metadata.  Now go and get a row count task and place it on the data flow area and connect data flow from the flat file source to the row count task.  You can add a variable to the package and assign it to the row count if you like.  To reproduce this bug you don’t even need to do that.

Now we just need to save the package.

Next go into the solution explorer and select the package you just edited and copy it to the clipboard and then paste it back into the project thereby making a copy of the package.  The new one is probably called “Package 1.dtsx.”

OK – here comes the bug.  Go into Package.dtsx and add a data viewer in between the flat file source and the row count.  Now save the file.  Now go look at what happened to your copy.

IT EDITED THEM BOTH!!!!!

This even happens if the copy of the package is closed – not even open in BI Studio.

I do a lot of copy/paste of packages to reduce the amount of repetitive work that I have to do.  I have to admit that this shakes my confidence to the core about this copy/paste functionality.  What else have I been editing in my other packages that were created from copy/paste?  Oh man.

So, does anyone know if this has been identified by Microsoft as a known issue?


Merge Join Task Sort Order Weirdness Workaround

December 4, 2006

In an earlier post (Merge Join Task Complete Cop Out) I lamented about how if you changed the sort order in the data flow earlier in the data flow that a merge join task completely flipped out and would even lock down to where you could not edit it.

I stand by my statement that the dialog box that is presented is a complete cop-out – but now I have found a way to keep it from happening.

I figured out that what it was complaining about whs the output of the Merge Join Task.  When you change the sort order and double clicked on the task, it would ask you if it should change to use the new sort order.  All well and good.  You click ok and get out and it still had a red X on it.  You double click and it would then tell you it couldn’t be edited any more because of the limitations of the dialog box.  Well – when you change the sort order, the automatic process that fixes the join key and such doesn’t fix the sort order of the Merge Join Task Output.

If you will delete the output to the Merge Join before you start the process, you are presented with the same set of dialogs, except you can then join the output of the Merge Join to the next task thereby resetting its metadata and all is cool.

Mark


Merge Join Task Complete Cop-Out

November 13, 2006

I was working on a package just now that included a merge join task that ended up having the incorrect join defined.  The source for this join is a set of flat files defined as Flat File Source tasks.  To change the definition of the sort order of the flat files, you go into the advanced editor and set the sort key position to the new sort order.  I did this.

Afterword, I went into the merge join task it asked me if I would like to re-define the join keys according to the new sort order that I defined in the Flat File Sources.  I chose OK.  I looked at the new join information it it looked ok.  I clicked OK to save the merge join task.

After saving, I still had a red X on the merge join task.  To investigate, I clicked on the task to edit and it says that there isn’t a monotonically increasing set of sort order keys defined.  (Something I can fix)  But the next sentence absolutely in-excusable.  It says:

Due to limitations of the Advanced Editor dialog box, this component cannot be edited using this dialog box.

ARE YOU SERIOUS?  How am I supposed to fix the problem?  Are you guys good enough to find a way to open the dialog even with this problem so that I can take a look at it and fix it?

I know this is version one of this product, but how can you possibly leave this task in this state?  Obviously no one at on the development team used this tool enough to come across this problem.  If they had, they would have fixed it immedately.

WHAT A COMPLETE COP-OUT!

I wonder if it will be fixed in SP2?

Mark