Avoiding the Annoying in The File Connection Manager

May 19, 2008

Let’s say you want to delete a file as a part of your SSIS package and let’s say that the file is created during the executions of the package.  Simple – you use the File System Task.  You just add the task to the package and create a file connection manager to point to the file…only the file doesn’t exist yet.  The file connection manager editor won’t let you leave the file name pointing to a file that doesn’t exist.

Simple fix – All you have to do is use the properties window in SSIS instead of the dialog that comes up when you double click the connection manager.  Doesn’t look like DelayValidation is required here.


Duplicate ErrorCode Columns

May 2, 2008

A colleague of mine was selecting data from a table that contained a column called ErrorCode.  The OLE DB Source in SSIS was giving an error that said:

The package contains two objects with the duplicate name of “output column “ErrorCode” (13)” and “output column “ErrorCode” (279)”

She was having trouble figuring out why it was complaining since the table she was pulling from had only one column named “ErrorCode”.

If you remember, data transformations can add columns to the data flow that describe error conditions.  One of those columns is “ErrorCode”.  This was the problem.  SSIS wanted to reserve ErrorCode for the column that it adds and that conflicted with the column that was in the table.  I wasn’t able to make this happen when I added a column called “ErrorCode” to a select statement – only when it was in the table from which the statment is selected.

We went to the source table and changed the name from ErrorCode to something else.  This fixed the problem.  The other way to fix it is to use the AS clause in SQL to change the name in the source query.  Changing the output name in the Columns tab in the OLE DB Source didn’t fix the problem so you’ll have to go a little deeper to resolve the conflict.

Mark


Avoiding the Union All

April 9, 2008

LineageIDs are assigned to any column being passed through a dataflow.  This is the id that SSIS uses to track where a column is sourced.  When you add an OLE DB Source to your data flow, every column that enters the data flow there is assigned a LineageID there.  New columns can be created and LineageIDs assigned at other places as well.

For instance, the Union all.  SSIS treats the output columns of a Union All as new columns in a dataflow and assigns new LineageIDs.  This is good because any colunm going through a Union All is technically not any of the input columns – it is a new column of the combined inputs.

A common use of the Union All is to bring rows back together after a Lookup task to lookup dimension keys where rows that didn’t find a lookup match are directed to the error output.  These error output rows are sent to a derived colunm task to have the dimension key set to the ‘NA’ member or whatever the ‘Unknown’ dimension member is.

This is all fine and dandy until you have quite a few Union All tasks in the data flow.  I found that once I had 20 or so in the flow, the performance of BIDS hit the floor.  It seemed as if BIDS couldn’t deal with all of the LineageIDs – or maybe more secifically all of the columns derived in the package.  Now instead of haveing 200 columns, we had 2000 to 3000 unique LineageIDs.

With suggestion of a colleague of mine, Yuri Bykov, we took out all of the Union All statements and instead set the error row output to ignore on the Lookup.  That flows right to a derived column task that check to see if the column that the lookup task added to the data flow is null.  If it is null and the column flowing through that was the column used to look up into the dimension, we know that the lookup failed and we can set the dimension key to ‘Unknown’ or whichever member it needs to be.

This really improved the performance of BIDS and I think even the performance of the package.


OLE DB Source Metadata Refresh

April 4, 2008

There is a particular behavior on the OLE DB Source that I’d like to highlight.  I have had the need to change the data type of a column in a source query recently and found that the OLE DB Source doesn’t flush and refresh its metadata.

Let’s say I have a varchar 5 field and it really is suposed to be a dollar amount.  I could go in to the package and add a Data Conversion task – which is a good option, but also I could go in to my source query and change it there by performing a cast([my_varchar_column] as money).  What I have found is that you have to comment out the column and click OK to get out of the OLE DB Source dialog, then go back and un-comment the column so that it is back in the metadata collection with the new data type.

I know that performing cast operations in a souce query may not be a best practice, but I had a table with a few hundred columns and a whole bunch of them needed to have their data type changed.  I could either write a source query (using the catalog) that had all the right data types built in, or build a set of data conversion tasks (like maybe 50) that do the same thing.  In the interest of time, I did the former.


Yea for the scroll bar!

February 25, 2008

One of the things that irked me about Integration Services (BIDS) is that the Execute SQL Task didn’t have a scroll bar for the SQL Text.  If you had a long piece of SQL, the only way to scroll was to use your cursor and arrow keys.  How annoying.

This has been fixed in 2008 (Katmai).

Yea!

SQL Task Scroll Bar