LineageIDs are assigned to any column being passed through a dataflow. This is the id that SSIS uses to track where a column is sourced. When you add an OLE DB Source to your data flow, every column that enters the data flow there is assigned a LineageID there. New columns can be created and LineageIDs assigned at other places as well.
For instance, the Union all. SSIS treats the output columns of a Union All as new columns in a dataflow and assigns new LineageIDs. This is good because any colunm going through a Union All is technically not any of the input columns – it is a new column of the combined inputs.
A common use of the Union All is to bring rows back together after a Lookup task to lookup dimension keys where rows that didn’t find a lookup match are directed to the error output. These error output rows are sent to a derived colunm task to have the dimension key set to the ‘NA’ member or whatever the ‘Unknown’ dimension member is.
This is all fine and dandy until you have quite a few Union All tasks in the data flow. I found that once I had 20 or so in the flow, the performance of BIDS hit the floor. It seemed as if BIDS couldn’t deal with all of the LineageIDs – or maybe more secifically all of the columns derived in the package. Now instead of haveing 200 columns, we had 2000 to 3000 unique LineageIDs.
With suggestion of a colleague of mine, Yuri Bykov, we took out all of the Union All statements and instead set the error row output to ignore on the Lookup. That flows right to a derived column task that check to see if the column that the lookup task added to the data flow is null. If it is null and the column flowing through that was the column used to look up into the dimension, we know that the lookup failed and we can set the dimension key to ‘Unknown’ or whichever member it needs to be.
This really improved the performance of BIDS and I think even the performance of the package.