Ordering Data Inside ETL

April 27, 2010

Ever wondered if it mattered what order your data was in inside an ETL load?

I hadn’t really considered it until this morning.  I had already considered and used the technique of dropping indexes during ETL and then creating them afterwards.  But this was new.

I had a load that took 50 seconds.  The target table had a clustered index as well as a unique index used to enforce the primary key.  The ETL load came in in no particular order.

The first thing I tried was to order the ETL in ascending order – the same order as the clustered index – to see if it mattered.  I suspected it would, and it did: the time went from 50 s to 33 s.  This was good – about a third of the time gone just by ordering the data.  I wondered whether ordering the data backwards from the clustered index would help or hurt, and to my surprise it helped, lowering the time further to 28 s.
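
To make the idea concrete, here is a rough sketch of what I mean (the table, column, and index names are made up for illustration, not my actual schema):

    -- Hypothetical fact table whose clustered index is on DateKey.
    -- Ordering the load to match (or reverse) that index is the whole trick.
    INSERT INTO dbo.FactSales (SalesKey, DateKey, Amount)
    SELECT SalesKey, DateKey, Amount
    FROM staging.FactSales
    ORDER BY DateKey;          -- ascending: same order as the clustered index
    -- ORDER BY DateKey DESC;  -- descending: reverse of the clustered index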

Now I wondered how much lower I could go by dropping the indexes before the ETL and then rebuilding them afterwards.

I re-ran the non-ordered ETL using the drop-and-rebuild method and got 28 s.  So if all I had done was drop and rebuild the indexes, I would have gone from 50 s to 28 s.  Not bad.  But when I ordered the load descending, it dropped even further to 14 s.  The last test, of course, was to order it back to ascending and see which ordering worked best for the drop-and-rebuild method.  It turned out that ascending worked better and dropped the time down to 13 s.
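
Again as a rough sketch, with the same made-up names, the drop-and-rebuild version looks something like this:

    -- 1. Drop the indexes before the load.
    ALTER TABLE dbo.FactSales DROP CONSTRAINT PK_FactSales;  -- unique index enforcing the primary key
    DROP INDEX CIX_FactSales_DateKey ON dbo.FactSales;       -- clustered index

    -- 2. Load the data, ordered to match the clustered index you will rebuild.
    INSERT INTO dbo.FactSales (SalesKey, DateKey, Amount)
    SELECT SalesKey, DateKey, Amount
    FROM staging.FactSales
    ORDER BY DateKey;

    -- 3. Rebuild the indexes after the load.
    CREATE CLUSTERED INDEX CIX_FactSales_DateKey ON dbo.FactSales (DateKey);
    ALTER TABLE dbo.FactSales ADD CONSTRAINT PK_FactSales PRIMARY KEY NONCLUSTERED (SalesKey);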

This was a surprise to me.  With the indexes left in place, the opposite order to the clustered index was better; with drop and rebuild, the same order was better.  I think the moral of the story isn’t specifically whether ascending or descending is better – I suppose it could go differently in other cases.  Mostly, just remember that the order can really matter, and when you are trying to squeeze every ounce of performance out of ETL, check the order of your data.

Anyone on the SQL CAT team want to comment on why backwards works better when indexes are left intact?


SharePoint 2010 Prerequisite Install

April 26, 2010

I was trying to install SharePoint 2010 (final version) today on Windows Server 2008 (no luck with R2) and I had to try to install the prerequisites twice before everything would install. If you have trouble, keep trying. 🙂


Report Data Window Disappears

February 26, 2010

Ever lost your Report Data window that shows Datasets in Reporting Services 2008?

I just did, and there wasn’t a menu item on the View menu in VS 2008 to bring it up.  There is a KB article that fixes a bug with this, but I wanted to add that there is a keyboard shortcut you may not know.

Ctrl-Alt-D brings it up.

Keyboard shortcuts are always good.


Deploying to a Brand New Reporting Services 2008 Install

October 30, 2009

Having problems deploying to a brand new Reporting Services 2008 install?

There are two kinds of permissions in Reporting Services – server level and item level.  To get to the server-level permissions in Report Manager, go to Home, click Site Settings in the upper right-hand corner, and then click the Security menu item on the left.  If you click New Role Assignment, notice there are only two roles here.  You can make yourself an administrator here – although local Administrators is added automatically, and that may be enough.  FYI – it may be a good idea to take local Administrators out anyway so that hardware administrators don’t inadvertently get administrative rights to the Reporting Services server.

But before you can deploy a Reporting Services project to the server, you have to do one more thing: you need item-level permissions as well.  To add yourself, go to Home and then click Properties – the blue tab next to Contents.  If you click New Role Assignment, you’ll see the item-level roles.  You’ll need to be at least Publisher, but you’ll probably just want to be Content Manager.  Publisher can’t modify the folder structure; Content Manager can.

After you add yourself, you should be good to go.

Update: Here is a link to more info on adding item-level permissions: http://msdn.microsoft.com/en-us/library/aa337471(lightweight).aspx


Microsoft’s Gartner Position

October 22, 2009

Here is an aggregation of all the market research about Microsoft products.  Looking through them, I found Microsoft in the Leaders Quadrant in nearly every report.  Impressive.


Unknown Member in Analysis Services

October 22, 2009

Analysis Services has a built-in member for each dimension called (by default) “Unknown.”  It is there to simplify dealing with facts whose value for a particular dimension is unknown.  If a fact row arrives with a null surrogate key for a dimension (for example, after failing a lookup in the SSIS package), Analysis Services assigns it to this special unknown member and moves on.

There are three settings involved.  First, the dimension itself has a property called UnknownMember that describes how the unknown member is used; it can be set to Visible, Hidden, or None.  Second, the dimension attribute whose Usage is set to Key has a property called NullProcessing that is set to UnknownMember; this tells the dimension what to do when it comes across a null surrogate key.  Third, on the Dimension Usage screen of the cube, each dimension in use has a NullProcessing setting under Advanced; this also describes the behavior when a null dimension surrogate key is processed.  Here is a link to a description of all the options – it is part of the Analysis Services Scripting reference, but it was the only place I could find these options described: http://msdn.microsoft.com/en-us/library/ms127041.aspx

I think this unknown member is a very convenient inclusion by the Analysis Services team, but I’ll pass on using it.  There is some syntactic sugar in MDX that lets you reference a member called UNKNOWNMEMBER, which seems nice, but what this scenario does not give you is an unknown member in the relational store.  If you never plan on querying the relational store, then the only place you will need an unknown member is Analysis Services: you can pass unknown members to the fact table as null and let AS process them accordingly.

I like to leave the relational store in as queryable a state as possible.  Report writers might later have a reason to use it, and nulls in the fact table’s surrogate keys will cause problems: report writers will have to use a LEFT JOIN and then derive an unknown member at query time.
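
For example, with nulls in the fact table, a report query ends up looking something like this (table and column names are made up):

    -- With NULL surrogate keys in the fact table, report queries have to
    -- LEFT JOIN and derive an "Unknown" member themselves at query time.
    SELECT ISNULL(p.ProductName, 'Unknown') AS ProductName,
           SUM(f.Amount)                    AS Amount
    FROM dbo.FactSales AS f
    LEFT JOIN dbo.DimProduct AS p
           ON p.ProductKey = f.ProductKey
    GROUP BY ISNULL(p.ProductName, 'Unknown');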

I think in this situation, creating an unknown member in the dimension with a surrogate key of 1, 0, or -1 (a special number of your choosing) is a good solution.
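
Here is a minimal sketch of what I mean, again with made-up names (and assuming the surrogate key is an identity column):

    -- Seed the dimension with an explicit Unknown row, then have the ETL
    -- default failed lookups to that key instead of NULL.
    SET IDENTITY_INSERT dbo.DimProduct ON;
    INSERT INTO dbo.DimProduct (ProductKey, ProductName)
    VALUES (-1, 'Unknown');
    SET IDENTITY_INSERT dbo.DimProduct OFF;

    -- Failed lookups land on -1, so a plain INNER JOIN still returns every fact row.
    SELECT p.ProductName, SUM(f.Amount) AS Amount
    FROM dbo.FactSales AS f
    JOIN dbo.DimProduct AS p
      ON p.ProductKey = f.ProductKey
    GROUP BY p.ProductName;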

You’ll have to turn off the unknown member in the dimension, change NullProcessing on the key attribute of the dimension, and change NullProcessing in the Dimension Usage settings of the cube to make this approach work.  But I think you’ll find it is a good compromise when the relational store needs to be as queryable as possible.


PerformancePoint Monitoring Authentication

October 16, 2009

While deploying Monitoring Server, I was having trouble viewing dashboards from SharePoint, even though preview was working fine.  I referenced Nick and Adrian’s book, and it suggested using the same identity for the Monitoring Server application pool as for the SharePoint application pool.  I’m working in an environment where the SharePoint service accounts are already deployed and Monitoring Server is coming in later, so reusing the account names already in use for the SharePoint application pool wouldn’t make sense.

On page 241 of Nick and Adrian’s book, there is an awesome diagram of the data/security flow for rendering a dashboard.  (Buy the book – it’s great!)

Just don’t forget that a preview of a dashboard uses the application pool identity of the Monitoring Server, but when you render in SharePoint, it uses the credentials of the SharePoint application pool.  If they are two different accounts, you’ll need to add them both with read permission to your data sources.

If you care to look and you are using SQL Server or Analysis Services as your data source, fire up Profiler and watch.  You’ll see two different accounts.