RDBMS Doomed?

February 13, 2009

Here is a great article from Read Write Web that discusses the place and uses for a Key/Value database such as SSDS or SimpleDB or Google App Engine.  Jason Massie argues that the DBA profession will level off in 5-7 years and then start to fall off in 10 because of the prevelence of cloud based Key/Value based databases.

During the last couple of releases of SQL Server, Microsoft has been focused on BI.  Why?  Everyone has lots of data.  Everyone knows that there is a lot of value in that data, but they need to get at it.  BI is one of the top agenda items for CIOs according to Gartner.

What does this have to do with SSDS and SimpleDB?

SSDS and SimpleDB have basically no reporting or analytics capabilities.  Microsoft has mentioned that the future for SSDS includes OLAP like capabilities.  Even if this was available today and shipped today, would companies be ready to move their entire data ecosystem to Microsoft, Amazon, or Google, re-write everything out of RDBMS and into Key/Value programming, then re-work ETL and data warhouse infrasturcture to work on a non-existant (so far, Microsoft says) architecture?

They sure aren’t going to move all of the OLTP systems they have to the cloud right now because the data is then trapped in a no-reporting storage mechanism.  Google App Engine only allows 1000 result objects per query.  Amazon only allows 5 second duration queries.  Can companies’ appetite for analytics be satiated by that?  I don’t think so.

This seems a very log way off.  Even if one of these Key/Value databases HAD reporting and analytics, it seems a minimum of 5 to 7 year for any sizable amount of migration to take place.  And this would be if there WERE analytics and reporting.  There aren’t.

Many companies have large amounts of data and money in their VSAM/COBOL systems that are still running fine that everyone said would have to have been replaced a decade ago.  Those systems aren’t going anywhere for quite a while and these systems need analytics.  These analytics will need to be done on-sight on RDBMS/OLAP platforms.

I suppose for a San Francisco-based startup-focused DBA, there might be a noticible movement to the cloud, but for the rest of the world, it is going to take longer than 5 years to notice a plateau in the need for RDBMS DBAs.

I will however conceede that at some point, there will be a draw to the cloud.  But there are many hurdles to overcome before that can happen in earnest.

One change might be a cloud hosted RDBMS.  Certainly there are some out there somewhere, but the hot topic right now is Key/Value and how that enables large distributed systems.  For a company looking to save money, a hosted RDBMS makes sense, but probably not SSDS and Key/Value.

Advertisements

Float Data Types

January 29, 2009

This is a really big distinction but is only mentioned in passing in BOL (check here), but approximate numerics are just that:  approximate.  Someone who understands the storage engine better can explain why a foat or real can’t store an exact number, but they can’t and don’t.  You might put 1.5 into it and later come to find that the value is 1.4999.

If you need an exact representation of a number that isn’t an integer, you need to use numeric or decimal.  These are functionally the same, so it doesn’t matter which you choose.


SELECT 1 VS Select *

February 7, 2008

Connor Cunningham has a great explanation of the difference between using SELECT 1 versus SELECT * in an EXISTS clause.  I’d like to add that I’m assuming that the same rule would apply to SELECT count(*) and SELECT count(1).

SQL Query Optimizer would expand * into the column list and then figure out that none of them are needed.  So theoretically, SELECT count(1) would be a little faster in putting the plan together, but would run at the same speed and have the same IOs as SELECT count(*).


Top Few per Group – Using T-SQL

February 2, 2008

Ever needed to get the most recent 2 transactions for every customer in a list using a T-SQL statement?  This sounds like an easy task, but upon further examination the query is a little harder than it seems.  One solution, a very ugly solution I might add, would be to cursor through all the customers and run a select statement with an order by clause to get the most recent transactions.  I bet there are other possible solutions, but they all seem to take multiple passes.  A lot of people hate continued posts, but this one is kind of long so I’m going to do just that.  Sorry for the click-through.  I’m not making money on ad pageviews!  🙂

Read the rest of this entry »


Katmai (SQL 2008) IntelliSense

February 2, 2008

I wanted to put up a sample of what IntelliSense looks like it Katmai:

IntelliSense

(Click on the image for full size)

This is a feature that we have all been hearing about and has finally arrived with Katmai.

I can see how this is going to be very helpful for developer productivity since developers will be prompted with the names of databases, tables, and columns without having to use the Object Explorer.

By the way – I’m using Windows Live Writer to compose this post.  I used it in the first version and this new version is very nice.  I think I’ll be using it from now on.  It facilitates creating more complex and better looking posts.  Very cool.


Carriage Return in T-SQL varchar

January 23, 2008

This turned out to be a dumb question – or rather it had a simple answer.  I wanted to insert a carriage return and line feed into a sql statement I was generating in T-SQL so that it was more readable.  My first question was which comes first, the chicken or the egg?  Just kidding – actually the question was, which comes first char(13) or char(10)?  Turns out, seems like char(13) comes first.  I’ve always been inside a programming language so you could use things like Environment.NewLine – so I hadn’t ever memorized it.  First question answered.

The second question was about actually getting the carriage return to show up in the text.  Turns out that you can’t put (or I couldn’t figure out a way) a carriage return into a statment that is SELECTed.  Rather you need to use the PRINT statment and then you’ll be good to go.

I never found anyone that really stated that you can’t do it in a SELECT and to do it in a PRINT.  So – I thought I’d just get it written down here – if not for my reference later!

Have a good day!

Mark


Most Recent in a List

December 6, 2007

Anyone ever been asked to pull a report of the most recent transaction for every customer?  Sounds pretty easy at first, but it turns out to be tricky.

My scenario wasn’t exactly that, but it was the same problem.

You can see right away that we need to perform some sort of a GROUP BY on CustomerID or whatever the business key is for customer.  What I came up with was a two part solution.  I was really under the gun so I didn’t have a chance to look for a more elegant solution – so you know how that goes.  Here is what I did:

I split the query up into two parts.  The first part being a select statement that selected the CustomerID and the MAX of the TransactionDate while having a GROUP BY clause on CustomerID.  Then I INNER JOINed to this statement with a statement like this:

SELECT * FROM Transaction t
INNER JOIN
(
SELECT CustomerID, MAX(TransactionDate) MaxTransactionDate
FROM Transaction
GROUP BY CustomerID
) ss
on t.CustomerID = ss.CustomerID
and t.TransactionDate = MaxTransactionDate

I suppose that this can call back two transactions for one customer if they have the same TransactionDate.  In my case, this was ok.

Does anyone see another more elegant way to do this with one statement instead of two?  Could you use a CTE?