Ever needed to get the most recent 2 transactions for every customer in a list using a T-SQL statement? This sounds like an easy task, but upon further examination the query is a little harder than it seems. One solution, a very ugly solution I might add, would be to cursor through all the customers and run a select statement with an order by clause to get the most recent transactions. I bet there are other possible solutions, but they all seem to take multiple passes. A lot of people hate continued posts, but this one is kind of long so I’m going to do just that. Sorry for the click-through. I’m not making money on ad pageviews!
Maybe everybody else has already figured this out, but I finally came up with what I think is a very elegant solution. If you use the RANK() function and ROW_NUMBER() function, it gets pretty easy. First, let’s start with an example that is in BOL. This query retrieves sales numbers for customers by zip code and demonstrates all the different kind of ranking functions SQL Server has. I have taken out a couple of columns for clarity. I didn’t need them for this demonstration. If you’d like to see the original, please click here.
SELECT c.FirstName, c.LastName
,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS RowNumber
,RANK() OVER (ORDER BY a.PostalCode) AS Rank
,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s
INNER JOIN Person.Contact c
ON s.SalesPersonID = c.ContactID
INNER JOIN Person.Address a
ON a.AddressID = c.ContactID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0
Here are the results:
| FirstName | LastName | RowNumber | Rank | SalesYTD | PostalCode |
| Maciej | Dusza | 1 | 1 | $4,557,045.05 | 98027 |
| Shelley | Dyck | 2 | 1 | $5,200,475.23 | 98027 |
| Linda | Ecoffey | 3 | 1 | $3,857,163.63 | 98027 |
| Carla | Eldridge | 4 | 1 | $1,764,938.99 | 98027 |
| Carol | Elliott | 5 | 1 | $2,811,012.72 | 98027 |
| Jauna | Elson | 6 | 6 | $3,018,725.49 | 98055 |
| Michael | Emanuel | 7 | 6 | $3,189,356.25 | 98055 |
| Terry | Eminhizer | 8 | 6 | $3,587,378.43 | 98055 |
| Gail | Erickson | 9 | 6 | $5,015,682.38 | 98055 |
| Mark | Erickson | 10 | 6 | $3,827,950.24 | 98055 |
| Martha | Espinoza | 11 | 6 | $1,931,620.18 | 98055 |
| Janeth | Esteves | 12 | 6 | $2,241,204.04 | 98055 |
| Twanna | Evans | 13 | 6 | $1,758,385.93 | 98055 |
I have modified the query so that I can get the customers with the two highest sales number for a zip code:
SELECT * FROM
(
SELECT c.FirstName, c.LastName
,ROW_NUMBER() OVER (ORDER BY a.PostalCode, s.SalesYTD desc) AS RowNumber
,RANK() OVER (ORDER BY a.PostalCode) AS Rank
,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s
INNER JOIN Person.Contact c
ON s.SalesPersonID = c.ContactID
INNER JOIN Person.Address a
ON a.AddressID = c.ContactID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0
) ss
You’ll see here that I’ve added a couple of things. First I added another sort item to the RowNumber column. This changes the RowNumber column so that the order is now not just by PostalCode, it is also by SalesYTD descending. The seond thing I added was an surrounding SELECT statement so that I could reference the RowNumber and Rank columns in the WHERE clause. I’m not sure why, but you can’t reference ROW_NUMBER() or RANK() in the where clause. This fixes that. I have also used a Common Table Expression to accomplish the same thing. Use whatever you like. Here are the results:
| FirstName | LastName | RowNumber | Rank | SalesYTD | PostalCode |
| Shelley | Dyck | 1 | 1 | $5,200,475.23 | 98027 |
| Maciej | Dusza | 2 | 1 | $4,557,045.05 | 98027 |
| Linda | Ecoffey | 3 | 1 | $3,857,163.63 | 98027 |
| Carol | Elliott | 4 | 1 | $2,811,012.72 | 98027 |
| Carla | Eldridge | 5 | 1 | $1,764,938.99 | 98027 |
| Gail | Erickson | 6 | 6 | $5,015,682.38 | 98055 |
| Mark | Erickson | 7 | 6 | $3,827,950.24 | 98055 |
| Terry | Eminhizer | 8 | 6 | $3,587,378.43 | 98055 |
| Michael | Emanuel | 9 | 6 | $3,189,356.25 | 98055 |
| Jauna | Elson | 10 | 6 | $3,018,725.49 | 98055 |
| Janeth | Esteves | 11 | 6 | $2,241,204.04 | 98055 |
| Martha | Espinoza | 12 | 6 | $1,931,620.18 | 98055 |
| Twanna | Evans | 13 | 6 | $1,758,385.93 | 98055 |
You’ll see now that we have instructed the ROW_NUMBER() function now to order by PostalCode as well as SalesYTD, the solution takes shape. We see that if we restrict the WHERE clause to only include rows where RowNumber is equal to or only one greater than Rank, we have our solution. Here is the final SQL statement:
SELECT * FROM
(
SELECT c.FirstName, c.LastName
,ROW_NUMBER() OVER (ORDER BY a.PostalCode, s.SalesYTD desc) AS RowNumber
,RANK() OVER (ORDER BY a.PostalCode) AS Rank
,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s
INNER JOIN Person.Contact c
ON s.SalesPersonID = c.ContactID
INNER JOIN Person.Address a
ON a.AddressID = c.ContactID
WHERE TerritoryID IS NOT NULL
AND SalesYTD <> 0
) ss
WHERE ss.RowNumber = ss.Rank or ss.RowNumber = ss.Rank + 1
And here are the final results:
| FirstName | LastName | RowNumber | Rank | SalesYTD | PostalCode |
| Shelley | Dyck | 1 | 1 | $5,200,475.23 | 98027 |
| Maciej | Dusza | 2 | 1 | $4,557,045.05 | 98027 |
| Gail | Erickson | 6 | 6 | $5,015,682.38 | 98055 |
| Mark | Erickson | 7 | 6 | $3,827,950.24 | 98055 |
These two functions put together allow us to make this kind of request a very easy task. It is also a batch process that SQL Server can handle with efficiency.


February 13, 2008 at 3:03 pm |
Hi Mark -
Thanks for posting on this subject. I agree that it is a tricky problem and like you, I prefer more elegant single-statement approach. I think the solution you suggest is a great example of how to use new syntax found in SS2005, though it might be a little more power then needed. When faced with this exact issue a while back, and I actually found the answer in an MS Access KB article! I basically followed the article’s instructions, lifted the SQL generated, and tidied up a bit. Here it is for the Northwind database, but you can apply it to any situation once you see the form. The subquery is what limits the list to N choices per group, and its where clause keeps it lined up with the categories in the main query. The main query controls the categories and products you want to select. In this example, the IN clause returns a list of quantities as though they were IDs, but it could just as easily return ProductIDs and you could set any criteria and sort you choose in the subquery. I’m pasting the code snippet I keep with me all the time because it is such a twisty problem, no one should have to solve it from scratch every time! Thanks for your post!
–Query to select Top N Values Per Group
–from: http://support.microsoft.com/kb/q153747/
use northwind
go
–picks the top 3 products by quantity for each category
SELECT c.CategoryName, p.ProductName, p.UnitsInStock
FROM Categories c
INNER JOIN Products p ON c.CategoryID = p.CategoryID
WHERE p.UnitsInStock IN (
SELECT TOP 3 [UnitsInStock]
FROM Products
WHERE [CategoryID] = c.[CategoryID]
ORDER BY [UnitsInStock] DESC
)
ORDER BY c.CategoryName, p.UnitsInStock DESC
February 13, 2008 at 3:12 pm |
You absolutely can do it that way. Don’t forget that for each category you have to perform a sub-query to join to to fulfill the request. Each one of those sub-selects will have to go and get all of the records in the Products table that are from that category, sort them, and then take the top three. If you have a small database like Northwind, it works great. But if you have tables of a few hundred million records each, the time it takes to perform the sub-select and sort those results a hundred million times is going to take a lot longer than if you perform one single batch operation on the large table. The way to complete the request in the smallest amount of time will be to perform one singe query that does all the sorting and filtering all at once on the database instead of a few hundred million sub-queries.
December 8, 2008 at 9:44 pm |
Hi Mark,
Great stuff! I wanted to make it dynamic in terms of how many from each group to return…top 50, top 10 or whatever so I altered the last line to read:
WHERE ss.RowNumber = ss.Rank or ss.RowNumber < ss.Rank + @howMany
Works like a charm.
Many Thanks!
December 17, 2008 at 5:08 pm |
Sweet!
This is exactly what we needed. Worked like a charm.
Thanks!
March 10, 2009 at 11:29 am |
This has helped me multiple times. Thanks!
April 6, 2009 at 2:23 pm |
Mark,
You rule. Thank you for publishing your solution. I was trying to limit previews of artwork in our online artist’s gallery to four pieces per artist even if they had more approved to show in the database. Your solution, plus the additional comment from Eric Desch, worked perfectly. I never would have solved this so easily on my own. Many thanks, again!
April 6, 2009 at 9:09 pm |
Here was my final SQL code:
SELECT tblGallery.ArtistID, tblGallery.Name, tblGallery.ArtworkID, tblGallery.Title, tblGallery.Preview FROM (
SELECT tblArtists.ArtistID, tblArtists.Name, tblArtwork.ArtworkID, tblArtwork.Title, tblArtists.ArtFolder + ‘/’ + tblArtwork.Preview AS Preview, RANK() OVER (ORDER BY tblArtists.ArtistID) AS ‘iRANK’, ROW_NUMBER() OVER(ORDER BY tblArtists.ArtistID, tblArtwork.ArtworkID DESC) AS ‘iROW_NUMBER’
FROM tblArtists INNER JOIN tblArtwork ON tblArtists.ArtistID = tblArtwork.ArtistID
WHERE (tblArtwork.GalleryChoice = 1) AND (tblArtists.Active = 1) AND (tblArtwork.Active = 1)) AS tblGallery
WHERE tblGallery.iROW_NUMBER = tblGallery.iRANK or tblGallery.iROW_NUMBER < tblGallery.iRANK + 4
ORDER BY tblGallery.Name, tblGallery.Title
August 12, 2009 at 12:07 pm |
select COL1, COL2
from
(select COL1, COL2, ROW_NUMBER() OVER (PARTITION BY COL1 ORDER BY COL2) as ROWER
from TABLE
–WHERE BLAH BLAH
) xxx
where ROWER < @NUM_ROWS_PER_GROUP+1