Introduction
I recently had a project that required me to find the number of employees who were scheduled for every 15 minute increment, for every business day for a week, and for many stores. Yikes! Sure, I could use tons of loops and temp tables to do this, but I’m all about set based queries whenever possible…and I knew this was possible and I love a challenge!
It finally occurred to me that I could use the self-referencing (recursive) capabilities of Common Table Expressions (introduced in SQL 2005) to the mimic loops I needed to accomplish my goal.
I took some of the recursive CTE queries that I came up with for my project and turned them into 4 examples. It was quite a bit of info, so I’ll be spreading them across 2 posts.
What is a Common Table Expression (CTE)?
Books Online defines a CTE as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
Here’s a very basic example of a CTE.
USE AdventureWorks GO ;WITH Employees (EmployeeID, JobTitle) AS ( SELECT EmployeeID, Title FROM HumanResources.Employee ) SELECT * FROM Employees
As you can see in the example, I preceded the CTE with an semicolon. The reason for this is because if a CTE is not the first statement in a batch, the previous statement must always be terminated with a semicolon, and you will receive one of two errors.
Incorrect syntax near the keyword ‘with’. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.
or
Incorrect syntax near ‘Employees’.
In my example, the semicolon is not needed since GO starts a new batch and the CTE is the first statement in the batch, I always make a habit of starting them off with a semicolon in case I ever add a preceding statement.
Recursive Queries
One of the most powerful features of common table expressions is their ability to be self-referenced, allowing a recursive query. The most common use of a recursive query in a CTE is to create a hierarchy. For example, if you had an Employee table that had a column for the EmployeeID and another column for the ManagerEmployeeID, you could use a self-referencing CTE to build a hierarchical result set of an organizational chart.
Srinivas Sampath wrote a good article on building a hierarchy using a CTE on sqlservercentral.com, so I’m not going to go into much detail here. Instead, I figured I would show you other ways that a recursive query can be used within a CTE.
Example 1 – Finding missing numbers in a sequence
In this example, I’ll show you how to use a CTE to find missing numbers (gaps) in a range of sequential numbers. First, I’m going to create a temporary table and add some numbers to it.
CREATE TABLE #Numbers (Num INT) INSERT INTO #Numbers VALUES (1) INSERT INTO #Numbers VALUES (3) INSERT INTO #Numbers VALUES (6) INSERT INTO #Numbers VALUES (10)
Now, I can use a CTE to find the missing sequential numbers.
;WITH CTE_MissingNumbers (MissingNum) AS ( SELECT n1.Num + 1 FROM #Numbers n1 WHERE NOT EXISTS (SELECT NULL FROM #Numbers n2 WHERE n2.Num = n1.Num + 1) AND n1.Num < (SELECT MAX(Num) FROM #Numbers) UNION ALL SELECT cte.MissingNum + 1 FROM CTE_MissingNumbers cte WHERE NOT EXISTS (SELECT NULL FROM #Numbers n WHERE n.Num = cte.MissingNum + 1) ) SELECT MissingNum FROM CTE_MissingNumbers ORDER BY MissingNum
The first SELECT statement in the CTE will only return the first missing number in each gap, so in this case, it will return the numbers 2, 4, and 7.
The second SELECT statement in the CTE will return the remaining missing numbers, if any, in each gap, so in this case, it will return the numbers 5, 8, and 9. It does this by continually referencing itself and reevaluating the gaps. As the more numbers are returned, the gaps get smaller and smaller until all numbers are returned. Think of the CTE as a WHILE loop that runs the first statement in the CTE and missing numbers back into the table. The loop then would continue over and over until the SELECT statement no longer returned any results.
Example 2 – Finding missing dates in a sequence
This example takes the same concept from example 1 and applies it to dates instead of numbers. First, I’m going to create a temporary table and add some dates to it.
CREATE TABLE #Dates (Date SMALLDATETIME)
INSERT INTO #Dates VALUES ('1/1/2009')
INSERT INTO #Dates VALUES ('1/3/2009')
INSERT INTO #Dates VALUES ('1/6/2009')
INSERT INTO #Dates VALUES ('1/10/2009')
Now, I can use a CTE to find the missing sequential dates.
;WITH CTE_MissingDates (MissingDate) AS ( SELECT DATEADD(DAY,1,d1.Date) FROM #Dates d1 WHERE NOT EXISTS (SELECT NULL FROM #Dates d2 WHERE d2.Date = DATEADD(DAY,1,d1.Date)) AND d1.Date < (SELECT MAX(Date) FROM #Dates) UNION ALL SELECT DATEADD(DAY,1,cte.MissingDate) FROM CTE_MissingDates cte WHERE NOT EXISTS (SELECT NULL FROM #Dates d WHERE d.Date = DATEADD(DAY,1,cte.MissingDate)) ) SELECT MissingDate FROM CTE_MissingDates ORDER BY MissingDate
The only thing I really had to change to return missing dates instead of numbers was adding the DATEADD function. This will find all of the missing days, but for this to work, all of the time values in the Date column must all be the same. If your date values are not the same, you can strip off the time values as shown in red below:
TRUNCATE TABLE #Dates
INSERT INTO #Dates VALUES ('1/1/2009 5:00 PM')
INSERT INTO #Dates VALUES ('1/3/2009 3:00 PM')
INSERT INTO #Dates VALUES ('1/6/2009 1:00 PM')
INSERT INTO #Dates VALUES ('1/10/2009 2:00 PM')
;WITH CTE_MissingDates (MissingDate) AS
(
SELECT DATEADD(DAY,1,DATEDIFF(DAY,0,d1.Date))
FROM #Dates d1
WHERE NOT EXISTS
(
SELECT NULL
FROM #Dates d2
WHERE DATEADD(DAY,0,DATEDIFF(DAY,0,d2.Date)) = DATEADD(DAY,1,DATEDIFF(DAY,0,d1.Date))
)
AND d1.Date < (SELECT MAX(Date) FROM #Dates)
UNION ALL SELECT DATEADD(DAY,1,cte.MissingDate)
FROM CTE_MissingDates cte
WHERE NOT EXISTS
(
SELECT NULL
FROM #Dates d
WHERE DATEADD(DAY,0,DATEDIFF(DAY,0,d.Date)) = DATEADD(DAY,1,cte.MissingDate)
)
)
SELECT MissingDate
FROM CTE_MissingDates
ORDER BY MissingDate
Conclusion
Here are two examples of using common table expressions to find missing numbers and dates in a sequence. In my next post, I’ll show you a few more examples of recursive queries with common table expressions based off of the examples in this post.
Part 2 of my posts on Recursive Common Table Expressions. Includes examples for building ranges of numbers or datetime values.
Pingback by Recursive Common Table Expressions – Part 2 « Aaron Akin — January 21, 2009 @ 7:48 pm |
Using a larger dataset of numbers than your example, I received this error message:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
To overcome this, add OPTION (MAXRECURSION 0) to the end of the function.
Comment by EdG — February 15, 2009 @ 9:30 pm |
EdG,
You’re absolutely correct. I had been working with many small datasets and completely forgot to mention the default maximum recursion limit. Thanks for the heads up.
Comment by Aaron Akin — February 16, 2009 @ 9:30 am |
Why not generate a sequence of dates using common table expressions or a loop encapsulated in a multi-statement table-valued function and do then a left join to your production table like in http://sql-troubles.blogspot.com/2010/02/more-date-related-functionality-part-ii.html?
I think this approach allows you more flexibility and decreases query’s complexity.
Comment by Adrian — February 21, 2010 @ 11:06 am |