Aaron Akin

January 20, 2009

Recursive Common Table Expressions – Part 1

Filed under: CTE,SQL,SQL 2005,TSQL — Aaron Akin @ 6:43 pm

Introduction

I recently had a project that required me to find the number of employees who were scheduled for every 15 minute increment, for every business day for a week, and for many stores.  Yikes!  Sure, I could use tons of loops and temp tables to do this, but I’m all about set based queries whenever possible…and I knew this was possible and I love a challenge!

It finally occurred to me that I could use the self-referencing (recursive) capabilities of Common Table Expressions (introduced in SQL 2005) to the mimic loops I needed to accomplish my goal.

I took some of the recursive CTE queries that I came up with for my project and turned them into 4 examples.  It was quite a bit of info, so I’ll be spreading them across 2 posts.


What is a Common Table Expression (CTE)?

Books Online defines a CTE as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

Here’s a very basic example of a CTE.

USE AdventureWorks

GO

;WITH Employees (EmployeeID, JobTitle) AS
(
    SELECT EmployeeID, Title
      FROM HumanResources.Employee
)
SELECT *
  FROM Employees

As you can see in the example, I preceded the CTE with an semicolon.  The reason for this is because if a CTE is not the first statement in a batch, the previous statement must always be terminated with a semicolon, and you will receive one of two errors.

Incorrect syntax near the keyword ‘with’. If this statement is a common table expression or an xmlnamespaces clause, the previous statement must be terminated with a semicolon.

or

Incorrect syntax near ‘Employees’.

In my example, the semicolon is not needed since GO starts a new batch and the CTE is the first statement in the batch, I always make a habit of starting them off with a semicolon in case I ever add a preceding statement.


Recursive Queries

One of the most powerful features of common table expressions is their ability to be self-referenced, allowing a recursive query.  The most common use of a recursive query in a CTE is to create a hierarchy.  For example, if you had an Employee table that had a column for the EmployeeID and another column for the ManagerEmployeeID, you could use a self-referencing CTE to build a hierarchical result set of an organizational chart.

Srinivas Sampath wrote a good article on building a hierarchy using a CTE on sqlservercentral.com, so I’m not going to go into much detail here.  Instead, I figured I would show you other ways that a recursive query can be used within a CTE.


Example 1 – Finding missing numbers in a sequence

In this example, I’ll show you how to use a CTE to find missing numbers (gaps) in a range of sequential numbers.  First, I’m going to create a temporary table and add some numbers to it.

CREATE TABLE #Numbers (Num INT)


INSERT INTO #Numbers VALUES (1)
INSERT INTO #Numbers VALUES (3)
INSERT INTO #Numbers VALUES (6)
INSERT INTO #Numbers VALUES (10)

Now, I can use a CTE to find the missing sequential numbers.

;WITH CTE_MissingNumbers (MissingNum) AS

(
     SELECT n1.Num + 1
       FROM #Numbers n1
       WHERE NOT EXISTS (SELECT NULL FROM #Numbers n2 WHERE n2.Num = n1.Num + 1)
         AND n1.Num < (SELECT MAX(Num) FROM #Numbers)
     UNION ALL
     SELECT cte.MissingNum + 1
       FROM CTE_MissingNumbers cte
       WHERE NOT EXISTS (SELECT NULL FROM #Numbers n WHERE n.Num = cte.MissingNum + 1)
)
SELECT MissingNum
  FROM CTE_MissingNumbers
  ORDER BY MissingNum

The first SELECT statement in the CTE will only return the first missing number in each gap, so in this case, it will return the numbers 2, 4, and 7.

The second SELECT statement in the CTE will return the remaining missing numbers, if any, in each gap, so in this case, it will return the numbers 5, 8, and 9.  It does this by continually referencing itself and reevaluating the gaps.  As the more numbers are returned, the gaps get smaller and smaller until all numbers are returned.  Think of the CTE as a WHILE loop that runs the first statement in the CTE and missing numbers back into the table.  The loop then would continue over and over until the SELECT statement no longer returned any results.


Example 2 – Finding missing dates in a sequence

This example takes the same concept from example 1 and applies it to dates instead of numbers.  First, I’m going to create a temporary table and add some dates to it.

CREATE TABLE #Dates (Date SMALLDATETIME) 


INSERT INTO #Dates VALUES ('1/1/2009')
INSERT INTO #Dates VALUES ('1/3/2009')
INSERT INTO #Dates VALUES ('1/6/2009')
INSERT INTO #Dates VALUES ('1/10/2009')

Now, I can use a CTE to find the missing sequential dates.

;WITH CTE_MissingDates (MissingDate) AS

(
    SELECT DATEADD(DAY,1,d1.Date)
      FROM #Dates d1
      WHERE NOT EXISTS (SELECT NULL FROM #Dates d2 WHERE d2.Date = DATEADD(DAY,1,d1.Date))
        AND d1.Date < (SELECT MAX(Date) FROM #Dates)
    UNION ALL
    SELECT DATEADD(DAY,1,cte.MissingDate)
      FROM CTE_MissingDates cte
      WHERE NOT EXISTS (SELECT NULL FROM #Dates d WHERE d.Date = DATEADD(DAY,1,cte.MissingDate))
)
SELECT MissingDate
  FROM CTE_MissingDates
  ORDER BY MissingDate

The only thing I really had to change to return missing dates instead of numbers was adding the DATEADD function.  This will find all of the missing days, but for this to work, all of the time values in the Date column must all be the same.  If your date values are not the same, you can strip off the time values as shown in red below:

TRUNCATE TABLE #Dates


INSERT INTO #Dates VALUES ('1/1/2009 5:00 PM')
INSERT INTO #Dates VALUES ('1/3/2009 3:00 PM')
INSERT INTO #Dates VALUES ('1/6/2009 1:00 PM')
INSERT INTO #Dates VALUES ('1/10/2009 2:00 PM')

;WITH CTE_MissingDates (MissingDate) AS
(
     SELECT DATEADD(DAY,1,DATEDIFF(DAY,0,d1.Date))
       FROM #Dates d1
       WHERE NOT EXISTS
             (
             SELECT NULL
               FROM #Dates d2
               WHERE DATEADD(DAY,0,DATEDIFF(DAY,0,d2.Date)) = DATEADD(DAY,1,DATEDIFF(DAY,0,d1.Date))
             )
         AND d1.Date < (SELECT MAX(Date) FROM #Dates)
     UNION ALL     SELECT DATEADD(DAY,1,cte.MissingDate)
       FROM CTE_MissingDates cte
       WHERE NOT EXISTS
             (
             SELECT NULL
               FROM #Dates d
               WHERE DATEADD(DAY,0,DATEDIFF(DAY,0,d.Date)) = DATEADD(DAY,1,cte.MissingDate)
             )
)
SELECT MissingDate
  FROM CTE_MissingDates
  ORDER BY MissingDate



Conclusion

Here are two examples of using common table expressions to find missing numbers and dates in a sequence.  In my next post, I’ll show you a few more examples of recursive queries with common table expressions based off of the examples in this post.

Advertisements

5 Comments »

  1. Part 2 of my posts on Recursive Common Table Expressions. Includes examples for building ranges of numbers or datetime values.

    Pingback by Recursive Common Table Expressions – Part 2 « Aaron Akin — January 21, 2009 @ 7:48 pm | Reply

  2. Using a larger dataset of numbers than your example, I received this error message:

    Msg 530, Level 16, State 1, Line 1
    The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

    To overcome this, add OPTION (MAXRECURSION 0) to the end of the function.

    Comment by EdG — February 15, 2009 @ 9:30 pm | Reply

    • EdG,

      You’re absolutely correct. I had been working with many small datasets and completely forgot to mention the default maximum recursion limit. Thanks for the heads up.

      Comment by Aaron Akin — February 16, 2009 @ 9:30 am | Reply

  3. Why not generate a sequence of dates using common table expressions or a loop encapsulated in a multi-statement table-valued function and do then a left join to your production table like in http://sql-troubles.blogspot.com/2010/02/more-date-related-functionality-part-ii.html?
    I think this approach allows you more flexibility and decreases query’s complexity.

    Comment by Adrian — February 21, 2010 @ 11:06 am | Reply

  4. not so helpful.

    Comment by Anonymous — March 31, 2012 @ 10:21 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: