What is Recursive Common Table Expression?
I have a question for you. What do these three have in common: factorial, Russian doll, and Romanesco?
It's been a long time since I left middle school, so if you're struggling to recall what factorial is, like I was, check out math is fun. It's my go-to when I need to refresh any math topics I learned eons ago.
While we are on a side note, did you know that Romanesco is a cousin of broccoli? It's an alienish cool looking veg, huh? 🤔 I wonder if it tastes like its cousin.
Give up? These are all examples of something you can do recursively. Why am I talking about recursion? Recall in my previous blog, we looked at the basics of Common Table Expression (CTE). CTE, aka WITH statement, is a temporarily generated data table that SQL queries can refer to with other statements. There are two types: non-recursive and recursive. I focused on the non-recursive form last time. In this blog, we will look at the more difficult recursive type.
Recursive CTE is a very advanced SQL concept. It's a rather complex idea to get your head around. I've spent quite some time digesting and understanding it conceptually. I have little experience with it, but I know by understanding the theory well, it'll be less daunting when I must put it into practice. While researching recursive CTE, I learned that recursion is a big deal in computer science, not just in SQL. So, it's a concept worth investing the time to learn correctly.
Now, back to the topic of recursion as a type of CTE. We will cover the following:
- What is recursive CTE
- How to write recursive CTE
- Guide to using recursive CTE
What is recursive CTE?
Recursive CTEs are common table expressions that reference themselves, meaning it's a query that calls on itself. The CTE is repeatedly executed, returning subsets of data until a termination condition is met. People have likened CTE to Russian doll. Each time you open a doll, a smaller one will be inside. The process is repeated until the smallest doll is reached, at which point you stop.
From what I gleaned from the knowledgeable web, recursive CTE isn't used all that often. It is, however, usually applied to analyses of hierarchy data in SQL. What are hierarchy data? Think about an organization and its employees. Each employee has a manager. The manager also has a manager. The manager's manager can have a manager, and you get the gist. This employee hierarchy goes all the way up to the CEO.
How to write recursive CTE?
Let's look at the anatomy of a recursive CTE syntax. The general layout is like a non-recursive CTE. It is defined using WITH clause. A recursive CTE consists of three elements:
Anchor member: The anchor member is your base query. It is the starting point for the query. It provides the initial result set.
Recursive member: The recursive member follows the anchor query. It runs iteratively until a condition is met. The result set from each iteration is an input into the next.
Terminator: Terminator is the condition that stops the iteration. It is optional. In some situations, you don't need a terminator, like the employee-manager hierarchy, but in others, you do. You don't want your query to go ground and ground, chasing its tail forever, right!?!
Let's have a look at two queries we can use recursive CTE to solve:
- Display factorial 1 to 5
- Find the hierarchy of managers for a given employee
Query 1
Query: Display factorial 1 to 5
The anchor query holds factorial 1 and returns a value of 1. The recursive member then takes the value 1, adds 1 to the anchor query result set, and then returns the value for factorial 2. It iterates three more times until it reaches the termination clause, which is n <5.
On the left-hand side is the output.
Query 2
Query: Find the hierarchy of managers for a given employee (Claudia Avary)
The managers of the employee we want to find out is Claudia Avary. The task is to return Claudia's manager, Claudia's manager's manager... etc. The clause will terminate when we reach the CEO: Anna Thompson, because there is no manager above her.
Here is the employee table and org chart to help with the solution.
The anchor member query returns one record, which is Claudia Avary. The recursive member is slightly more challenging than example 1 because it contains a join clause. We need the join clause to look up the employee_id from the emp_hierarchy h table against the employee table e manager_id.
In this example, we added a column called lvl (level), so you can see the number of iterations.
Here is the output.
Guide to using recursive CTE
With recursive CTE, there are some limitations and conditions associated with CTE.
- There is a default limit to how many recursion steps. I've come across 100 as well as 200. If you need more recursion cycles than the default limit, you can increase it up to a maximum of 32767. You can increase the number of recursion with this line of code "OPTION(MAXRECURSION x)". Note x can be any value up to 32767.
- The following SQL statements are not allowed: GROUP BY, HAVING, LEFT JOIN, RIGHT JOIN, OUTER JOIN, SELECT DISTINCT, TOP
- The number of columns for the anchor and recursive member must be the same.
- The anchor and the recursive member data types must be the same.
Ok, that's all folks for recursive CTE. It's ok if you don't completely understand. I find with curlier concepts, such as recursive CTE, I often have to revisit the topic multiple times on different occasions to comprehend it fully. I find it helpful to talk through it out loud to myself to help the information sink in. Give it a go if you've never tried this approach before.