What is Common Table Expression and When to Use it?
My colleague, herein alias M, and I have been working on a project which involves creating a suite of automated dashboards and real-time KPI reporting. We are a duo team. He is the data guy, and I'm the design chick. We complement skills. We bounce ideas off and learn from each other. He mentors me on SQL, and I share my knowledge of visual design for effective dashboards with him.
This project requires many complex SQLs. When M gets stuck on data challenges, he discusses the problems with me. Having binged on SQL and database management tutorial videos and articles, I can support him by suggesting ideas he could apply or explore to solve those challenges. One of those ideas I suggested was CTE.
CTE stands for Common Table Expression; it is an advanced SQL feature. It is sometimes referred to as WITH statement. You'll see why that's the case later.
Once you have mastered the basics of SQL, such as selecting, ordering, filtering data, and joining tables, I highly recommend you start learning CTE. It can be beneficial. Ask M, and he'll tell you to "CTE everything" 😂
In this blog, I'm going to share with you the following:
• What is CTE
• Why use CTE
• How to write CTE
What is CTE
I'm not fond of technical definitions. I find them hard to understand. That said, I'll regurgitate the formal definition of CTE as you'll likely come across it and then share the lay version.
Technical definition: CTE is a temporary name result set created from a simple SELECT statement that can be used in a subsequent SELECT statement.
Layperson definition: CTE is a temporary data set returned by a query, which is then used by another query. CTE only exist when the query is run. It isn't stored anywhere and hence is temporary.
There are two types of CTEs: recursive and non-recursive.
Recursive CTEs are CTEs that reference themselves. A recursive query repeatedly runs on a subset of the data until a termination condition is met.
Non-recursive CTEs, as the name implies, don't use recursion. They don't reference themselves.
I find recursive CTE a difficult topic to grasp. I have some vague understanding. I'll park this topic for a future blog. Non-recursive CTEs, on the other hand, are much easier to understand, which is the type we'll be looking at in this blog.
Why use CTE
There are several reasons why people use CTE.
Managing Complex Queries
CTE improves the readability of your code. Instead of lumping all your query logic into one massive query, several CTEs break up the code into manageable blocks of code, whereby you get chunks of data you need and then combine them later in a final SELECT statement. Essentially, it helps separate query logic; this makes your code more understandable, making it easier to change the code or correct errors in the code.
Substitute for a view
View is another approach users can take to make complex queries more manageable. You can encapsulate all the queries and join logic into a view.
View is very similar to CTE. You can think of it as a long-lived version of CTE. View is a virtual table just like CTE but exists beyond the query's duration. M and I don't have permission to create database objects, so we can't make views. CTE is a good substitute.
Overcoming limitations or complex queries:
Some data outputs cannot be achieved due to limitations, such as hierarchical data, which can be achieved through recursion. Multi-level aggregation is another one in which CTE is a good candidate.
How to write CTE
CTE has two parts:
Part 1: defines the name of the CTE
Part 2: Query definition
As you can see, the syntax starts with a WITH statement. Hence, WITH queries/statement is a synonym for CTE.
Once you build your CTE, you can use it in a query, as if it were a table.
Let's have a look at an example. Here I have a sales table. The task is to compare the total sales of each order with the average order amount for each corresponding store.
We first create the CTE called avg_per_store. Using this CTE, we create a table that lists all stores and the average sales amount by store. Then in the main query, we select to display Order_ID, Store, and Sales, from the original sales table and avg_sales_for_store from the CTE defined earlier.
Here's the output of this query:
Did you know that we can also have multiple CTEs in one query? The syntax is straightforward. We use one WITH keyword and separate the CTEs with commas.
Let's have a look at an example. Let's look at an example using the same sales data. Apart from comparing the order amount order with the average order amount for each corresponding store, we also want to look at the average sale for each staff. To do this, we need two define two CTEs.
The first CTE is the same as the earlier example. The second CTE is the addition, and it's called avg_per_staff.
In the main query, we joined the two CTEs to the original sales table to get the following output:
I covered some of the basics of CTE. I explained what CTE is, its use and its syntax. With this blog, I hope to have given you an appreciation of what CTE is and why they are helpful. It is advantageous when working with large or complex tables that are resource intensive to query and downright confusing to read and debug. Now, it's over to you to explore more deeply or park it in your to-learn bank and pick it up when you're ready.