What Is a CTE?
CTE is defined as a Common Table Expression. You create a CTE using a WITH query, then reference it within a SELECT, INSERT, UPDATE, or DELETE statement.
It goes by a few names:
- Common Table Expression
- Subquery Factoring
- SQL WITH Clause
Let’s say you have a table called schools
with the columns school_id
, school_name
, district_id
, and the number of students. You need to write a query to display a list of schools alongside their district ID and the average number of students per school in that district.
Your logic might be as follows:
- Create a table with the list of districts and the corresponding average number of students per school.
- Join this table with the list of schools and display the required information.
- Drop the table with the average number of students per school for each district.
If you use a CTE, you don’t need to create and drop a table. You can simply reference the temporary result set created by the WITH query, like you see below:
So, here you start by defining the temporary result set avg_students
in the WITH query. In the brackets, there is a SELECT statement that defines this result set; it contains a list of districts and the corresponding average number of students per school. After the closing parenthesis, you start the main SELECT statement. Notice that you reference the temporary result set just like an ordinary table, using its assigned name (avg_students
). The output will include the school name, the ID of its district, and the average number of students in this district.
CTEs in Action
Let’s start by exploring the data. Suppose you are a data analyst for a retail bank and you want to analyze the bonuses given to employees last month. The following table is the starting point:
Bonus_jan
Now let’s say you want to see the bonus paid to each employee along with the average bonus for their position. To this end, you first need to calculate the average bonus for every position. This can be done in a temporary result set (a CTE). The whole query will look like this:
As you can see, we start by defining the temporary result set avg_position
. Then there’s the main SELECT statement, where you join the bonus_jan
table with the temporary result set avg_position
to display information about each employee, their bonus, and the average bonus for that position:
Why Use a CTE?
Now, you may have noticed that in most cases, you could use one or more subqueries instead of CTEs. So why use a CTE?
- Common Table Expressions better organize long queries. Multiple subqueries often look messy.
- CTEs also make a query more readable, as you have a name for each of the Common Table Expressions used in a query.
- CTEs organize the query so that it better reflects human logic. With CTEs, you start by defining the temporary result set(s) and then refer to it/them in the main query. With subqueries, you start with the main query and then place the subqueries in the middle of the query.
- Finally, there is also a specific category of CTEs called recursive CTEs that are allowed to reference themselves. These CTEs can solve problems that cannot be addressed with other queries. Recursive queries are especially useful in working with hierarchical data.