CTEs: The RAM-Hogging Culprits in Your Queries

Last updated: August 3, 2024
CTEs: Addressing a Common SQL Misconception

Understanding CTEs and Their Impact

Common Table Expressions (CTEs) are widely used in SQL to simplify complex queries and enhance readability. However, there's a prevalent misconception about how they handle memory and execution. Let’s clarify this with an example and address some common misunderstandings.

A Sample Query Using a CTE


WITH AuthorStats AS (
    SELECT 
        author_id,
        COUNT(post_id) AS post_count,
        SUM(views) AS total_views
    FROM 
        blog_posts
    GROUP BY 
        author_id
)

SELECT
    top_posts.author_id AS top_author_by_posts,
    top_posts.post_count,
    top_views.author_id AS top_author_by_views,
    top_views.total_views
FROM
    (SELECT 
        author_id,
        post_count
    FROM 
        AuthorStats
    ORDER BY 
        post_count DESC
    LIMIT 5) AS top_posts

    CROSS JOIN

    (SELECT 
        author_id,
        total_views
    FROM 
        AuthorStats
    ORDER BY 
        total_views DESC
    LIMIT 5) AS top_views;
            

Hypothetical Scenario

Consider a scenario like the above query. We have:

What happens when the query executes?

The total time taken to execute is the sum of the CTE execution time and the two subqueries.
However, there's an important aspect called Processing Rate, which we can relate to RAM. This is a limited resource, and there is a cost associated with every unit of RAM.

Explaining for Better Understanding

Using a CTE:

Imagine you have 12 containers to process your data, and the data to be processed is 100 GB.
The query executed in step 1 needs to be stored somewhere, which is the RAM. It occupies space (let’s say 6 containers) and stays there until the query completes. This space remains occupied, whether it’s being actively used or not. Now, the remainder of the query has only 6 containers to process the rest of the data.

CTE Example

Using a Subquery Instead:

The query at any point has the entirety of 12 containers to process the data.
More containers mean faster execution—here, potentially 2x faster because the query processes in parallel. Imagine each container processes 10 GB. Even if the subquery portion needs to be executed twice, there's sufficient processing space, and once completed, the rest of the query has all 12 containers available.
This not only saves runtime but also optimizes query performance.

Best Practices

To optimize performance and manage memory effectively: