SQL Refresher Cheat Sheet
This cheat sheet is designed to help you refresh your SQL knowledge ahead of your upcoming Breakout Academy course, so you can hit the ground running on day one.
All queries in this document can be run in your dedicated learning environment, which has been prepared and shared with you.
The examples are based on a fictional e-commerce company and use three tables: orders_de, customers, and deliveries.
Let's get familiar with the 3 tables you will be working with.
| Column | Type | Description |
|---|---|---|
| id | integer | Unique order ID |
| customer_id | integer | Links to customers.id |
| total_amount | decimal | Order value in dollars |
| status | text | 'delivered', 'ordered', 'pending', 'cancelled' |
| created_at | timestamp | When the order was placed |
| Column | Type | Description |
|---|---|---|
| id | integer | Unique customer ID |
| first_name | text | Customer's first name |
| last_name | text | Customer's last name |
| text | Email address | |
| created_at | timestamp | When they signed up |
| Column | Type | Description |
|---|---|---|
| id | integer | Unique delivery ID |
| order_id | integer | Links to orders_de.id |
| status | text | 'delivered', 'picked_up', 'cancelled' |
| picked_up_at | timestamp | When it was picked up |
| delivered_at | timestamp | When it was delivered |
The foundation of every query. Pick which columns you want and which table they come from.
SELECT * FROM raw.ecomm.orders_de
SELECT
id,
customer_id,
total_amount,
status
FROM raw.ecomm.orders_de
SELECT * in production - name columns explicitly. Makes queries easier to read and less likely to break when tables change.Filters rows before any grouping happens. Only rows matching your condition are returned.
WHERE status = 'delivered'
WHERE total_amount > 100
WHERE status != 'cancelled'
WHERE total_amount BETWEEN 50 AND 200
WHERE first_name LIKE 'J%'
WHERE email IS NULL
WHERE status IN ('delivered','ordered','pending')
Combine conditions
WHERE status = 'delivered' AND total_amount > 100
| Operator | Meaning |
|---|---|
| AND | Both conditions must be true |
| OR | Either condition can be true |
| NOT | Reverses the condition |
| Operator | Meaning |
|---|---|
| = | Equal to |
| != | Not equal to |
| > / < | Greater / less than |
| >= / <= | Greater or less than or equal |
| IN | Matches a list of values |
| BETWEEN | Inclusive range |
| LIKE | Pattern match (used together with "%") |
The % wildcard symbol matches any sequence of characters. e.g. LIKE 'J%' matches any value starting with J. | |
| IS NULL | Missing / empty value |
-- Highest first (DESC = descending)
ORDER BY total_amount DESC
-- Oldest first (ASC = ascending, default)
ORDER BY created_at
-- Sort by multiple columns
ORDER BY created_at DESC,
total_amount DESC
-- Top 10 most recent orders SELECT id, customer_id, total_amount FROM raw.ecomm.orders_de ORDER BY created_at DESC LIMIT 10
ORDER BY + LIMIT = "show me the latest N records."Combines rows from two tables based on a shared column. e.g. orders_de.customer_id links to customers.id.
SELECT raw.ecomm.orders_de.id,
raw.ecomm.orders_de.total_amount,
raw.ecomm.customers.first_name,
raw.ecomm.customers.last_name
FROM raw.ecomm.orders_de
INNER JOIN raw.ecomm.customers
ON raw.ecomm.orders_de.customer_id = raw.ecomm.customers.id
WHERE raw.ecomm.orders_de.status = 'delivered'
ORDER BY raw.ecomm.orders_de.total_amount DESC
LEFT JOIN - all left rows, NULLs if no match
SELECT raw.ecomm.orders_de.id,
raw.ecomm.orders_de.status,
raw.ecomm.deliveries.status AS delivery_status,
raw.ecomm.deliveries.picked_up_at,
raw.ecomm.deliveries.delivered_at
FROM raw.ecomm.orders_de
LEFT JOIN raw.ecomm.deliveries
ON raw.ecomm.deliveries.order_id = raw.ecomm.orders_de.id
| Type | Returns |
|---|---|
| INNER JOIN | Only matching rows in both tables |
| LEFT JOIN | All left rows + matches from right |
| RIGHT JOIN | All right rows + matches from left |
| FULL JOIN | All rows from both tables |
ON defines the join condition. WHERE filters the result after joining.Summarize data - totals, counts, averages - instead of returning row-by-row detail.
| Function | Does |
|---|---|
| COUNT(*) | Count all rows |
| COUNT(col) | Count non-NULL values |
| SUM(col) | Add up values |
| AVG(col) | Calculate average |
| MIN(col) | Lowest value |
| MAX(col) | Highest value |
SELECT status, COUNT(*) AS order_count, SUM(total_amount) AS total_revenue FROM raw.ecomm.orders_de GROUP BY status ORDER BY total_revenue DESC
Groups all rows in orders_de by their status value. For each status group, it counts how many orders exist and adds up the total revenue. The result is sorted so the highest-revenue status appears first - giving you a quick breakdown of performance by order status.
CASE works like an if/then statement inside a query. It checks conditions in order and returns the first match. If nothing matches, the ELSE value is returned.
CASE - if/then logicSELECT id,
CASE
WHEN total_amount >= 50
THEN 'High'
WHEN total_amount >= 30
THEN 'Medium'
ELSE 'Low'
END AS order_tier
FROM raw.ecomm.orders_de
order_tier.Returns the first non-NULL value from a list. It is essentially a shorter way of writing a CASE WHEN that checks for NULL.
SELECT
customers.last_name,
customers.first_name,
orders_de.status
FROM raw.ecomm.customers
LEFT JOIN raw.ecomm.orders_de
ON customers.id = orders_de.customer_id
| last_name | first_name | status |
|---|---|---|
| Scroyton | Lotte | NULL |
| Nevins | Ava | delivered |
| Bonham | Mada | pending |
SELECT
customers.last_name,
customers.first_name,
COALESCE(orders_de.status, 'no order') AS order_status
FROM raw.ecomm.customers
LEFT JOIN raw.ecomm.orders_de
ON customers.id = orders_de.customer_id
| last_name | first_name | order_status |
|---|---|---|
| Scroyton | Lotte | no order |
| Nevins | Ava | delivered |
| Bonham | Mada | pending |
orders_de.status would be NULL. COALESCE replaces that NULL with 'no order' so the result is always readable.CASE WHEN orders_de.status IS NOT NULL
THEN orders_de.status
ELSE 'no order'
END
COALESCE when you just need a fallback for NULL values. Use CASE WHEN when you need more complex conditional logic.A CTE lets you define a temporary named result set at the top of your query using WITH. It makes complex queries easier to read and reason about - think of it as giving a subquery a name and moving it out of the way.
WITH delivered_orders AS ( SELECT id, customer_id, total_amount FROM raw.ecomm.orders_de WHERE status = 'delivered' ) SELECT * FROM delivered_orders WHERE total_amount > 20Multiple CTEs
WITH delivered_orders AS (
SELECT id, customer_id, total_amount
FROM raw.ecomm.orders_de
WHERE status = 'delivered'
),
top_customers AS (
SELECT customer_id,
SUM(total_amount) AS total_spent
FROM delivered_orders
GROUP BY customer_id
)
SELECT customers.first_name,
customers.last_name,
top_customers.total_spent
FROM top_customers
INNER JOIN raw.ecomm.customers
ON top_customers.customer_id = customers.id
ORDER BY top_customers.total_spent DESC
WITHSELECT comes after all CTEs, WITHOUT a preceding commaWITH customer_orders AS (
SELECT
orders_de.customer_id,
COUNT(orders_de.id) AS order_count,
SUM(orders_de.total_amount) AS total_spent,
MAX(orders_de.created_at) AS last_order_date
FROM raw.ecomm.orders_de
WHERE orders_de.status IN ('delivered', 'ordered', 'pending')
GROUP BY orders_de.customer_id
),
delivery_summary AS (
SELECT
orders_de.customer_id,
COUNT(deliveries.id) AS total_deliveries,
SUM(CASE WHEN deliveries.status = 'delivered'
THEN 1 ELSE 0 END) AS successful_deliveries
FROM raw.ecomm.orders_de
LEFT JOIN raw.ecomm.deliveries
ON deliveries.order_id = orders_de.id
GROUP BY orders_de.customer_id
)
SELECT
customers.first_name,
customers.last_name,
COALESCE(customer_orders.order_count, 0) AS order_count,
COALESCE(customer_orders.total_spent, 0) AS total_spent,
customer_orders.last_order_date,
COALESCE(delivery_summary.total_deliveries, 0) AS total_deliveries,
COALESCE(delivery_summary.successful_deliveries, 0) AS successful_deliveries
FROM raw.ecomm.customers
LEFT JOIN customer_orders
ON customers.id = customer_orders.customer_id
LEFT JOIN delivery_summary
ON customers.id = delivery_summary.customer_id
ORDER BY customer_orders.total_spent DESC
LIMIT 20
| Section | What happens |
|---|---|
| CTE 1 | Filters active orders, groups by customer, calculates order count, total spend and last order date |
| CTE 2 | Counts total and successful deliveries per customer |
| LEFT JOINs | Joins all customers to their order and delivery summaries |
| COALESCE | Replaces NULL with 0 for customers with no orders or deliveries |
| ORDER BY | Highest spenders first |
| LIMIT | Top 20 only |
This is the order you write a SQL query as a developer:
You now have the required SQL foundation for your Breakout Academy course. You're ready to jump in, happy learning!
