Clause Ordering (so far)

  1. SELECT
  2. FROM
  3. WHERE
  4. GROUP BY
  5. HAVING
  6. ORDER BY

Foundational Concepts

Select

  1. You may do operations within this statement e.g. SELECT total_sales - cogs AS total_profit
  2. You may define new column names
  3. You may use aggregate functions e.g. SELECT COUNT(product_id)
  • * selects all columns
    SELECT column1 (, column2, ...) FROM ... ;
    SELECT column1 (, column2, ...) FROM ... WHERE condition;

Where

Validates a condition. Condition can look like stars < 4, with stars being a column name.

Logic and Helpers

AND / OR / NOT
BETWEEN value1 (incl.) AND value2 (incl.)
col_name IN set

  • Validates on any row value in col_name being a member of the specified set
  • Shortens a series of OR to just col_name IN ('Biogen', 'Bayer') LIKE …`
  • Wildcard % can represent zero or multiple characters
    • Obtain names with “relief” somewhere in the name can use '%Relief%'
    • 'ke%y' stars with ke and ends with y
  • Wildcard _ can represent a single character
    ORDER BY col_name DESC (, col_name ASC/DESC, ...)
  • By default, it is ASC ascending
  • You can order by multiple columns
  • Rather than writing out the column name, you can use a number based on the degree of the column in the SELECT statement, e.g. ORDER BY 1,3 DESC
    LIMIT num - Only show num queries
  • Can be useful to grab “top 5” things that have been ordered
    OFFSET num - Disregard num queries

Time to get a little more… Intermediate

Dear god SQL syntax seems horribly inconsistent

Aggregation

SUM/MIN/MAX/AVG/COUNT - Performs aggregate function across entire column

  • Usage: SUM(col_name)

Group by

Tells database to separate data into groups which can be aggregated independently. Used for identical data and turns it into one based on the aggregate function(s) used. Note: Typically used with aggregate functions.

SELECT category, SUM(spend) 
FROM product_spend 
GROUP BY category;
categorysum
electronics1007.54
appliance1135.22

Having

Like WHERE, but allows you to use aggregate functions like AVG

What’s the difference?

WHEREHAVING
When It FiltersValues BEFORE GroupingValues AFTER Grouping
Operates On Data FromIndividual RowsAggregated Values from Groups of Rows
ExampleSELECT username, followers
FROM instagram_data
WHERE followers > 1000;
SELECT country
FROM instagram_data
GROUP BY country
HAVING AVG(followers) > 100;

Distinct

Used with SELECT to remove duplicates. Comparable to GROUP BY in the sense that it removes duplicates.
Usage: SELECT DISTINCT col (, col, ...) FROM ...

Can also be used inside of an aggregate function, but pretty much only COUNT as the rest have no meaning for removing duplicates.
Usage: SELECT COUNT(DISTINCT user_id) FROM trades

Arithmetic

SQL supports:

  • Exponentiation with ^
  • ABS(num)
  • ROUND(num, decimal places),
  • CEIL(num) and FLOOR(num)
  • Often redundant due to operators: POWER(num, exp), MOD(num, num)

SQL does integer division by default. If one of the operands includes a float, it will perform float division.
3 ways to convert integer division to float division

  1. CAST(num AS DECIMAL) or CAST(num AS FLOAT)
  2. Multiply by 1.0
  3. Cast with :: e.g. 10::FLOAT/6

Data Types:

  • NUMERIC/DECIMAL(they are functionally the same) like integers, do not store a decimal place. These are exact.
  • REAL(single)/FLOAT(double) like floating points. These are approximate.

NULL…ness

Important Note: NULL is not a value, but a result
In other words, you may not check with val = NULL

Identify values with IS NULL and IS NOT NULL

COALESCE(val, val, ...) will return the first non-null value from the inputs. Inputs may be column, expression, etc. This is great for providing a default value in case of no value found, such as 0.
Example: COALESCE(column, expression) will returns expression if column is null.

IFNULL(val, backup), is a simplified version COALESCE with only two parameters. The first is the original value, unless it is found to be null where it will then be replaced by the second parameter value.

Joins

Still do not understand the different types :/
Use JOIN after FROM to combine two tables like so SELECT * FROM artists JOIN songs

ON establishes a join condition which is great for tables that are related in some way(s).
Usage: SELECT * FROM artists JOIN songs ON artists.artist_id = songs.artist_id
Note: The columns do NOT have to be the same name.

Types of SQL joins:

  1. Inner Join - Returns only rows with matching values from both tables. NOTE: JOIN is short for INNER JOIN
  2. Left Join - Returns all rows from left table and matching rows from right table.
  3. Right Join - Returns all rows from right table and matching rows from left table.`
  4. Full Outer Join - Returns all rows where there is a match in either the left or right table. If there is no match, NULL values are returns for columns from the table without a match.

Date Functions

CURRENT_DATE - Returns today’s date, e.g. 08/27/2023 00:00:00
CURRENT_TIME - Returns today’s time, e.g. 07:35:15.989933+00
CURRENT_TIMESTAMP or NOW() - Return both date and time, e.g. 08/27/2023 07:35:15

You may use comparison operators (>= or =) for dates.
You may also use MIN/MAX on dates as well. Minimum implies oldest.

Extracting Parts from Dates
EXTRACT() usage: EXTRACT(YEAR FROM sent_date) AS extracted_year
OR equivalently
DATE_PART()usage: DATE_PART('year', sent_date) AS part_year

Truncating Dates
DATE_TRUNC() rounds down a date or timestamp to a specified unit of time. In other words, it trims the finer details into default values and retains the specified unit. Usage: DATE_TRUNC('month', sent_date) AS truncated_to_month

SentTruncated MonthTruncated DayTruncated Hour
08/03/2022 16:43:0008/01/2022 00:00:0008/03/2022 00:00:0008/03/2022 16:00:00
Date Arithmetic
Using INTERVAL we can add and subtract time units to a date.
Examples: sent_date + INTERVAL '2 days' AS add_2days OR sent_date - INTERVAL '10 minutes' AS minus_10mins

Formatting Dates (as Strings)
Example Usage: TO_CHAR(sent_date, 'YYYY-MM-DD HH:MI:SS')

Formatting Codes:

CodeExample
YYYY2023
MM08
MonthAugust
MonAug
DD27
DDth27th
DaySaturday
HH2414
HH02
MI30
SS00
AMPM
Casting Dates
::DATE or TO_DATE() - Convert strings into dates.
- `sent_date::DATE`
- `TO_DATE('2023-08-27', 'YYYY-MM-DD')`

::TIMESTAMP or TO_TIMESTAMP() - Convert strings into timestamps.
- sent_date::TIMESTAMP
- TO_TIMESTAMP('2023-08-27 10:30:00', 'YYYY-MM-DD HH:MI:SS')

Now Advanced

I can begin to appreciate certain aspects of SQL now. But syntax is still silly…

CTEs/Subqueries

They’re like functions!

What is a CTE?
A Common Table Expression. It’s a temporary table that stores an intermediate result of a larger operations. It can be done using WITH.

-- Start of a CTE
WITH genre_revenue_cte AS (
  SELECT
    genre,
    SUM(concert_revenue) AS total_revenue
  FROM concerts
  GROUP BY genre
)
-- End of a CTE
 
SELECT
  g.genre,
  g.total_revenue,
  AVG(c.concert_revenue) AS avg_concert_revenue
FROM genre_revenue_cte AS g
INNER JOIN concerts AS c 
  ON g.genre = c.genre
WHERE c.concert_revenue > g.total_revenue * 0.5
GROUP BY g.genre, g.total_revenue;

Advantages of using a CTE:

  1. Breaks down complex queries
  2. Can be reused to prevent redundant calculations
  3. Can be used for recursive queries, like traversing hierarchal data

What is a subquery?
Known as inner queries, they embed one query inside of another using parenthesis to generate a temporary table.

Advantages of using a subquery:

  1. Single value comparison in where clauses
SELECT artist_name
FROM concerts
WHERE concert_revenue > (SELECT AVG(concert_revenue) FROM concerts);
  1. Column creation and aggregation, subqueries are often used with the IN, NOT IN, and EXISTS operators
SELECT 
  artist_name, 
  genre, 
  concert_revenue,
  (SELECT AVG(concert_revenue) FROM concerts) AS avg_concert_revenue,
  (SELECT MAX(concert_revenue) FROM concerts) AS max_concert_revenue
FROM concerts;
  1. Correlated subqueries allow you to use information from both the inner and outer query
SELECT 
  artist_name, 
  genre, 
  concert_revenue
FROM concerts AS c1
WHERE concert_revenue = (
  SELECT MAX(concert_revenue)
  FROM concerts AS c2
  WHERE c1.genre = c2.genre
);