Clause Ordering (so far)
- SELECT
- FROM
- WHERE
- GROUP BY
- HAVING
- ORDER BY
Foundational Concepts
Select
- You may do operations within this statement e.g.
SELECT total_sales - cogs AS total_profit
- You may define new column names
- You may use aggregate functions e.g.
SELECT COUNT(product_id)
*
selects all columns
SELECT column1 (, column2, ...) FROM ... ;
SELECT column1 (, column2, ...) FROM ... WHERE condition;
Where
Validates a condition. Condition can look like stars < 4
, with stars being a column name.
Logic and Helpers
AND
/ OR
/ NOT
BETWEEN value1 (incl.) AND value2 (incl.)
col_name IN set
- Validates on any row value in col_name being a member of the specified set
- Shortens a series of
OR
to justcol_name IN ('Biogen', 'Bayer')
LIKE …` - Wildcard
%
can represent zero or multiple characters- Obtain names with “relief” somewhere in the name can use
'%Relief%'
'ke%y'
stars withke
and ends withy
- Obtain names with “relief” somewhere in the name can use
- Wildcard
_
can represent a single character
ORDER BY col_name DESC (, col_name ASC/DESC, ...)
- By default, it is
ASC
ascending - You can order by multiple columns
- Rather than writing out the column name, you can use a number based on the degree of the column in the
SELECT
statement, e.g.ORDER BY 1,3 DESC
LIMIT num
- Only shownum
queries
- Can be useful to grab “top 5” things that have been ordered
OFFSET num
- Disregardnum
queries
Time to get a little more… Intermediate
Dear god SQL syntax seems horribly inconsistent
Aggregation
SUM
/MIN
/MAX
/AVG
/COUNT
- Performs aggregate function across entire column
- Usage:
SUM(col_name)
Group by
Tells database to separate data into groups which can be aggregated independently. Used for identical data and turns it into one based on the aggregate function(s) used. Note: Typically used with aggregate functions.
SELECT category, SUM(spend)
FROM product_spend
GROUP BY category;
category | sum |
---|---|
electronics | 1007.54 |
appliance | 1135.22 |
Having
Like WHERE
, but allows you to use aggregate functions like AVG
What’s the difference?
WHERE | HAVING | |
---|---|---|
When It Filters | Values BEFORE Grouping | Values AFTER Grouping |
Operates On Data From | Individual Rows | Aggregated Values from Groups of Rows |
Example | SELECT username, followers FROM instagram_data WHERE followers > 1000; | SELECT country FROM instagram_data GROUP BY country HAVING AVG(followers) > 100; |
Distinct
Used with SELECT
to remove duplicates. Comparable to GROUP BY
in the sense that it removes duplicates.
Usage: SELECT DISTINCT col (, col, ...) FROM ...
Can also be used inside of an aggregate function, but pretty much only COUNT
as the rest have no meaning for removing duplicates.
Usage: SELECT COUNT(DISTINCT user_id) FROM trades
Arithmetic
SQL supports:
- Exponentiation with
^
ABS(num)
ROUND(num, decimal places)
,CEIL(num)
andFLOOR(num)
- Often redundant due to operators:
POWER(num, exp)
,MOD(num, num)
SQL does integer division by default. If one of the operands includes a float, it will perform float division.
3 ways to convert integer division to float division
CAST(num AS DECIMAL)
orCAST(num AS FLOAT)
- Multiply by 1.0
- Cast with
::
e.g.10::FLOAT/6
Data Types:
NUMERIC
/DECIMAL
(they are functionally the same) like integers, do not store a decimal place. These are exact.REAL
(single)/FLOAT
(double) like floating points. These are approximate.
NULL…ness
Important Note: NULL
is not a value, but a result
In other words, you may not check with val = NULL
Identify values with IS NULL
and IS NOT NULL
COALESCE(val, val, ...)
will return the first non-null value from the inputs. Inputs may be column, expression, etc. This is great for providing a default value in case of no value found, such as 0
.
Example: COALESCE(column, expression)
will returns expression
if column
is null.
IFNULL(val, backup)
, is a simplified version COALESCE
with only two parameters. The first is the original value, unless it is found to be null where it will then be replaced by the second parameter value.
Joins
Still do not understand the different types :/
Use JOIN
after FROM
to combine two tables like so SELECT * FROM artists JOIN songs
ON
establishes a join condition which is great for tables that are related in some way(s).
Usage: SELECT * FROM artists JOIN songs ON artists.artist_id = songs.artist_id
Note: The columns do NOT have to be the same name.
Types of SQL joins:
- Inner Join - Returns only rows with matching values from both tables. NOTE:
JOIN
is short forINNER JOIN
- Left Join - Returns all rows from left table and matching rows from right table.
- Right Join - Returns all rows from right table and matching rows from left table.`
- Full Outer Join - Returns all rows where there is a match in either the left or right table. If there is no match, NULL values are returns for columns from the table without a match.
Date Functions
CURRENT_DATE
- Returns today’s date, e.g. 08/27/2023 00:00:00
CURRENT_TIME
- Returns today’s time, e.g. 07:35:15.989933+00
CURRENT_TIMESTAMP
or NOW()
- Return both date and time, e.g. 08/27/2023 07:35:15
You may use comparison operators (>=
or =
) for dates.
You may also use MIN
/MAX
on dates as well. Minimum implies oldest.
Extracting Parts from Dates
EXTRACT()
usage: EXTRACT(YEAR FROM sent_date) AS extracted_year
OR equivalently
DATE_PART()
usage: DATE_PART('year', sent_date) AS part_year
Truncating Dates
DATE_TRUNC()
rounds down a date or timestamp to a specified unit of time. In other words, it trims the finer details into default values and retains the specified unit. Usage: DATE_TRUNC('month', sent_date) AS truncated_to_month
Sent | Truncated Month | Truncated Day | Truncated Hour |
---|---|---|---|
08/03/2022 16:43:00 | 08/01/2022 00:00:00 | 08/03/2022 00:00:00 | 08/03/2022 16:00:00 |
Date Arithmetic | |||
Using INTERVAL we can add and subtract time units to a date. | |||
Examples: sent_date + INTERVAL '2 days' AS add_2days OR sent_date - INTERVAL '10 minutes' AS minus_10mins |
Formatting Dates (as Strings)
Example Usage: TO_CHAR(sent_date, 'YYYY-MM-DD HH:MI:SS')
Formatting Codes:
Code | Example |
---|---|
YYYY | 2023 |
MM | 08 |
Month | August |
Mon | Aug |
DD | 27 |
DDth | 27th |
Day | Saturday |
HH24 | 14 |
HH | 02 |
MI | 30 |
SS | 00 |
AM | PM |
Casting Dates | |
::DATE or TO_DATE() - Convert strings into dates. |
- `sent_date::DATE`
- `TO_DATE('2023-08-27', 'YYYY-MM-DD')`
::TIMESTAMP
or TO_TIMESTAMP()
- Convert strings into timestamps.
- sent_date::TIMESTAMP
- TO_TIMESTAMP('2023-08-27 10:30:00', 'YYYY-MM-DD HH:MI:SS')
Now Advanced
I can begin to appreciate certain aspects of SQL now. But syntax is still silly…
CTEs/Subqueries
They’re like functions!
What is a CTE?
A Common Table Expression. It’s a temporary table that stores an intermediate result of a larger operations. It can be done using WITH
.
-- Start of a CTE
WITH genre_revenue_cte AS (
SELECT
genre,
SUM(concert_revenue) AS total_revenue
FROM concerts
GROUP BY genre
)
-- End of a CTE
SELECT
g.genre,
g.total_revenue,
AVG(c.concert_revenue) AS avg_concert_revenue
FROM genre_revenue_cte AS g
INNER JOIN concerts AS c
ON g.genre = c.genre
WHERE c.concert_revenue > g.total_revenue * 0.5
GROUP BY g.genre, g.total_revenue;
Advantages of using a CTE:
- Breaks down complex queries
- Can be reused to prevent redundant calculations
- Can be used for recursive queries, like traversing hierarchal data
What is a subquery?
Known as inner queries, they embed one query inside of another using parenthesis to generate a temporary table.
Advantages of using a subquery:
- Single value comparison in where clauses
SELECT artist_name
FROM concerts
WHERE concert_revenue > (SELECT AVG(concert_revenue) FROM concerts);
- Column creation and aggregation, subqueries are often used with the
IN
,NOT IN
, andEXISTS
operators
SELECT
artist_name,
genre,
concert_revenue,
(SELECT AVG(concert_revenue) FROM concerts) AS avg_concert_revenue,
(SELECT MAX(concert_revenue) FROM concerts) AS max_concert_revenue
FROM concerts;
- Correlated subqueries allow you to use information from both the inner and outer query
SELECT
artist_name,
genre,
concert_revenue
FROM concerts AS c1
WHERE concert_revenue = (
SELECT MAX(concert_revenue)
FROM concerts AS c2
WHERE c1.genre = c2.genre
);