This is a review of past tutorials and basic SQL, before we get into creating tables and doing table joins.
Go ahead and download this starter database
Table of contents
- Tutorial review
- Exercises
- How many records in the database
- How many babies in the database
- How many unique names
- How many boy names
- How many names and babies, broken down by sex
- 5 popular names (i.e. most babies named)
- 5 most popular boy names
- 5 most popular names between the letters 'U' and 'Z'
- 5 most popular boy names that begin with a vowel
- 5 most popular girl names that don't end in a vowel
- Number of unique names with more than 7 letters
- Number of unique names shorter than 5 letters, and number of babies with those names
- 5 longest names
- 5 longest names with at least 1,000 babies
- 5 most popular starting letters, based on number of names
- 5 most popular starting letters, based on number of babies named
- 5 most popular names with exactly 6 characters that begin with the letters "Ja"
- 3 most popular names with at least 2 consecutive "r" letters
- Average length (unweighted) of names, boys versus girls, rounded to 2 decimal places
- Top 10 starting letter of names, sorted by longest (unweighted) average length of name
- For each vowel, total number of names (and babies named) that begin with a vowel, broken down by boys versus girls
Tutorial review
The queries in this tutorial are the same that we've done for the following:
-
SELECTing rows FROM data tables – The syntax for retrieving and displaying data from a SQLite tables
-
LIMIT and ORDER BY in SQL Queries – How to specify the quantity and arrangement of data rows returned by the SQL database.
-
Using the WHERE clause to filter data in SQL – How to retrieve rows based on whether they match specified values.
-
More Boolean Expressions to Filter SQL Queries – How to filter SQL data using comparison operators, such as "greater than" and "not equal to". Mostly, this is a review of how tricky logical expressions can be.
-
Using LIKE, IN, BETWEEN, and wildcards to match multiple values in SQL
– Real-world data is often messy, so we need messy ways of matching values, because matching only on exact values can unintentionally filter out relevant data.
-
Functions for transforming text and numbers in SQL – Think of these as spreadsheet functions.
-
Aliasing Columns and Tables in SQL Databases – A short lesson on how to give human-readable names for otherwise messily-named values and identifiers.
-
Using GROUP BY to create aggregates in SQL – With the use of the GROUP BY clause, we gain the ability to aggregate our data based on values in a given column or columns. At the very least, this let's us count the number of unique values in that column.
-
Aggregate functions in SQL – How to calculate sum, average, and other aggregates with the GROUP BY clause.
Exercises
Did you download the database? Here it is, with the babynames_2014
table loaded in.
How many records in the database
SELECT COUNT(*)
FROM babynames_2014;
How many babies in the database
SELECT SUM(babies)
FROM babynames_2014;
How many unique names
This number counts each unique name only once, i.e. doesn't double count names that are given to both boys and girls:
SELECT COUNT(DISTINCT name)
FROM babynames_2014;
COUNT(DISTINCT name) |
30579 |
How many boy names
SELECT COUNT(*)
FROM babynames_2014
WHERE sex = 'M';
How many names and babies, broken down by sex
SELECT sex, COUNT(*), SUM(babies)
FROM babynames_2014
GROUP BY sex;
sex |
COUNT(*) |
SUM(babies) |
F |
19067 |
1768775 |
M |
13977 |
1901376 |
5 popular names (i.e. most babies named)
SELECT name, sex, babies
FROM babynames_2014
ORDER BY babies DESC
LIMIT 5;
name |
sex |
babies |
Emma |
F |
20799 |
Olivia |
F |
19674 |
Noah |
M |
19144 |
Sophia |
F |
18490 |
Liam |
M |
18342 |
5 most popular boy names
SELECT name, sex, babies
FROM babynames_2014
WHERE sex = 'M'
ORDER BY babies DESC
LIMIT 5;
name |
sex |
babies |
Noah |
M |
19144 |
Liam |
M |
18342 |
Mason |
M |
17092 |
Jacob |
M |
16712 |
William |
M |
16687 |
5 most popular names between the letters 'U' and 'Z'
SELECT name, sex, babies
FROM babynames_2014
WHERE name BETWEEN 'U' AND 'Z'
ORDER BY babies DESC
LIMIT 5;
name |
sex |
babies |
William |
M |
16687 |
Wyatt |
M |
8812 |
Victoria |
F |
7955 |
Xavier |
M |
4726 |
Violet |
F |
4156 |
5 most popular boy names that begin with a vowel
SELECT *
FROM babynames_2014
WHERE
sex = 'M'
AND SUBSTR(name, 1, 1) IN ('A', 'E', 'I', 'O', 'U')
ORDER BY babies DESC
LIMIT 5;
name |
sex |
babies |
Ethan |
M |
15619 |
Alexander |
M |
15293 |
Elijah |
M |
13694 |
Aiden |
M |
13296 |
Anthony |
M |
11490 |
5 most popular girl names that don't end in a vowel
Note that we have to match against lower-case letters:
SELECT *
FROM babynames_2014
WHERE
sex = 'F'
AND SUBSTR(name, -1) NOT IN ('a', 'e', 'i', 'o', 'u')
ORDER BY babies DESC
LIMIT 5;
name |
sex |
babies |
Emily |
F |
12562 |
Abigail |
F |
11985 |
Madison |
F |
10247 |
Harper |
F |
9564 |
Avery |
F |
9517 |
Number of unique names with more than 7 letters
SELECT COUNT(DISTINCT name)
FROM babynames_2014
WHERE LENGTH(name) > 7;
COUNT(DISTINCT name) |
5027 |
Number of unique names shorter than 5 letters, and number of babies with those names
SELECT COUNT(DISTINCT name), SUM(babies)
FROM babynames_2014
WHERE LENGTH(name) < 5;
COUNT(DISTINCT name) |
SUM(babies) |
2988 |
584082 |
5 longest names
SELECT name, LENGTH(name) AS namelength,
sex, babies
FROM babynames_2014
ORDER BY namelength DESC
LIMIT 5;
name |
namelength |
sex |
babies |
Christopherjohn |
15 |
M |
5 |
Mariaguadalupe |
14 |
F |
16 |
Mariadelcarmen |
14 |
F |
8 |
Oluwafunmilayo |
14 |
F |
6 |
Oluwaseyifunmi |
14 |
F |
6 |
5 longest names with at least 1,000 babies
SELECT name, LENGTH(name) AS namelength,
sex, babies
FROM babynames_2014
WHERE babies >= 1000
ORDER BY namelength DESC
LIMIT 5;
name |
namelength |
sex |
babies |
Christopher |
11 |
M |
10278 |
Alexandria |
10 |
F |
1589 |
Jacqueline |
10 |
F |
1301 |
Evangeline |
10 |
F |
1177 |
Alessandra |
10 |
F |
1078 |
5 most popular starting letters, based on number of names
SELECT
SUBSTR(name, 1, 1) AS first_letter,
COUNT(*) AS name_count
FROM babynames_2014
GROUP BY first_letter
ORDER BY name_count DESC;
first_letter |
name_count |
A |
4537 |
K |
2970 |
J |
2810 |
M |
2650 |
S |
2163 |
5 most popular starting letters, based on number of babies named
SELECT
SUBSTR(name, 1, 1) AS first_letter,
SUM(babies) AS total_babies
FROM babynames_2014
GROUP BY first_letter
ORDER BY total_babies DESC;
first_letter |
total_babies |
A |
517815 |
J |
368628 |
M |
298113 |
C |
259840 |
E |
248933 |
5 most popular names with exactly 6 characters that begin with the letters "Ja"
SELECT *
FROM babynames_2014
WHERE name LIKE 'Ja____'
ORDER BY babies DESC
LIMIT 5;
Alternatively, this would also work:
SELECT *
FROM babynames_2014
WHERE name LIKE 'Ja%'
AND LENGTH(name) = 6
ORDER BY babies DESC
LIMIT 5;
name |
sex |
babies |
Jayden |
M |
12878 |
Jaxson |
M |
4900 |
Javier |
M |
1761 |
Jasper |
M |
1731 |
Jaiden |
M |
1552 |
3 most popular names with at least 2 consecutive "r" letters
SELECT *
FROM babynames_2014
WHERE name LIKE '%rr%'
ORDER BY babies DESC
LIMIT 3;
name |
sex |
babies |
Harrison |
M |
3184 |
Garrett |
M |
1552 |
Barrett |
M |
950 |
Average length (unweighted) of names, boys versus girls, rounded to 2 decimal places
SELECT sex,
ROUND(AVG(LENGTH(name)), 2)
FROM babynames_2014
GROUP BY sex;
sex |
ROUND(AVG(LENGTH(name)), 2) |
F |
6.32 |
M |
5.98 |
Top 10 starting letter of names, sorted by longest (unweighted) average length of name
"Unweighted" here means that each name has equal weight, regardless of how many babies were given that name. It just makes the calculation easier.
SELECT
SUBSTR(name, 1, 1) AS first_letter,
AVG(LENGTH(name)) AS avg_namelength
FROM babynames_2014
GROUP BY first_letter
ORDER BY avg_namelength DESC
LIMIT 5;
first_letter |
avg_namelength |
Q |
6.78723404255319 |
G |
6.68703427719821 |
C |
6.56065959952886 |
M |
6.49320754716981 |
P |
6.42452830188679 |
For each vowel, total number of names (and babies named) that begin with a vowel, broken down by boys versus girls
SELECT
SUBSTR(name, 1, 1) AS first_letter,
sex,
COUNT(*) AS total_names,
SUM(babies) AS total_babies
FROM babynames_2014
WHERE
first_letter IN ('A', 'E', 'I', 'O', 'U')
GROUP BY first_letter, sex
ORDER BY first_letter, sex;
first_letter |
sex |
total_names |
total_babies |
A |
F |
3091 |
325658 |
A |
M |
1446 |
192157 |
E |
F |
926 |
142064 |
E |
M |
547 |
106869 |
I |
F |
371 |
43366 |
I |
M |
233 |
38363 |
O |
F |
141 |
24140 |
O |
M |
206 |
30212 |
U |
F |
28 |
570 |
U |
M |
43 |
2648 |