Create a Document in your PADJO2015 folder named: PADJO2015-MIDTERM-BABYSTEPS.
Every problem should have its own page devoted to it. So if there 16 problems, your document should have 16 pages.
If you work with a partner, both of you must turn in a separate document, but indicate (on the first page) the name of your partner.
Download the National data zip file of baby names from the Social Security Administration.
Extract the 1990 and 2010 files and create a SQLite database with 2 tables, babynames_1990
and babynames_2010
, respectively
For each problem, write the SQLite query that will return the given result.
Your query must return the same results – including the same column headers and format.
You may work with one other person. Be sure that both of you actually know how to solve all of the problems.
Read through these tutorials to make sure you understand joins:
Using LEFT JOINs to find what's missing from one table to another – Sometimes when we compare two tables, we care more about their differences.
Then follow the instructions on how to create and import data, except create tables for babynames_1990
and babynames_2010
.
The query result is shown for each exercise. So provide the query that will return these results.
Give each exercise its own page in your document.
Aggregate functions in SQL – How to calculate sum, average, and other aggregates with the GROUP BY clause.
Some warmup baby queries – Getting started with baby names.
SUM(babies) |
---|
3950252 |
SELECT SUM(babies) FROM babynames_1990;
Aggregate functions in SQL – How to calculate sum, average, and other aggregates with the GROUP BY clause.
Some warmup baby queries – Getting started with baby names.
(partial answer)
SELECT /* fill out this part on your own*/
FROM babynames_1990 /* this needs to be changed to fit the question */
GROUP BY sex;
sex | SUM(babies) |
---|---|
F | 1772738 |
M | 1913851 |
SELECT sex, SUM(babies) FROM babynames_2010 GROUP BY sex;
Using GROUP BY to create aggregates in SQL – With the use of the GROUP BY clause, we gain the ability to aggregate our data based on values in a given column or columns. At the very least, this let's us count the number of unique values in that column.
Aggregate functions in SQL – How to calculate sum, average, and other aggregates with the GROUP BY clause.
Some warmup baby queries – Getting started with baby names.
(A hint)
SELECT COUNT(DISTINCT name) FROM babynames_1990;
COUNT(DISTINCT name) |
---|
31603 |
SELECT COUNT(DISTINCT name) FROM babynames_2010;
COUNT(DISTINCT name) |
---|
2041 |
SELECT COUNT(DISTINCT name)
FROM babynames_1990
WHERE name LIKE 'A%';
name | sex | babies |
---|---|---|
Jessica | F | 46466 |
Ashley | F | 45549 |
Brittany | F | 36535 |
Amanda | F | 34406 |
Samantha | F | 25864 |
SELECT * FROM babynames_1990
WHERE sex = 'F'
ORDER BY babies DESC
LIMIT 5;
name | sex | babies |
---|---|---|
Ethan | M | 17985 |
Alexander | M | 16742 |
Aiden | M | 15516 |
Anthony | M | 15470 |
Andrew | M | 14221 |
SELECT * FROM babynames_2010
WHERE sex = 'M'
AND SUBSTR(name, 1, 1) IN ('A', 'E', 'I', 'O', 'U')
ORDER BY babies DESC
LIMIT 5;
last_letter | total_babies |
---|---|
n | 693690 |
r | 167287 |
l | 134811 |
e | 130073 |
s | 124710 |
SELECT SUBSTR(name, -1) AS last_letter,
SUM(babies) AS total_babies
FROM babynames_2010
WHERE sex = 'M'
GROUP BY last_letter
ORDER BY total_babies DESC
LIMIT 5;
(partial answer)
SELECT b1990.name,
b1990.babies AS babies_1990,
b2010.babies AS babies_2010
FROM babynames_1990 AS b1990
INNER JOIN
babynames_2010 AS b2010
ON
/* fill this part out on your own */
WHERE b1990.sex = 'F'
ORDER BY b1990.babies DESC
LIMIT 5;
name | babies_1990 | babies_2010 |
---|---|---|
Jessica | 46466 | 3192 |
Ashley | 45549 | 6305 |
Brittany | 36535 | 727 |
Amanda | 34406 | 1655 |
Samantha | 25864 | 8396 |
SELECT b1990.name,
b1990.babies AS babies_1990,
b2010.babies AS babies_2010
FROM babynames_1990 AS b1990
INNER JOIN
babynames_2010 AS b2010
ON
b1990.name = b2010.name AND
b1990.sex = b2010.sex
WHERE b1990.sex = 'F'
ORDER BY b1990.babies DESC
LIMIT 5;
name | babies_1990 | babies_2010 | diffbabies |
---|---|---|---|
Dustin | 8453 | 865 | -7588 |
Scott | 8082 | 871 | -7211 |
Gregory | 8377 | 1286 | -7091 |
Corey | 7690 | 1006 | -6684 |
Paul | 8578 | 2120 | -6458 |
SELECT b1990.name,
b1990.babies AS babies_1990,
b2010.babies AS babies_2010,
b2010.babies - b1990.babies AS diffbabies
FROM babynames_1990 AS b1990
INNER JOIN
babynames_2010 AS b2010
ON
b1990.name = b2010.name AND
b1990.sex = b2010.sex
WHERE b1990.sex = 'M' AND b1990.babies < 10000
ORDER BY diffbabies ASC
LIMIT 5;
name | babies_1990 | babies_2010 |
---|---|---|
Sophia | 1125 | 20612 |
Emma | 2415 | 17322 |
Olivia | 4622 | 17012 |
Emily | 19358 | 14260 |
Abigail | 3719 | 14228 |
SELECT b2010.name,
b1990.babies AS babies_1990,
b2010.babies AS babies_2010
FROM babynames_2010 AS b2010
INNER JOIN babynames_1990 AS b1990
ON
b1990.name = b2010.name AND
b1990.sex = b2010.sex
WHERE b2010.sex = 'F'
AND babies_2010 >= 500
AND babies_1990 >= 500
ORDER BY babies_2010 DESC
LIMIT 5;
(oops, looks like I forgot to delete the actual answer before posting it…that's now been fixed, subtly…)
SELECT b2010.name,
b2010.sex,
b2010.puppies AS babies_2010
FROM babynames_2010 AS b2010
LEFT JOIN babynames_1990 AS b1990
ON
b1990.name = b2010.name AND
b1990.sex = b2010.sex
WHERE b1990.name IS NULL
ORDER BY babies_2010 DESC
LIMIT 982394;
name | sex | babies_2010 |
---|---|---|
Nevaeh | F | 6402 |
Kaiden | M | 2346 |
Reese | F | 2263 |
Jaxson | M | 2212 |
Jaiden | M | 2203 |
SELECT b2010.name,
b2010.sex,
b2010.babies AS babies_2010
FROM babynames_2010 AS b2010
LEFT JOIN babynames_1990 AS b1990
ON
b1990.name = b2010.name AND
b1990.sex = b2010.sex
WHERE b1990.name IS NULL
ORDER BY babies_2010 DESC
LIMIT 5;
Note: previously, I mistakenly had the answer as this:
COUNT(*) |
---|
2454 |
That's because I stupidly had this as my join constraint, thanks to sloppy and paste:
ON b2010.name = b1990.name
AND b1990.name = b2010.name
You can probably figure out what's wrong with that. And now I've re-learned my lesson of not copy-pasting code just to safe a few keystrokes.
COUNT(*) |
---|
2910 |
SELECT COUNT(*)
FROM babynames_1990 AS b1990
LEFT JOIN babynames_2010 AS b2010
ON b2010.name = b1990.name
AND b1990.sex = b2010.sex
WHERE b1990.sex = 'M'
AND b2010.name IS NULL;
name | boy_babies | girl_babies | total_babies |
---|---|---|---|
Jordan | 16128 | 5954 | 22082 |
Taylor | 6574 | 7256 | 13830 |
Jamie | 1318 | 6510 | 7828 |
Casey | 4130 | 3261 | 7391 |
Angel | 2404 | 1663 | 4067 |
SELECT x.name, x.babies AS boy_babies,
y.babies AS girl_babies,
x.babies + y.babies AS total_babies
FROM babynames_1990 AS x
INNER JOIN babynames_1990 AS y
ON x.name = y.name
AND x.name = y.name
WHERE x.sex = 'M' AND y.sex = 'F'
AND boy_babies >= 1000 AND girl_babies >= 1000
ORDER BY total_babies DESC
LIMIT 5;
COUNT(*) |
---|
324 |
SELECT COUNT(*)
FROM (SELECT name FROM babynames_2010
WHERE sex = 'M' ORDER BY babies DESC LIMIT 500)
AS b2010
INNER JOIN (SELECT name FROM babynames_1990
WHERE sex = 'M' ORDER BY babies DESC LIMIT 500)
AS b1990
ON b1990.name = b2010.name;
COUNT(*) |
---|
15 |
SELECT COUNT(*)
FROM (SELECT * FROM babynames_2010
WHERE sex = 'F'
ORDER BY babies DESC
LIMIT 100) as topnames
WHERE SUBSTR(name, 1, 1) IN ('A', 'E', 'I', 'O', 'U')
AND SUBSTR(name, -1) IN ('a', 'e', 'i', 'o', 'u');
Partial hint:
SELECT m.name,
m.babies AS boy_babies,
f.babies AS girl_babies,
ABS(m.babies - f.babies) AS gender_diff
FROM (SELECT * FROM babynames_2010
AS m WHERE babies >= 500 AND sex = 'M')
AS m
/* you do the rest... */
m.name | boy_babies | girl_babies | gender_diff |
---|---|---|---|
Dakota | 1130 | 1141 | 11 |
Justice | 585 | 546 | 39 |
Quinn | 1236 | 1276 | 40 |
Amari | 1004 | 661 | 343 |
Rowan | 954 | 608 | 346 |
SELECT m.name, m.babies AS boy_babies,
f.babies AS girl_babies,
ABS(m.babies - f.babies) AS gender_diff
FROM (SELECT * FROM babynames_2010 AS m WHERE babies >= 500 AND sex = 'M')
AS m
INNER JOIN
(SELECT * FROM babynames_2010 AS f WHERE babies >= 500 AND sex = 'F')
AS f
ON f.name = m.name
ORDER by gender_diff ASC
LIMIT 5;
I left this out of the test. Just another test of your ability to untangle nested queries.
name | babies_1990 | babies_2010 |
---|---|---|
Michael | 65274 | 17308 |
Christopher | 52323 | 14243 |
Matthew | 44794 | 14099 |
Ashley | 45549 | 6305 |
Jessica | 46466 | 3192 |
SELECT b1990.name,
b1990.babies AS babies_1990,
b2010.babies AS babies_2010
FROM babynames_2010 AS b2010
INNER JOIN (SELECT * FROM babynames_1990
ORDER BY babies DESC
LIMIT 5) AS b1990
ON b1990.name = b2010.name
AND b1990.sex = b2010.sex
ORDER BY babies_2010 DESC
LIMIT 5;