Assignment

Due: Tuesday, November 17
Points: 10

Deliverables

Create a Document in your PADJO2015 folder named: PADJO2015-MIDTERM-BABYSTEPS.

Every problem should have its own page devoted to it. So if there 16 problems, your document should have 16 pages.

If you work with a partner, both of you must turn in a separate document, but indicate (on the first page) the name of your partner.

Requirements

  • Download the National data zip file of baby names from the Social Security Administration.

  • Extract the 1990 and 2010 files and create a SQLite database with 2 tables, babynames_1990 and babynames_2010, respectively

  • For each problem, write the SQLite query that will return the given result.

  • Your query must return the same results – including the same column headers and format.

  • You may work with one other person. Be sure that both of you actually know how to solve all of the problems.

Table of contents

Getting started

Read through these tutorials to make sure you understand joins:

Then follow the instructions on how to create and import data, except create tables for babynames_1990 and babynames_2010.

Exercises

The query result is shown for each exercise. So provide the query that will return these results.

Give each exercise its own page in your document.

1. How many babies are there in 1990?

Result

SUM(babies)
3950252

Query

SELECT SUM(babies) FROM babynames_1990;

2. How many boy and girl babies are there in 2010?

(partial answer)

SELECT /* fill out this part on your own*/ 
FROM babynames_1990 /* this needs to be changed to fit the question */
GROUP BY sex;

Result

sex SUM(babies)
F 1772738
M 1913851

Query

SELECT sex, SUM(babies) FROM babynames_2010 GROUP BY sex;

3. How many unique names are there in 2010?

(A hint)

SELECT COUNT(DISTINCT name) FROM babynames_1990;

Result

COUNT(DISTINCT name)
31603

Query

SELECT COUNT(DISTINCT name) FROM babynames_2010;

4. How many unique names are there in 1990 that start with the letter 'A'?

Result

COUNT(DISTINCT name)
2041

Query

SELECT COUNT(DISTINCT name) 
FROM babynames_1990
WHERE name LIKE 'A%';

Result

name sex babies
Jessica F 46466
Ashley F 45549
Brittany F 36535
Amanda F 34406
Samantha F 25864

Query

SELECT * FROM babynames_1990
WHERE sex = 'F'
ORDER BY babies  DESC
LIMIT 5;

Result

name sex babies
Ethan M 17985
Alexander M 16742
Aiden M 15516
Anthony M 15470
Andrew M 14221

Query

SELECT * FROM babynames_2010
WHERE sex = 'M'
  AND SUBSTR(name, 1, 1) IN ('A', 'E', 'I', 'O', 'U')
ORDER BY babies  DESC
LIMIT 5;

Result

last_letter total_babies
n 693690
r 167287
l 134811
e 130073
s 124710

Query

SELECT SUBSTR(name, -1) AS last_letter, 
  SUM(babies) AS total_babies 
FROM babynames_2010
WHERE sex = 'M'
GROUP BY last_letter
ORDER BY total_babies  DESC
LIMIT 5;

(partial answer)

SELECT b1990.name, 
  b1990.babies AS babies_1990, 
  b2010.babies AS babies_2010
FROM babynames_1990 AS b1990
INNER JOIN
babynames_2010 AS b2010 
  ON 
  /* fill this part out on your own */

WHERE b1990.sex = 'F'
ORDER BY b1990.babies DESC
LIMIT 5;

Result

name babies_1990 babies_2010
Jessica 46466 3192
Ashley 45549 6305
Brittany 36535 727
Amanda 34406 1655
Samantha 25864 8396

Query

SELECT b1990.name, 
  b1990.babies AS babies_1990, 
  b2010.babies AS babies_2010
FROM babynames_1990 AS b1990
INNER JOIN
babynames_2010 AS b2010 
  ON 
  b1990.name = b2010.name AND
  b1990.sex = b2010.sex
WHERE b1990.sex = 'F'
ORDER BY b1990.babies DESC
LIMIT 5;

9. The top 5 boy names with fewer than 10,000 babies in 1990, sorted by biggest drop in popularity in 2010

Result

name babies_1990 babies_2010 diffbabies
Dustin 8453 865 -7588
Scott 8082 871 -7211
Gregory 8377 1286 -7091
Corey 7690 1006 -6684
Paul 8578 2120 -6458

Query

SELECT b1990.name, 
  b1990.babies AS babies_1990, 
  b2010.babies AS babies_2010,
  b2010.babies - b1990.babies AS diffbabies
FROM babynames_1990 AS b1990
INNER JOIN
babynames_2010 AS b2010 
  ON 
  b1990.name = b2010.name AND
  b1990.sex = b2010.sex
WHERE b1990.sex = 'M' AND b1990.babies < 10000
ORDER BY diffbabies ASC
LIMIT 5;

10. The top 5 girl names that have been given to least 500 babies in both 1990 and 2010, sorted by popularity in 2010

Result

name babies_1990 babies_2010
Sophia 1125 20612
Emma 2415 17322
Olivia 4622 17012
Emily 19358 14260
Abigail 3719 14228

Query

SELECT b2010.name, 
  b1990.babies AS babies_1990,
  b2010.babies AS babies_2010
FROM babynames_2010 AS b2010 
INNER JOIN babynames_1990 AS b1990
  ON 
  b1990.name = b2010.name AND
  b1990.sex = b2010.sex
WHERE b2010.sex = 'F' 
  AND babies_2010 >= 500
  AND babies_1990 >= 500
ORDER BY babies_2010 DESC
LIMIT 5;

(oops, looks like I forgot to delete the actual answer before posting it…that's now been fixed, subtly…)

SELECT b2010.name, 
  b2010.sex,
  b2010.puppies AS babies_2010
FROM babynames_2010 AS b2010 
LEFT JOIN babynames_1990 AS b1990
  ON 
  b1990.name = b2010.name AND
  b1990.sex = b2010.sex
WHERE b1990.name IS NULL
ORDER BY babies_2010 DESC
LIMIT 982394;

Result

name sex babies_2010
Nevaeh F 6402
Kaiden M 2346
Reese F 2263
Jaxson M 2212
Jaiden M 2203

Query

SELECT b2010.name, 
  b2010.sex,
  b2010.babies AS babies_2010
FROM babynames_2010 AS b2010 
LEFT JOIN babynames_1990 AS b1990
  ON 
  b1990.name = b2010.name AND
  b1990.sex = b2010.sex
WHERE b1990.name IS NULL
ORDER BY babies_2010 DESC
LIMIT 5;

12. How many boy names in 1990 don't appear in the list of boy names for 2010?

Note: previously, I mistakenly had the answer as this:

COUNT(*)
2454

That's because I stupidly had this as my join constraint, thanks to sloppy and paste:

    ON b2010.name = b1990.name
      AND b1990.name = b2010.name

You can probably figure out what's wrong with that. And now I've re-learned my lesson of not copy-pasting code just to safe a few keystrokes.

Result

COUNT(*)
2910

Query

SELECT COUNT(*)
FROM babynames_1990 AS b1990
LEFT JOIN babynames_2010 AS b2010
ON b2010.name = b1990.name
  AND b1990.sex = b2010.sex
WHERE b1990.sex = 'M'
  AND b2010.name IS NULL;

Result

name boy_babies girl_babies total_babies
Jordan 16128 5954 22082
Taylor 6574 7256 13830
Jamie 1318 6510 7828
Casey 4130 3261 7391
Angel 2404 1663 4067

Query

SELECT x.name, x.babies AS boy_babies,
  y.babies AS girl_babies,
  x.babies + y.babies AS total_babies
FROM babynames_1990 AS x
INNER JOIN babynames_1990 AS y
ON x.name = y.name
  AND x.name = y.name
WHERE x.sex = 'M' AND y.sex = 'F'
  AND boy_babies >= 1000 AND girl_babies >= 1000
ORDER BY total_babies DESC
LIMIT 5;

Query

COUNT(*)
324

Result

SELECT COUNT(*)
FROM (SELECT name FROM babynames_2010
      WHERE sex = 'M' ORDER BY babies DESC LIMIT 500)
       AS b2010
INNER JOIN (SELECT name FROM babynames_1990
      WHERE sex = 'M' ORDER BY babies DESC LIMIT 500)
       AS b1990
  ON b1990.name = b2010.name;

Result

COUNT(*)
15

Query

SELECT COUNT(*) 
 FROM (SELECT * FROM babynames_2010
  WHERE sex = 'F'
  ORDER BY babies DESC
  LIMIT 100) as topnames
WHERE SUBSTR(name, 1, 1)  IN ('A', 'E', 'I', 'O', 'U')
  AND SUBSTR(name, -1)  IN ('a', 'e', 'i', 'o', 'u');

16. Show the 5 most gender neutral names among 2010 names that were given to at least 500 boys and 500 girls

Partial hint:

SELECT m.name, 
  m.babies AS boy_babies,
  f.babies AS girl_babies,
  ABS(m.babies - f.babies) AS gender_diff
  FROM (SELECT * FROM babynames_2010 
      AS m WHERE babies >= 500 AND sex = 'M')
  AS m
  /* you do the rest... */

Result

m.name boy_babies girl_babies gender_diff
Dakota 1130 1141 11
Justice 585 546 39
Quinn 1236 1276 40
Amari 1004 661 343
Rowan 954 608 346

Query

SELECT m.name, m.babies AS boy_babies,
  f.babies AS girl_babies,
  ABS(m.babies - f.babies) AS gender_diff
FROM (SELECT * FROM babynames_2010 AS m WHERE babies >= 500 AND sex = 'M')
  AS m
INNER JOIN 
  (SELECT * FROM babynames_2010 AS f WHERE babies >= 500 AND sex = 'F')
  AS f
  ON f.name = m.name
ORDER by gender_diff  ASC
LIMIT 5;

I left this out of the test. Just another test of your ability to untangle nested queries.

Result

name babies_1990 babies_2010
Michael 65274 17308
Christopher 52323 14243
Matthew 44794 14099
Ashley 45549 6305
Jessica 46466 3192

Query

SELECT b1990.name, 
  b1990.babies AS babies_1990, 
  b2010.babies AS babies_2010
FROM babynames_2010 AS b2010
INNER JOIN (SELECT * FROM babynames_1990
  ORDER BY babies DESC
  LIMIT 5) AS b1990
  ON b1990.name = b2010.name
    AND b1990.sex = b2010.sex
ORDER BY babies_2010 DESC
LIMIT 5;