The week of Tuesday, September 22

These links cover material that I refer to in class, packaged in self-contained article, aggregate, or tutorial form.

Data is just as biased and dirty as any anecdote. Understanding this is the starting point of the best data journalism.

How the Sun Sentinel used databases and grade-school arithmetic to detect illegal speeding by Florida cops.

Resources and articles about how to request files from the Federal Bureau of Investigation via FOIA.

Using spreadsheets to re-create a small, ingenious part of the analysis in the Sun Sentinel’s Pulitzer-winning investigation into speeding cops.
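The "grade-school arithmetic" here is average speed = distance ÷ elapsed time. Below is a minimal sketch, assuming each record tells you when a car passed two toll plazas and that the distance between those plazas is known. The function name `average_speed_mph`, the 25-mile distance, and the timestamps are hypothetical illustration values, not Sun Sentinel data.

```python
from datetime import datetime

def average_speed_mph(distance_miles, entered, exited):
    """Average speed = distance traveled / time elapsed, in miles per hour.

    Hypothetical helper: assumes we know the distance between two
    checkpoints and the timestamps when a car passed each one.
    """
    hours = (exited - entered).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("exit time must be after entry time")
    return distance_miles / hours

# Invented example: 25 miles between two plazas, covered in 15 minutes.
entered = datetime(2011, 3, 1, 8, 0, 0)
exited = datetime(2011, 3, 1, 8, 15, 0)
speed = average_speed_mph(25.0, entered, exited)
print(f"{speed:.1f} mph")  # prints 100.0 mph
```

Because this is an average over the whole stretch, a car that averaged 100 mph must have been going at least 100 mph at some point in between, so the number is a conservative floor rather than a guess at peak speed.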

Assignments

Casual introduction to spreadsheets

Due: Thursday, September 24
Points: 5

Learning to use spreadsheets for organizing your non-numerical data and for sane crowdsourcing.

Your (first?) FOIA to the FBI

Due: Tuesday, September 29
Points: 5

Ask the FBI what they have on the famous and deceased.

Read 5 data journalism stories

Due: Thursday, October 1
Points: 5

Pick 5 data stories from a wide selection and write a short summary of each.

Thursday, September 24

We talked a lot about Texas, the FBI, and FOIA laws.


Notes for Tuesday, September 22

For the first week, we'll discuss the term "data journalism": what it is, how it's done, and how it differs from "non-data" journalism. We'll also get a grounding in fundamental tools (spreadsheets) and concepts (filtering and aggregation).

How to find speeding cops

Florida's missing speeding cops

FBI FOIAs

This class won't cover the complexities of American public records law, but we can still practice using it. Requesting an FBI file on an individual person is a popular example of a FOIA request.

In-class work

  • Follow the Casual Spreadsheets homework assignment and create a folder on Google Drive: PADJO2015-DNGUYEN (put your first initial and last name instead of mine). Give edit permissions to dun@stanford.edu.
  • Sign up for GitHub and the GitHub Student Developer Pack. I'm not sure how far we'll go into the GitHub funland, but approval for the pack can take a few days or even weeks.

Pivot tables

For Tuesday and Thursday we'll be doing some in-class work with pivot tables. This is to get everyone re-acquainted with the spreadsheet and to understand the huge value that it brings to doing effective data work. Even after you've mastered SQL, you'll find that spreadsheets will be your go-to tool for most data tasks.

In class, we'll try to walk through: Spreadsheet exercise with Sun Sentinel speeding cops database.

And here's a pivot table walkthrough from last year that's worth practicing: Basic Aggregation with Pivot Tables
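Under the hood, a pivot table is a group-by: choose a column to group rows on, then summarize another column within each group. The sketch below mimics that outside the spreadsheet, assuming a hypothetical table with `agency` and `mph` columns; the agency names, speeds, the 90 mph threshold, and the `pivot` helper are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical spreadsheet rows: one row per trip. Values are invented.
rows = [
    {"agency": "Agency A", "mph": 72},
    {"agency": "Agency A", "mph": 95},
    {"agency": "Agency B", "mph": 88},
    {"agency": "Agency B", "mph": 101},
    {"agency": "Agency B", "mph": 64},
]

def pivot(rows, row_key, value_key, threshold):
    """Group rows by row_key; for each group, count all rows, count rows
    whose value_key exceeds threshold (a filter), and average value_key."""
    totals = defaultdict(lambda: {"trips": 0, "over": 0, "sum": 0.0})
    for r in rows:
        t = totals[r[row_key]]
        t["trips"] += 1
        t["sum"] += r[value_key]
        if r[value_key] > threshold:
            t["over"] += 1
    # One output row per group, sorted by group name like a pivot table.
    return {
        k: {"trips": t["trips"], "over": t["over"], "avg": t["sum"] / t["trips"]}
        for k, t in sorted(totals.items())
    }

summary = pivot(rows, "agency", "mph", 90)
for agency, stats in summary.items():
    print(agency, stats)
```

In a spreadsheet you'd get the same result by putting `agency` in the Rows area and `mph` in the Values area (once as COUNTA, once as AVERAGE), plus a filtered count for the trips over the threshold.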

On Thursday, we'll focus more on visualization, or more accurately, on how filtering and aggregation are necessary for making useful visualizations, rather than "noisy" [ones like this earthquake map](https://dundee.cartodb.com/viz/888634d0-60ae-11e5-93c9-0e018d66dc29/public_map).
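One way to tame a noisy point map is to filter out the small events and then aggregate what's left into grid cells, so the map shows counts per area instead of thousands of overlapping dots. This is a minimal sketch with invented coordinates and magnitudes; the `bin_quakes` helper, the 4.5 magnitude cutoff, and the 1-degree cells are arbitrary illustrative choices, not anything taken from the linked map.

```python
import math
from collections import Counter

# Hypothetical quake records: (latitude, longitude, magnitude). Invented values.
quakes = [
    (35.2, -118.4, 2.1),
    (35.7, -118.9, 4.8),
    (36.1, -120.3, 5.2),
    (35.4, -118.2, 4.6),
    (61.0, -150.5, 1.3),
]

def bin_quakes(quakes, min_magnitude, cell_degrees):
    """Filter: drop quakes below min_magnitude.
    Aggregate: count the remaining quakes per square grid cell whose
    south-west corner is (lat, lon) rounded down to cell_degrees."""
    counts = Counter()
    for lat, lon, mag in quakes:
        if mag < min_magnitude:
            continue
        cell = (math.floor(lat / cell_degrees) * cell_degrees,
                math.floor(lon / cell_degrees) * cell_degrees)
        counts[cell] += 1
    return counts

counts = bin_quakes(quakes, 4.5, 1.0)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Both knobs are editorial decisions: raising the cutoff or enlarging the cells makes the map quieter but hides detail, which is exactly the trade-off to weigh before publishing.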

Odds and ends

Look, Google has analytics data for our humble campus coffee shop:

[Image: Google analytics data for the campus coffee shop]

References

City of Chicago FOIA homepage | cityofchicago.org

The city posts all of the FOIA requests it receives, including the name of the requester.

People who are on the Texas Department of Public Safety’s Registry of Sex Offender Registry Downloaders.

This Cambridge research paper analyzed an anonymized dataset of “57 billion friendships” and found a “correlation between higher social class and fewer international friendships”. But how did they account for social class from anonymized user data?