Data is just as biased and dirty as any anecdote. This understanding is the foundational starting point of the best data journalism.
These links contain material that I refer to in class but in self-contained article/aggregate/tutorial form.
How the Sun-Sentinel used databases and grade-school arithmetic to uncover illegal speeding by Florida cops.
Resources and articles about how to request files from the Federal Bureau of Investigation via FOIA.
Using spreadsheets to re-create a small, ingenious part of the analysis in the Sun Sentinel’s Pulitzer-winning investigation into speeding cops.
Learning to use spreadsheets for organizing your non-numerical data and for sane crowdsourcing.
Ask the FBI what they have on the famous and deceased.
Pick five data stories from a wide selection and write five short summaries.
We talked a lot about Texas, the FBI, and FOIA laws.
For the first week, we'll discuss the term "data journalism": what it is, how it's done, and how it differs from "non-data" journalism. We'll also get a grounding in fundamental tools (spreadsheets) and concepts (filtering and aggregation).
Florida's missing speeding cops
This class won't be covering the complexities of American public records law, but we can still practice it. Requesting an FBI file on an individual person is a popular example of a FOIA request.
For Tuesday and Thursday we'll be doing some in-class work with pivot tables. This is to get everyone re-acquainted with the spreadsheet and to see the huge value it brings to effective data work. Even after you've mastered SQL, you'll find that spreadsheets are your go-to tool for most data tasks.
In class, we'll try to walk through: Spreadsheet exercise with Sun Sentinel speeding cops database.
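The core of that exercise is grade-school arithmetic: a toll transponder is logged at two plazas, so the implied speed is the distance between plazas divided by the elapsed time. Here's a minimal sketch of that calculation in Python; the transponder IDs, distances, times, and the 90 mph flagging threshold are all invented for illustration, not taken from the Sun Sentinel's actual database.

```python
# Toy version of the speeding-cops arithmetic (all data invented):
# speed in mph = miles between toll plazas / hours elapsed
records = [
    # (transponder_id, miles_between_plazas, minutes_elapsed)
    ("A123", 12.4, 8.0),    # covered the distance very quickly
    ("B456", 12.4, 13.5),   # ordinary pace
]

# Convert minutes to hours, then divide distance by time
speeds = {tag: miles / (minutes / 60) for tag, miles, minutes in records}

for tag, mph in speeds.items():
    label = "possible speeding" if mph > 90 else "ok"  # 90 mph is an assumed cutoff
    print(f"{tag}: {mph:.0f} mph ({label})")
```

The spreadsheet version is the same formula in a cell: distance divided by elapsed time, copied down a column of toll records.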
And here's a pivot table walkthrough from last year that's worth practicing: Basic Aggregation with Pivot Tables
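Under the hood, a pivot table does two things: group rows by the values in one column, then aggregate another column within each group. Here's a plain-Python sketch of that group-and-aggregate logic, using made-up citation rows (the agency names and speeds are hypothetical, not real data from the class).

```python
from collections import defaultdict

# Invented rows standing in for a spreadsheet of speeding citations
rows = [
    {"agency": "FHP", "speed": 94},
    {"agency": "FHP", "speed": 102},
    {"agency": "Miami PD", "speed": 88},
]

# Step 1: group rows by the "agency" column
totals = defaultdict(list)
for row in rows:
    totals[row["agency"]].append(row["speed"])

# Step 2: aggregate each group (count and average, like a pivot table's
# COUNT and AVERAGE summary functions)
for agency, speeds in sorted(totals.items()):
    print(agency, "count:", len(speeds), "avg:", sum(speeds) / len(speeds))
```

In a spreadsheet you'd get the same result by putting `agency` in the pivot table's rows and `speed` in its values, summarized by count or average.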
On Thursday: We'll focus more on visualization, or more accurately, on how filtering and aggregation are necessary for making useful visualizations rather than "noisy" [ones like this earthquake map](https://dundee.cartodb.com/viz/888634d0-60ae-11e5-93c9-0e018d66dc29/public_map).
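The filtering step is the unglamorous part that makes a chart readable: throwing away rows before you plot. Here's a tiny sketch of that idea with invented earthquake rows (the places, magnitudes, and the 6.0 cutoff are assumptions for illustration, not the map's actual data).

```python
# Invented earthquake rows; plotting all of them would be noise
quakes = [
    {"place": "Alaska", "mag": 1.2},
    {"place": "Nepal", "mag": 7.8},
    {"place": "Chile", "mag": 8.3},
    {"place": "Oklahoma", "mag": 2.1},
]

# Filter to significant events before mapping (6.0 is an arbitrary cutoff here)
significant = [q for q in quakes if q["mag"] >= 6.0]

print(len(significant), "of", len(quakes), "events worth mapping")
```

The same move in a spreadsheet is just a filter on the magnitude column before you ever touch the charting tool.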
Look, Google has analytics data for our humble campus coffee shop:
The city posts all of the FOIA requests it receives, including the name of the requester.
People who appear on the Texas Department of Public Safety’s list of Sex Offender Registry downloaders.
This Cambridge research paper analyzed an anonymized dataset of “57 billion friendships” and found a “correlation between higher social class and fewer international friendships”. But how did they infer social class from anonymized user data?