Combine your database and data visualization skills to find, filter, analyze, and visualize data as a publishable package for the web.

Research other FOIA requests, get inspired, and send out five of your own to agencies relevant to your beat/thesis so that you have something to look forward to in the new year.

Using PostGIS, SQL, and CartoDB to identify schools at risk from Oklahoma's earthquakes

A tutorial on using geospatial analysis, shapefiles and datasets from the U.S. Geological Survey, Census, and Department of Education to visualize the impact of Oklahoma’s earthquakes and explore possible investigative projects.

(From last week) Here's a crime data report from the NYPD (as a bad example of visualization).

How to use SQL to learn about Medicare, contemporary issues in Medicare billing practices, the math and evidence behind the WSJ’s “Medicare Unmasked” project, and the general problems with real-world datasets.

Homework (for after Thanksgiving)

A hands-on exploration using SQL to learn about Medicare data and controversial practices in Medicare billing, as well as to appreciate the Wall Street Journal’s Pulitzer Prize-winning data investigation.

Past homework

A warm-up, self-contained exercise on all the SQL Joins you’ll need to know for the more real-world exercise.

Combine several kinds of spatial and boundary data in QGIS to make a map that shows both points and gradients.

Public Affairs Data Journalism at Stanford University

Listing: COMM 273D: Public Affairs Data Journalism I

Session: September 21 to December 4, 2015

Hours: Tuesdays and Thursdays, 1:30 to 2:50PM

Location: McClatchy Hall (Building 120), Room 410

Instructor: Dan Nguyen

Our primary goal is to learn how to argue with and against data.

If we want to understand our government, including the power it holds over and yields to its institutions and citizens, then we must understand the data that is both the byproduct of and the fuel for the work of government.

Note: COMM 273D was first taught in Fall 2014.


September 22

Elements of Assertive Data Journalism

An introduction to data journalism: how to count, how to research, and how to understand the elements of a data story. We also dive into doing public records requests and using spreadsheets to aggregate and filter data.

September 29

DIY Databases

Some of the most important journalism involves finding that there just is no official data -- and then being able to efficiently collect and organize the data needed to do unique analysis and journalism. The recent crowdsourced initiatives to track police-involved homicides are valuable case studies for learning best practices and the complications of real-world data collection.

October 6

More DIY Databases and introduction to SQL

We continue looking at "homemade" databases and start to learn the concepts and basics of Structured Query Language and database programming.

October 13

SQL Syntax and Aggregations

A real grounding in basic SQL syntax. We won't cover much that we can't already do with a spreadsheet, but with SQL, we'll learn how to do it with much bigger datasets.
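As a rough illustration of what a SQL aggregation looks like (not taken from the course materials), here is a minimal sketch using Python's built-in sqlite3 module. The `requests` table, its columns, and its values are made up for this example. The query does what a spreadsheet pivot table would do: one row per agency, with a count and an average.

```python
import sqlite3

# Hypothetical data: a made-up table of records requests, not a course dataset
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (agency TEXT, days_to_respond INTEGER)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [("Police", 30), ("Police", 50), ("Schools", 10), ("Schools", 20), ("Schools", 30)],
)

# GROUP BY collapses the rows into one row per agency;
# COUNT and AVG are computed within each group
query = """
    SELECT agency, COUNT(*) AS n, AVG(days_to_respond) AS avg_days
    FROM requests
    GROUP BY agency
    ORDER BY agency
"""
agency_summary = conn.execute(query).fetchall()
for row in agency_summary:
    print(row)
```

The same query text would work unchanged on a table with millions of rows, which is the practical advantage over a spreadsheet.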

October 20

Data Joins, More SQL, More Visualizations

The ability to join two or more datasets is one of the most powerful techniques of the data journalist. We learn the SQL syntax to express these joins.
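A minimal sketch of the idea, assuming two made-up tables (`schools` and `inspections`, which are hypothetical and not course data) and using Python's built-in sqlite3 module. It contrasts an INNER JOIN, which keeps only matching rows, with a LEFT JOIN, which also keeps rows that have no match.

```python
import sqlite3

# Hypothetical tables, invented for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schools (school_id INTEGER, name TEXT);
    CREATE TABLE inspections (school_id INTEGER, year INTEGER);
    INSERT INTO schools VALUES (1, 'Adams'), (2, 'Baker'), (3, 'Clark');
    INSERT INTO inspections VALUES (1, 2014), (1, 2015), (2, 2015);
""")

# INNER JOIN: only schools that have at least one inspection appear
inner_rows = conn.execute("""
    SELECT schools.name, inspections.year
    FROM schools
    INNER JOIN inspections ON schools.school_id = inspections.school_id
    ORDER BY schools.name, inspections.year
""").fetchall()

# LEFT JOIN: every school appears, including those never inspected;
# COUNT(inspections.year) ignores the NULLs produced for unmatched schools
left_rows = conn.execute("""
    SELECT schools.name, COUNT(inspections.year) AS n_inspections
    FROM schools
    LEFT JOIN inspections ON schools.school_id = inspections.school_id
    GROUP BY schools.name
    ORDER BY schools.name
""").fetchall()

print(inner_rows)
print(left_rows)
```

The LEFT JOIN is often the more interesting one for reporting, since the rows with no match (here, a school with zero inspections) can be the story.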

October 27

Midterm Malarkey

Some catch-up time, a take-home midterm, and a guest speaker.

November 3

The points and shapes of maps

Besides being a very valuable and much sought-after visualization skill, mapping allows us to continue thinking about how different kinds of data can be joined, even without a database. We'll also cover best practices with mapping and get an overview of the latest mapping tools.

November 10

Data research and wrangling

Data research is not much different from other kinds of journalistic research, but it is profoundly important in the work of data wrangling, collection, and analysis. Since it's that time of year again, we will also take a look at various U.S.-election-related data.

November 17

Data visualization and publication

An emphasis on seeing how effective data wrangling and research lead to powerful and expansive data visualization. We also learn how to publish interactive and static visualizations independent of a platform or content-management system.

November 24

Thanksgiving Break

No classes

November 30

Project discussion and work time.

Last week of classes. Workshop and project-showoff time in the lab.

Assignments and Grading Assessments

Students will be graded on successful completion of homework assignments and a midterm test focused on SQL (see last year's test).

The assignments will be frequent. Some of the assignments will be longer than others, but none are what I would consider project-length. There is no final.

Grading breakdown

Standard (but flexible) curve with A- starting at 90%

80% of the grade will be from assignments.

20% of the grade will be from the midterm.

Any assignment more than 2 days late will receive half-credit at most.


Please read Stanford's Honor Code, the university's statement on academic integrity as written by students in 1921.

Use of technology in class

This is a computer-intensive class but I will do my best to post up-to-date and complete tutorials on the tools we cover, so that students can be focused on the in-class discussions and demonstrations without the burden of typing out notes on every detail. That said, in-class laptop usage must remain on task.


Compared to the other core classes, I'm relatively flexible on excused absences for lecture if the reason is related to being on journalistic assignment, i.e. an interview or event that cannot be rescheduled. Students are expected to notify me 72 hours in advance to receive approval.

Each unresolved unexcused absence may result in a penalty of half a letter grade.