The Takeaways

By the end of this tutorial, make sure you can answer these questions:

  • Where to type in a query and how to execute it in DB Browser for SQLite (or your SQL client of choice).

  • How SELECT 'hello, world' is different from SELECT 'hello', 'world'

  • That SELECT name, state FROM baby_names is the same as select name,state from baby_names

  • What the star symbol – i.e. * – means in context of SELECT and FROM

  • Why SELECT 'name' FROM baby_names will not get the same result as SELECT "name" from baby_names

Get the data

For this lesson, download the following file, which is a SQLite database built from the U.S. Social Security Administration file of popular baby names:

Unzip it, and open the sqlite file using the SQLite client of your choice (e.g. DB Browser for SQLite)

Table of contents

The SELECT statement

You can guess what the SELECT statement does: it's what we use to query the database for data. From the SQLite documentation:

The result of a SELECT is zero or more rows of data where each row has a fixed number of columns. A SELECT statement does not make any changes to the database.

There's not many kinds of statements in SQL, and 95% of our queries will involve using the SELECT statement. As a corollary – if you read the excerpt above – this means most what we generally do with SQL databases does not alter the data, i.e. creating, updating, and deleting data.

Trying a basic SELECT

Let's get right into executing a query.

Open a database file – I'm assuming you've downloaded and unzipped the following file:

ssa_baby_names–2010_2014–all.sqlite.zip

If you are using the client, DB Browser for SQLite, select the Execute SQL tab and type the following into the first input box:

SELECT 100;

Then hit the Play button (or better yet, the keyboard shortcut: Ctrl + Return).

The result should look something like this (note that I've left off the semi-colon, which is optional here but is generally used to denote the end of a command):

image

Here's an animated GIF of the process:

GIF: sqlite-db-browser-select-100.gif

More text, fewer screenshots

From this point on, I’ll be displaying the input and output of these examples as just text, rather than screenshotting the GUI.

This is partly to emphasize that whatever graphical client we use is merely a user-friendly shell for the code we execute. Also, text is a lot easier to produce for these tutorials.

What does it mean?

We asked the database to "select the number 100" and it obliged. The result of that "query" is simply the number 100, returned as a single column (with a header of 100) and a single row.

Selecting multiple values

In SQL syntax, we can specify a series of values to select by separating them with commas; note in the example below, it's only the commas that delimit the values; whitespace and newlines have no effect.

SELECT 100, 200,   -300,
    -42;

The result is just…more values, each in their own column:

100 200 -300 -42
100 200 -300 -42

Specifying literal text strings

We can specify literal text strings by enclosing a word in single quotes (i.e. apostrophes):

SELECT 'hello', 'world'

The result:

'hello' 'world'
hello world

I re-emphasize the term literal here; the previous query selected the literal words 'hello' and 'world', separately. What if we wanted to select the phrase, 'hello, world'?

We just enclose the phrase in single quotes, which specifies that we want hello and world again, literally, but as a single string connected with a literal comma and space:

SELECT 'hello, world';

Result:

'hello, world'
hello, world

Text strings without the quotes?

What happens if you omit the single quotes, as we did in the first example queries with just numbers (e.g. SELECT 100, 200, 300)?

SELECT hello

You should get an error message:

Error: no such column: hello

Without the single-quotes, the SQLite interpreter interprets hello as something non-literal, such as the the name of a column or some other SQLite-specific command. So when I use the term literal, I mean, just the plain, ordinary value that a text string has for humans, rather than the special meaning that it might have for the computer.

If it's not obvious by now, the word SELECT has a special meaning to the SQLite interpreter, as it is a keyword in the syntax of the SQLite language. The following query would return an error, because it doesn't make sense to the interpreter:

SELECT SELECT

What if we wanted to SELECT the literal text string of 'SELECT'? Again, by using the single-quotes, we specify to the interpreter that we want the word "SELECT", literally:

SELECT 'SELECT'
'SELECT'
SELECT

Are you literally tired of seeing the word literal explained over and over?? I belabor the point here because not knowing when to use single-quotes – or if you're new to programming in general, the importance concept of computer syntax – will bite you very hard, very soon. But don't worry, we'll reiterate the concept by the end of the lesson.

Using the FROM clause to select data from tables

So selecting values such as 'hello' and 100 is not very exciting, nor does it make much sense. What we really want to do is select values from our data tables. To do that, we use the FROM keyword to specify from which table we want the data.

Remember that in our data file – ssa_baby_names--2010_2014--all.sqlite.zip – there are 2 tables:

image

image

Select one column from one table

Let's get right into the syntax. To select the state column from all the rows in the totals table:

SELECT state
FROM totals;

The result as a screenshot:

image

Take particular note of the log window, which gives some statistics about the result:

780 Rows returned from: SELECT state
FROM totals; (took 5ms)

In other words, every row from the totals table was returned, but only the state column.

Let's repeat the previous SELECT statement but now choose the baby_names table:

SELECT state
FROM baby_names;

image

The preview of the returned data rows looks the same as before, but the results log shows that many, many more data rows were returned:

633608 Rows returned from: SELECT state
FROM baby_names; (took 45ms)

Choosing multiple columns from a table

Selecting one column at a time is not very exciting. To select multiple columns, simply list each column's name as a comma-delimited list. Here's 3 columns from baby_names:

SELECT name, state, year
FROM baby_names;

image

The ordering of the columns is dictated by the order of the column names in the SELECT statement:

SELECT year, sex, state, name
FROM baby_names;

image

Error messages

Trying to select non-existent columns

Let's go back to the totals table, but use the same query as above otherwise:

SELECT year, sex, state, name
FROM totals;

The results log should display an error message:

image

The relevant part of that error message is:

  no such column: name

If you browse the totals table, you'll see that there is no name column, hence, the error.

Trying to select columns without a table

Here's a query that will return a similar error of column name not found:

SELECT name, sex, state;

I've completely omitted the FROM clause, which means I haven't specified a table. So the error message is technically true – a name column can't be found. But it's maddeningly vague.

You might be looking at this thinking that only an idiot could ever forget the FROM clause – there's literally only two keywords to remember, right? But when you include the half-dozen clauses (and countless other keywords) we're about to learn, it might surprise you how often you forget to include FROM because you assume the computer can read your mind as to which table you want to access. So, fair warning.

Trying to select from a non-existent table

Again, you might think, what kind of idiot would SELECT FROM a table that doesn't exist? The most common situation is when the table's name is misspelled:

SELECT name, sex
FROM babynames

So now you're thinking: But there's only 2 tables, can't the computer fix it like the way my phone has autocorrect?.

Sorry, but there's no autocorrect when it comes to executing SQL statements.

Using * to select all columns

Sometimes (but not always) it's nice just to select all the columns in a table without having to type all their names out. Use the star operator, i.e. *, after the SELECT keyword:

SELECT *
FROM totals;

image

Other selection minutiae

OK, we've basically covered the core concept: the syntax for selecting data rows from a single table:

SELECT column_a, column_b
FROM my_table;

But how do you select columns from multiple tables? The answer to that is surprisingly complicated – we won't get to it this week.

In the meantime, here are a few things and concepts about SQL syntax that aren't fundamental – i.e. you probably won't memorize them – but that are worth mentioning because they reinforce the strict and peculiar logic of the SQL language that we've seen so far. Also, they will become relevant as we cover more advanced data-wrangling techniques in SQL.

Selecting columns repeatedly

If you can select a column once, you can select it more than once:

SELECT name, year, name, year, name, name
FROM baby_names

image

Note how the number of rows returned is still the number of rows total in the baby_names table, i.e. the number of columns has no effect on the number of rows.

Why would we need to refer to one column multiple times in the same SELECT statement? What if I told you SQL had functions that can transform text in the same way that spreadsheets do?

SELECT name, UPPER(name), LOWER(name)
FROM baby_names

image

We'll get to those functions soon enough…

Selecting values and columns

Remember this query?

SELECT 100;

What happens if we include a FROM clause? And throw in a couple of column names for good measure?

SELECT name, 100, state  FROM baby_names;

The result is the number 100 for every row retrieved from the baby_names table:

image

Why would that ever be useful? To be honest, there's not a ton of obvious use cases for including a constant value in the results. But we may run into its use in real-world scenarios down the road…

Single-quotes and double-quotes

Earlier in this tutorial, we covered the use of single-quotes in denoting string literals, or as I refer to them, text strings with literal values. To review:

SELECT name, 'name' 
FROM baby_names;

image

We already know what happens when we don't enclose name in quotes: it's treated as an identifier for the column, name:

SELECT name, name 
FROM baby_names;

But what if we use double-quotes around name? If you've ever used another programming language in which you can enclose text strings in either single- or double-quotes, you should be curious about it:

SELECT name, "name" 
FROM baby_names;

The result is that SQLite interprets "name" as a column identifier, i.e., the column known by the name of name:

image

To reiterate the difference, try this query (oh, by the way, the case of the letters do not matter for an identifier:

SELECT name, NAME, "NaMe", 'name'
FROM baby_names;

image

Conclusion

If this is your first time with a programming language, all of these details should seem overwhelming. That's fine; as I said before, the most important takeaway is how to SELECT data FROM a table:

SELECT column_a, column_b
FROM my_table;

All the other details in this lesson are described so that you begin to appreciate just how literal and simple-minded the interpreter will be when reading your SQL code. With SQL, we'll become significantly more powerful in our data-crunching ability, but it requires becoming significantly more careful and specific in how we talk to our computers.


References

SQL tutorials from Fall 2014 | fall2014.padjo.org

These tutorials cover the same ground as this year’s SQL tutorials, just with a different dataset.

The official documentation for the SELECT statement.

This tutorial for the 2014 session covers similar ground and uses San Francisco crime data.