Table of contents
The SELECT statement
You can guess what the SELECT statement does: it's what we use to query the database for data. From the SQLite documentation:
The result of a SELECT is zero or more rows of data where each row has a fixed number of columns. A SELECT statement does not make any changes to the database.
There's not many kinds of statements in SQL, and 95% of our queries will involve using the SELECT statement. As a corollary – if you read the excerpt above – this means most what we generally do with SQL databases does not alter the data, i.e. creating, updating, and deleting data.
Trying a basic SELECT
Let's get right into executing a query.
Open a database file – I'm assuming you've downloaded and unzipped the following file:
ssa_baby_names–2010_2014–all.sqlite.zip
If you are using the client, DB Browser for SQLite, select the Execute SQL tab and type the following into the first input box:
SELECT 100;
Then hit the Play button (or better yet, the keyboard shortcut: Ctrl + Return).
The result should look something like this (note that I've left off the semi-colon, which is optional here but is generally used to denote the end of a command):
Here's an animated GIF of the process:
From this point on, I’ll be displaying the input and output of these examples as just text, rather than screenshotting the GUI.
This is partly to emphasize that whatever graphical client we use is merely a user-friendly shell for the code we execute. Also, text is a lot easier to produce for these tutorials.
What does it mean?
We asked the database to "select the number 100" and it obliged. The result of that "query" is simply the number 100
, returned as a single column (with a header of 100
) and a single row.
Selecting multiple values
In SQL syntax, we can specify a series of values to select by separating them with commas; note in the example below, it's only the commas that delimit the values; whitespace and newlines have no effect.
SELECT 100, 200, -300,
-42;
The result is just…more values, each in their own column:
100 |
200 |
-300 |
-42 |
100 |
200 |
-300 |
-42 |
Specifying literal text strings
We can specify literal text strings by enclosing a word in single quotes (i.e. apostrophes):
SELECT 'hello', 'world'
The result:
'hello' |
'world' |
hello |
world |
I re-emphasize the term literal here; the previous query selected the literal words 'hello'
and 'world'
, separately. What if we wanted to select the phrase, 'hello, world'
?
We just enclose the phrase in single quotes, which specifies that we want hello
and world
again, literally, but as a single string connected with a literal comma and space:
SELECT 'hello, world';
Result:
'hello, world' |
hello, world |
Text strings without the quotes?
What happens if you omit the single quotes, as we did in the first example queries with just numbers (e.g. SELECT 100, 200, 300
)?
SELECT hello
You should get an error message:
Error: no such column: hello
Without the single-quotes, the SQLite interpreter interprets hello
as something non-literal, such as the the name of a column or some other SQLite-specific command. So when I use the term literal, I mean, just the plain, ordinary value that a text string has for humans, rather than the special meaning that it might have for the computer.
If it's not obvious by now, the word SELECT
has a special meaning to the SQLite interpreter, as it is a keyword in the syntax of the SQLite language. The following query would return an error, because it doesn't make sense to the interpreter:
SELECT SELECT
What if we wanted to SELECT
the literal text string of 'SELECT'
? Again, by using the single-quotes, we specify to the interpreter that we want the word "SELECT", literally:
SELECT 'SELECT'
Are you literally tired of seeing the word literal explained over and over?? I belabor the point here because not knowing when to use single-quotes – or if you're new to programming in general, the importance concept of computer syntax – will bite you very hard, very soon. But don't worry, we'll reiterate the concept by the end of the lesson.
Using the FROM
clause to select data from tables
So selecting values such as 'hello'
and 100
is not very exciting, nor does it make much sense. What we really want to do is select values from our data tables. To do that, we use the FROM
keyword to specify from which table we want the data.
Remember that in our data file – ssa_baby_names--2010_2014--all.sqlite.zip
– there are 2 tables:
baby_names
– which has 600,000+ rows for every combination of baby name, sex, year, and state:
totals
, which has a mere 780 rows representing every combination of year, state, and sex:
Select one column from one table
Let's get right into the syntax. To select the state
column from all the rows in the totals
table:
SELECT state
FROM totals;
The result as a screenshot:
Take particular note of the log window, which gives some statistics about the result:
780 Rows returned from: SELECT state
FROM totals; (took 5ms)
In other words, every row from the totals
table was returned, but only the state
column.
Let's repeat the previous SELECT
statement but now choose the baby_names
table:
SELECT state
FROM baby_names;
The preview of the returned data rows looks the same as before, but the results log shows that many, many more data rows were returned:
633608 Rows returned from: SELECT state
FROM baby_names; (took 45ms)
Choosing multiple columns from a table
Selecting one column at a time is not very exciting. To select multiple columns, simply list each column's name as a comma-delimited list. Here's 3 columns from baby_names
:
SELECT name, state, year
FROM baby_names;
The ordering of the columns is dictated by the order of the column names in the SELECT
statement:
SELECT year, sex, state, name
FROM baby_names;
Error messages
Trying to select non-existent columns
Let's go back to the totals
table, but use the same query as above otherwise:
SELECT year, sex, state, name
FROM totals;
The results log should display an error message:
The relevant part of that error message is:
no such column: name
If you browse the totals
table, you'll see that there is no name
column, hence, the error.
Trying to select columns without a table
Here's a query that will return a similar error of column name
not found:
SELECT name, sex, state;
I've completely omitted the FROM
clause, which means I haven't specified a table. So the error message is technically true – a name
column can't be found. But it's maddeningly vague.
You might be looking at this thinking that only an idiot could ever forget the FROM
clause – there's literally only two keywords to remember, right? But when you include the half-dozen clauses (and countless other keywords) we're about to learn, it might surprise you how often you forget to include FROM
because you assume the computer can read your mind as to which table you want to access. So, fair warning.
Trying to select from a non-existent table
Again, you might think, what kind of idiot would SELECT FROM a table that doesn't exist? The most common situation is when the table's name is misspelled:
SELECT name, sex
FROM babynames
So now you're thinking: But there's only 2 tables, can't the computer fix it like the way my phone has autocorrect?.
Sorry, but there's no autocorrect when it comes to executing SQL statements.
Using * to select all columns
Sometimes (but not always) it's nice just to select all the columns in a table without having to type all their names out. Use the star operator, i.e. *
, after the SELECT
keyword:
SELECT *
FROM totals;
Other selection minutiae
OK, we've basically covered the core concept: the syntax for selecting data rows from a single table:
SELECT column_a, column_b
FROM my_table;
But how do you select columns from multiple tables? The answer to that is surprisingly complicated – we won't get to it this week.
In the meantime, here are a few things and concepts about SQL syntax that aren't fundamental – i.e. you probably won't memorize them – but that are worth mentioning because they reinforce the strict and peculiar logic of the SQL language that we've seen so far. Also, they will become relevant as we cover more advanced data-wrangling techniques in SQL.
Selecting columns repeatedly
If you can select a column once, you can select it more than once:
SELECT name, year, name, year, name, name
FROM baby_names
Note how the number of rows returned is still the number of rows total in the baby_names
table, i.e. the number of columns has no effect on the number of rows.
Why would we need to refer to one column multiple times in the same SELECT
statement? What if I told you SQL had functions that can transform text in the same way that spreadsheets do?
SELECT name, UPPER(name), LOWER(name)
FROM baby_names
We'll get to those functions soon enough…
Selecting values and columns
Remember this query?
SELECT 100;
What happens if we include a FROM
clause? And throw in a couple of column names for good measure?
SELECT name, 100, state FROM baby_names;
The result is the number 100
for every row retrieved from the baby_names
table:
Why would that ever be useful? To be honest, there's not a ton of obvious use cases for including a constant value in the results. But we may run into its use in real-world scenarios down the road…
Single-quotes and double-quotes
Earlier in this tutorial, we covered the use of single-quotes in denoting string literals, or as I refer to them, text strings with literal values. To review:
SELECT name, 'name'
FROM baby_names;
We already know what happens when we don't enclose name
in quotes: it's treated as an identifier for the column, name
:
SELECT name, name
FROM baby_names;
But what if we use double-quotes around name
? If you've ever used another programming language in which you can enclose text strings in either single- or double-quotes, you should be curious about it:
SELECT name, "name"
FROM baby_names;
The result is that SQLite interprets "name"
as a column identifier, i.e., the column known by the name of name
:
To reiterate the difference, try this query (oh, by the way, the case of the letters do not matter for an identifier:
SELECT name, NAME, "NaMe", 'name'
FROM baby_names;
Conclusion
If this is your first time with a programming language, all of these details should seem overwhelming. That's fine; as I said before, the most important takeaway is how to SELECT
data FROM
a table:
SELECT column_a, column_b
FROM my_table;
All the other details in this lesson are described so that you begin to appreciate just how literal and simple-minded the interpreter will be when reading your SQL code. With SQL, we'll become significantly more powerful in our data-crunching ability, but it requires becoming significantly more careful and specific in how we talk to our computers.