The Takeaways

By the end of this tutorial, make sure you can answer these questions:

  • Where the WHERE clause fits in relation to SELECT and FROM

  • What the equals sign, =, means in SQL.

  • The use of AND and OR to specify multiple conditions.

  • How to use parentheses to specify the order in which conditions are evaluated.

  • Why SELECT * from baby_names WHERE 1 = 1 will return all rows from the baby_names table.

  • Why SELECT * from baby_names WHERE 1 = 2 will return no rows from the baby_names table.

  • Recognizing the difference between: - 'name' = 'name' - "name" = 'name' - 'name' = 'Name' - "name" = 'Name' - "name" = "Name"

Get the data

For this lesson, download the following file, which is a SQLite database built from the U.S. Social Security Administration file of popular baby names:

Unzip it, and open the sqlite file using the SQLite client of your choice (e.g. DB Browser for SQLite)

Table of contents

Set conditions with WHERE

The WHERE clause allows us to set the conditions by which rows are fetched from data tables. A condition is some kind of expression that evaluates to either true or false.

For example, the following query will return all rows in which the name column is equal to the text string of 'Sarah':

SELECT * 
FROM baby_names
WHERE name = 'Sarah';

For each row in the database, the value in the column name is checked to see if it is equal to 'Sarah'. If true, then the row is included in the result. Otherwise, it is filtered out.

The truncated result:

state sex year name count
AK F 2010 Sarah 16
AL F 2010 Sarah 127
AR F 2010 Sarah 40
AZ F 2010 Sarah 99
CA F 2010 Sarah 696

And that's the gist of it. As we'll see in the next sections, the WHERE clause can be stacked with as many conditional expressions as we want. But no matter how convoluted the WHERE clause can get, the SQLite database will test all of its conditions for each row, and only the rows that pass all the conditions will be returned.

Requiring multiple conditions with AND

The AND keyword is used to specify multiple requirements. For example, if we want only rows that have a name of 'Sarah' and a year of 2012:

SELECT * 
FROM baby_names
WHERE name = 'Sarah'
  AND year = 2012;

The truncated results:

state sex year name count
AK F 2012 Sarah 6
AL F 2012 Sarah 101
AR F 2012 Sarah 32
AZ F 2012 Sarah 77
CA F 2012 Sarah 573

We can add as many conditions as we'd like:

SELECT * 
FROM baby_names
WHERE name = 'Sarah'
  AND year = 2012
  AND state = 'USA'
state sex year name count
USA F 2012 Sarah 5167
USA M 2012 Sarah 6

Though, of course, the more required conditions we add, the fewer the results that will meet all of the conditions:

SELECT * 
FROM baby_names
WHERE name = 'Sarah'
  AND year = 2012
  AND state = 'USA'
  AND sex = 'M'
state sex year name count
USA M 2012 Sarah 6

Mutually exclusive and logically impossible conditions

With the AND keyword, it's very easy to unintentionally set conditions that are impossible for any row to meet:

SELECT * 
FROM baby_names
WHERE name = 'Sarah'
  AND name = 'Daniel';

If you execute the above query, it will return 0 rows. This is because each row can only have one name value; thus, while it is possible to find rows in which sex is M and name is Sarah, it is logically impossible to have rows in which the name is both 'Sarah' and 'Daniel'.

Likewise, the following query would return 0 rows:

SELECT * 
FROM baby_names
WHERE name = 'Dan'
  AND name = 'Daniel';

Even though we often think of people named 'Daniel' as also being 'Dan', the database considers them to be two completely separate values (though in a later chapter, we'll see how to match on partial values).

What's very important to realize – so important I'll reiterate the point at the end of this lesson – is that a logically impossible query will not raise an error message. The SQL syntax is perfectly fine in both of the above examples, so the interpreter won't complain.

So when the result of a query is 0 – and you are not expecting it to be 0 – take a careful look at the logic of your conditional expressions.

Specifying optional conditions with OR

While it's impossible for a row to have a name be 'Sarah' and 'Daniel', it is perfectly reasonable for a row to be either 'Sarah' or 'Daniel'. In the example below, I'll include an ORDER BY and LIMIT clause, both of which follow the WHERE clause, to show the first 5 results by largest COUNT:

SELECT * 
FROM baby_names
WHERE name = 'Sarah'
  OR name = 'Alyssa'
ORDER BY count DESC
LIMIT 5;

Result:

state sex year name count
USA F 2010 Alyssa 6978
USA F 2010 Sarah 6308
USA F 2011 Alyssa 6001
USA F 2011 Sarah 5541
USA F 2012 Sarah 5167

Unlike AND, the more OR conditions you include, the more the result set could potentially expand.

The following query would return all rows with a name of 'Sarah' or all rows with a sex of 'F' – i.e. all girl names and the relatively few rows that correspond to boys named 'Sarah':

SELECT * 
FROM baby_names
WHERE name = 'Sarah'
  OR sex = 'F'

Use parentheses to explicitly state the order

It's possible to create very complex combinations of conditions even with just AND and OR:

SELECT * 
FROM baby_names
WHERE year = 2014 OR state = 'NY' AND sex = 'F' OR name = 'Madison' 
  AND year = 2011 

So the SQLite syntax does define the order of operations here, i.e. whether the OR conditions are evaluated after the AND conditions. This is similar to how in standard arithmetic, there is a rule that states that the multiplication operation happens before the addition operation:

  10 + 9 * 2

But rather than memorize how the rules of order in SQLite, just use parentheses, for readability, if nothing else. For example, the above arithmetic expression is equivalent to this:

  10 + (9 * 2)

Here's one way to organize the previous SQL statement (note how I use whitespace to make it even more readable):

SELECT * 
FROM baby_names
WHERE (year = 2014 OR state = 'NY' AND sex = 'F') 
  OR (name = 'Madison'  AND year = 2011) 

i.e. Select all rows for girls born in New York in the year 2014; or, all rows for babies named 'Madison' in the year 2011.

Here's another way to rearrange that query with completely different results:

SELECT * 
FROM baby_names
WHERE (year = 2014 OR state = 'NY') 
  AND (sex = 'F' OR name = 'Madison'  AND year = 2011);

I don't even know how to parse the above as something that makes sense in English. But it "works", in that it returns some results, even though I'm not going to waste energy figuring out the implications.

In general, if you can't translate your query into plain English, you may want to rethink what it is you think you are doing.

Be careful with matching text strings

Filtering on numeric values is pretty straightforward:

SELECT * 
FROM baby_names
WHERE year = 2014

But text strings, with how they can be interpreted literally or non-literally, can be the source of errors when trying to perform matches with them.

Case sensitivity

A common error is trying to match a value but incorrectly specifying the case (i.e. upper or lower) of the alphabet letters. The following queries are valid SQL but would return 0 results in the given data set, in which all the baby names have the first-letter capitalized and the state initials upper-cased:

SELECT * FROM baby_names WHERE name = 'sarah'
SELECT * from baby_names WHERE name = 'SARAH'
SELECT * from baby_names WHERE state = 'Ny'

Differentiate between single- and double- quotes

In SQLite (and in standard SQL), a text string enclosed in single quotes is interpreted as a string literal. Ergo, the following query would return all records for babies who were literally named 'Name' (at least according to Social Security Administration records):

SELECT * 
FROM baby_names
WHERE name = 'Name'
state sex year name count
MD F 2010 Name 11
USA F 2010 Name 11
MD M 2010 Name 13
USA M 2010 Name 13
MD M 2011 Name 10
USA M 2011 Name 10

But what happens we enclose 'Name' in double-quotes?

SELECT * 
FROM baby_names
WHERE name = "Name"

The result set will include every record in baby_names. Why? Because by enclosing Name in double-quotes, the interpreter sees "Name" as the column "name" (remember that column identifiers are case-insensitive). So, the conditional expression is basically this:

  Match all rows in which the value in the `name` column is
  equal to the value in the `Name` column, 
  i.e. equal to itself

Granted, this is a pretty weird situation – who names their child "Name" in the first place? But it's just an example of the ambiguity that can crop up in real-world data, and why it's important to memorize some of the specifics of computer language syntax.

An empty result set is not an error

As I said before, a query can return an empty set of results but still be grammatically correct, as far as the SQL interpreter is concerned. The obvious example is when the query contains conditions that can never be true. Here's an easy one:

SELECT * 
FROM baby_names
WHERE 1 = 2;

This is not a condition you would generally ever use in a real-world SQL query, but it is valid: return all rows for which the value of 1 is equal to the value of 2. Well, that is never true, so that query will always return no rows no matter what database you call it on.

Here's another logical impossibility:

SELECT * 
FROM baby_names
WHERE name = "Madison"
  AND year = 2012 
  AND year = 2013;

The query returns 0 results, not because there were no babies named 'Madison' in 2012 or 2013, but because the query is looking for rows in which the year value is simultaneously 2012 and 2013. The author of the above query likely meant to use OR, but the SQLite interpreter will not make that assumption:

SELECT * 
FROM baby_names
WHERE name = "Madison"
  AND (year = 2012 OR year = 2013);

But be careful. Just because a query returns 0 results does not necessarily mean that the query is logically flawed. For example, the following query on this particular dataset will return 0 rows:

SELECT * 
FROM baby_names
WHERE name = "Sue" AND sex = 'M';

It's not logically impossible that somewhere in America, a boy has been named 'Sue'. After all, Shel Silverstein wrote a poem about such a boy and Johnny Cash put it to music. But apparently, not enough parents have chosen "Sue" as a boy's name for it to show up in the data.

In other words, the result has 0 rows because the data just isn't there. But the query itself is perfectly fine.

Finally, there are times when you'll get results in spite of the logic being flawed:

SELECT name, count
FROM baby_names
WHERE state = 'CA' AND year = 2014 OR sex = 'M'
ORDER BY count DESC
LIMIT 1;

The result:

name count
Jacob 22082

Is that the correct answer? Yes, it is most definitely the correct answer for how the query was stated. But this is what the query is in plain English:

Select the top baby name with the most number of babies
among baby names in California for the year 2014, 
  or among all boy names

But what if the query author wanted to find this instead:

Select the top baby name with the most number of babies
among baby names for boys in California for the year 2014.

Then the query should have been this:

SELECT name, count
FROM baby_names
WHERE state = 'CA' AND year = 2014 AND sex = 'M'
ORDER BY count DESC
LIMIT 1;

And the result would be markedly different:

name count
Noah 2761

And yet, in the erroneous version of the query, the SQLite interpreter does not throw an error.

So if you haven't gotten the point by now, here it is: Logic is hard. While you will eventually get good enough at SQLite syntax to avoid making the kind of mistakes that cause the computer to show an error message, the hard part is realizing when you might have made a mistake, even if the computer doesn't tell you so.

In other words, be prepared to become a better logical thinker. Because the computer won't be reading your mind.


References

WHERE clause filtering | SQLite documentation