The Takeaways

By the end of this tutorial, make sure you can answer these questions:

  • How the LIKE operator works.

  • That LIKE is case insensitive.

  • What the % and _ wildcards stand for.

  • How the IN and NOT IN operators work.

  • How to use BETWEEN

Get the data

For this lesson, download the following file, which is a SQLite database built from the U.S. Social Security Administration file of popular baby names:

Unzip it, and open the sqlite file using the SQLite client of your choice (e.g. DB Browser for SQLite)

Table of contents

The LIKE operator for fuzzy matching

The LIKE operator is used to match text string patterns. In terms of syntax structure, it fits into a boolean expression just as an equals sign normally would:

SELECT * FROM baby_names
  WHERE name LIKE 'Megan';

Its functionality is similar too, though by default, LIKE will match English alphabet characters without regard to capitalization (i.e. case-insensitive):

SELECT * FROM baby_names
  WHERE state LIKE 'ny';

Using LIKE with wildcards

The true power of LIKE comes with the use of wildcards.

Match zero-or-more characters with %

The percentage sign% – is a stand-in for "zero-or-more characters".

In other words, to get all names that begin with Zara, including just Zara:

SELECT * FROM baby_names
  WHERE name LIKE 'Zara%';

Here's a sample of the output:

state sex year name count
USA F 2013 Zara 627
USA F 2013 Zarah 80
USA F 2013 Zarayah 13
USA F 2013 Zarai 9
USA F 2013 Zaraya 9
USA F 2013 Zarahi 8
USA F 2013 Zarae 7
USA F 2013 Zaray 6

To get all names that end in zara, including Zara (don't mind the capitalization below; remember that LIKE will do a case-insensitive match):

SELECT * FROM baby_names
  WHERE name LIKE '%Zara';

A sample of the output:

state sex year name count
USA F 2014 Zara 754
USA F 2014 Jahzara 74
USA F 2014 Chizara 26
USA F 2014 Azara 22
USA F 2014 Jazara 8
USA F 2014 Ozara 5

And to get all names that have Zara in them, somewhere – either at the beginning, the end, or in the middle – including just Zara:

SELECT * FROM baby_names
  WHERE name LIKE '%Zara%';

A sample of the output:

state sex year name count
USA F 2014 Zara 754
USA F 2014 Zarah 91
USA F 2014 Jahzara 74
USA F 2014 Chizara 26
USA F 2014 Chizaram 23
USA F 2014 Azara 22
USA F 2014 Jazarah 18

And one more relatively common variation: Names that begin with za and end in ra, including Zara:

SELECT * FROM baby_names
  WHERE name LIKE 'za%ra';

Sample output:

state sex year name count
USA F 2011 Zara 444
USA F 2011 Zahra 245
USA F 2011 Zaira 153
USA F 2011 Zahara 149

Match any single character with _

The underscore character – _ – is used to match one of any kind of character.

To match all 3-letter names that begin with Jo and end with any character:

SELECT * FROM baby_names
  WHERE name LIKE 'Jo_';

Sample output:

state sex year name count
USA F 2014 Joy 687
USA F 2014 Joi 91
USA M 2014 Joe 488
USA M 2014 Jon 374
USA M 2014 Job 80
USA M 2014 Joy 23

Two important things to note:

  1. The underscore matches only one character, so the results in the above query will only return 3-letter names, not a name such as 'John'
  2. The underscore matches at least one character, so the results do not include 'Jo'

Try running the previous query using % instead of _ to see the difference.

NOT LIKE

The NOT keyword can be used to negate a LIKE operator, similar to what != is to =.

To find all names that begin with the letter J but do not have the letter e in them:

SELECT * from baby_names
WHERE name LIKE 'J%n' 
  AND name NOT LIKE '%e%';

Sample output:

state sex year name count
USA F 2014 Jwan 5
USA M 2014 Jackson 12121
USA M 2014 John 10600
USA M 2014 Jonathan 8035
USA M 2014 Jaxon 7635

Note: There must be better ways to phrase the above query. In fact, I can't think of a time where I've actually used NOT LIKE except just now, which may explain the lameness of my example. But I include NOT LIKE because its existence and functionality is a natural consequence with how NOT and LIKE and the rest of SQLite syntax works. You may not actually use it, but it is there, because of course it's there.

Using IN to match against multiple possibilities

The IN keyword can be seen as a way to clean up multiple OR conditions. The following two queries are equivalent:

SELECT * FROM baby_names
  WHERE state = 'CA'
    OR state = 'NY'
    OR state = 'TX';
SELECT * FROM baby_names
  WHERE state IN('CA', 'NY', 'TX');

Using NOT IN to exclude multiple possibilities

The NOT IN comparator works the same as if we used multiple conditions that used != and were joined with AND (i.e. not OR).

The following query finds all data rows in which the state is neither California, New York, nor Texas:

SELECT * FROM baby_names
  WHERE state != 'CA'
    AND state != 'NY'
    AND state != 'TX';

Or, I guess if you prefer to think of things in OR, this would work too:

SELECT * FROM baby_names
  WHERE NOT(state  ='CA'
            OR state = 'NY'
            OR state = 'TX');

Both of the above formulations is equivalent to this use of NOT IN:

SELECT * FROM baby_names
  WHERE state NOT IN('CA', 'NY', 'TX');

Specifying a range with BETWEEN

The BETWEEN keyword is pretty straightforward. The structure of the clause is:

some_value_or_column BETWEEN lower_bound AND upper_bound 

So instead of this:

SELECT * FROM baby_names
  WHERE year >= 2010 AND year <= 2014

We can write this:

SELECT * FROM baby_names
  WHERE year BETWEEN 2010 AND 2014

Note that BETWEEN is inclusive of both endpoints – e.g. in the above example, rows with year of 2010 or 2014 are also included.

There's a couple of advantages with BETWEEN:

  1. It prevents the all-too common mistake of screwing up the greater-than and less-than signs.
  2. The comparative operation is done only once, as opposed to two separate comparisons (i.e. is x greater than equal to a and is x less than or equal to b). The performance savings is probably not that big of a deal for us, though.

References

The IN and NOT IN operators | SQLite documentation
The LIKE operator | sqlite.org