Table of contents
The LIKE operator for fuzzy matching
The LIKE
operator is used to match text string patterns. In terms of syntax structure, it fits into a boolean expression just as an equals sign normally would:
SELECT * FROM baby_names
WHERE name LIKE 'Megan';
Its functionality is similar too, though by default, LIKE
will match English alphabet characters without regard to capitalization (i.e. case-insensitive):
SELECT * FROM baby_names
WHERE state LIKE 'ny';
Using LIKE with wildcards
The true power of LIKE
comes with the use of wildcards.
Match zero-or-more characters with %
The percentage sign – %
– is a stand-in for "zero-or-more characters".
In other words, to get all names that begin with Zara
, including just Zara
:
SELECT * FROM baby_names
WHERE name LIKE 'Zara%';
Here's a sample of the output:
state |
sex |
year |
name |
count |
USA |
F |
2013 |
Zara |
627 |
USA |
F |
2013 |
Zarah |
80 |
USA |
F |
2013 |
Zarayah |
13 |
USA |
F |
2013 |
Zarai |
9 |
USA |
F |
2013 |
Zaraya |
9 |
USA |
F |
2013 |
Zarahi |
8 |
USA |
F |
2013 |
Zarae |
7 |
USA |
F |
2013 |
Zaray |
6 |
To get all names that end in zara
, including Zara
(don't mind the capitalization below; remember that LIKE
will do a case-insensitive match):
SELECT * FROM baby_names
WHERE name LIKE '%Zara';
A sample of the output:
state |
sex |
year |
name |
count |
USA |
F |
2014 |
Zara |
754 |
USA |
F |
2014 |
Jahzara |
74 |
USA |
F |
2014 |
Chizara |
26 |
USA |
F |
2014 |
Azara |
22 |
USA |
F |
2014 |
Jazara |
8 |
USA |
F |
2014 |
Ozara |
5 |
And to get all names that have Zara
in them, somewhere – either at the beginning, the end, or in the middle – including just Zara
:
SELECT * FROM baby_names
WHERE name LIKE '%Zara%';
A sample of the output:
state |
sex |
year |
name |
count |
USA |
F |
2014 |
Zara |
754 |
USA |
F |
2014 |
Zarah |
91 |
USA |
F |
2014 |
Jahzara |
74 |
USA |
F |
2014 |
Chizara |
26 |
USA |
F |
2014 |
Chizaram |
23 |
USA |
F |
2014 |
Azara |
22 |
USA |
F |
2014 |
Jazarah |
18 |
And one more relatively common variation: Names that begin with za
and end in ra
, including Zara
:
SELECT * FROM baby_names
WHERE name LIKE 'za%ra';
Sample output:
state |
sex |
year |
name |
count |
USA |
F |
2011 |
Zara |
444 |
USA |
F |
2011 |
Zahra |
245 |
USA |
F |
2011 |
Zaira |
153 |
USA |
F |
2011 |
Zahara |
149 |
Match any single character with _
The underscore character – _
– is used to match one of any kind of character.
To match all 3-letter names that begin with Jo
and end with any character:
SELECT * FROM baby_names
WHERE name LIKE 'Jo_';
Sample output:
state |
sex |
year |
name |
count |
USA |
F |
2014 |
Joy |
687 |
USA |
F |
2014 |
Joi |
91 |
USA |
M |
2014 |
Joe |
488 |
USA |
M |
2014 |
Jon |
374 |
USA |
M |
2014 |
Job |
80 |
USA |
M |
2014 |
Joy |
23 |
Two important things to note:
- The underscore matches only one character, so the results in the above query will only return 3-letter names, not a name such as
'John'
- The underscore matches at least one character, so the results do not include
'Jo'
Try running the previous query using %
instead of _
to see the difference.
NOT LIKE
The NOT
keyword can be used to negate a LIKE
operator, similar to what !=
is to =
.
To find all names that begin with the letter J
but do not have the letter e
in them:
SELECT * from baby_names
WHERE name LIKE 'J%n'
AND name NOT LIKE '%e%';
Sample output:
state |
sex |
year |
name |
count |
USA |
F |
2014 |
Jwan |
5 |
USA |
M |
2014 |
Jackson |
12121 |
USA |
M |
2014 |
John |
10600 |
USA |
M |
2014 |
Jonathan |
8035 |
USA |
M |
2014 |
Jaxon |
7635 |
Note: There must be better ways to phrase the above query. In fact, I can't think of a time where I've actually used NOT LIKE
except just now, which may explain the lameness of my example. But I include NOT LIKE
because its existence and functionality is a natural consequence with how NOT
and LIKE
and the rest of SQLite syntax works. You may not actually use it, but it is there, because of course it's there.
Using IN
to match against multiple possibilities
The IN
keyword can be seen as a way to clean up multiple OR
conditions. The following two queries are equivalent:
SELECT * FROM baby_names
WHERE state = 'CA'
OR state = 'NY'
OR state = 'TX';
SELECT * FROM baby_names
WHERE state IN('CA', 'NY', 'TX');
Using NOT IN
to exclude multiple possibilities
The NOT IN
comparator works the same as if we used multiple conditions that used !=
and were joined with AND
(i.e. not OR
).
The following query finds all data rows in which the state is neither California, New York, nor Texas:
SELECT * FROM baby_names
WHERE state != 'CA'
AND state != 'NY'
AND state != 'TX';
Or, I guess if you prefer to think of things in OR
, this would work too:
SELECT * FROM baby_names
WHERE NOT(state ='CA'
OR state = 'NY'
OR state = 'TX');
Both of the above formulations is equivalent to this use of NOT IN
:
SELECT * FROM baby_names
WHERE state NOT IN('CA', 'NY', 'TX');
Specifying a range with BETWEEN
The BETWEEN
keyword is pretty straightforward. The structure of the clause is:
some_value_or_column BETWEEN lower_bound AND upper_bound
So instead of this:
SELECT * FROM baby_names
WHERE year >= 2010 AND year <= 2014
We can write this:
SELECT * FROM baby_names
WHERE year BETWEEN 2010 AND 2014
Note that BETWEEN
is inclusive of both endpoints – e.g. in the above example, rows with year
of 2010
or 2014
are also included.
There's a couple of advantages with BETWEEN
:
- It prevents the all-too common mistake of screwing up the greater-than and less-than signs.
- The comparative operation is done only once, as opposed to two separate comparisons (i.e. is
x
greater than equal to a
and is x
less than or equal to b
). The performance savings is probably not that big of a deal for us, though.