Getting Started with the IIS Manager in IIS
Previous chapters discussed the important topics of organizing your
tables well and getting data in to them. In this chapter, we will cover a
key objective that makes the others pay off: retrieving the data stored in a
database. This is commonly called a database query.
The simplest way to retrieve data from a MySQL or MariaDB
database— to select data—is to use the SQL statement,
SELECT
. We used this SQL statement a few times in previous
chapters. In this chapter, we will cover it in greater detail. It’s not
necessary to know or use all of the may options, but some techniques such as
joining tables together are basic to using relational databases.
We’ll begin this chapter by reviewing the basics of the
SELECT
statement, and then progress to more involved variants.
When you finish this chapter, you will hopefully have a good understanding
of how to use SELECT
for most of your needs as you start out as
a database developer, as well as be prepared for the many possibilities and
special situations that may arise over the years of developing databases
with MySQL and MariaDB.
In previous chapters, especially in the exercises, you were asked to
enter data into the tables that we created and altered in the chapters of
the previous part of this book. Entering data on your own was good for
training purposes, but we now need much more data in our database to better
appreciate the examples in this chapter. If you haven’t done so already, go
to this book’s website
and download the dump files that contain tables of data.
Download rookery.sql to get the
whole rookery
database, with plenty of data for use in our
explorations. Once you have the dump file on your system (let’s assume you
put it in /tmp/rookery.sql), enter the
following from the command line:
mysql --user='your_name' -p \
rookery < /tmp/rookery.sql
The command prompts for your password, logs you in using the username
assigned to you, and runs the statements in the rookery.sql file on the rookery
database. If everything goes well, there should be no message in response,
just the command-line prompt when it’s finished.
Basic Selection
The basic elements of the syntax for the SELECT
statement are the
SELECT
keyword, the column you want to select, and the table
from which to retrieve the data:
SELECT column FROM table;
If you want to select more than one column, list them separated by commas. If you want to select all of the columns in a table,
you can use the asterisk as a wildcard instead of listing all of the
columns. Let’s use the rookery
database you just loaded with
data to see a practical example of this basic syntax. Enter the following
SQL statement in mysql to get a list of
all of the columns and rows in the birds
table:
USE rookery;
SELECT * FROM birds;
This is the most minimal SELECT
statement that you can
execute successfully. It tells MySQL to retrieve all of the data contained
in the birds
table. It displays the columns in the order you
defined them in the table’s CREATE
or ALTER
statements, and displays rows in the order they are found in the table,
which is usually the order that the data was entered into the
table.
To select only certain columns, do something like this:
SELECT bird_id, scientific_name, common_name
FROM birds;
This SELECT
statement selects only three columns from
each row found in the birds
table. There are also many ways
to choose particular rows, change the order in which they are displayed,
and limit the number shown. These are covered in the following sections of
this chapter.
Selecting by a Criteria
Suppose that we want to select only birds of a certain family, say the
Charadriidae (i.e., Plovers). Looking in the
bird_families
table, we find that its family_id
is 103. Using a WHERE
clause with the SELECT
statement, we can retrieve a list of birds from the birds
table for this particular family of birds like so:
SELECT common_name, scientific_name
FROM birds WHERE family_id = 103
LIMIT 3;
+----------------------+-------------------------+
| common_name | scientific_name |
+----------------------+-------------------------+
| Mountain Plover | Charadrius montanus |
| Snowy Plover | Charadrius alexandrinus |
| Black-bellied Plover | Pluvialis squatarola |
+----------------------+-------------------------+
This SELECT
statement requests two columns, in a
different order from the way the data is listed in the table—in the table
itself, scientific_name
precedes common_name
. I
also added the LIMIT
clause to keep the results down to
the first three rows in the table. We’ll talk more about the
LIMIT
clause in a little while.
Note
Because we separated families into a separate table, you had to
look at the bird_families
table to get the right ID before
selecting birds from the birds
table. That seems
round-about. There is a streamlined way to ask for a family name such as
Charadriidae instead of a number. They’re called
joins. We’ll cover them later.
This is all fairly straightforward and in line with what we’ve seen
in several other examples in previous chapters. Let’s move on and take a
look at how to change the order of the results.
Ordering Results
The previous example selected specific columns from the birds
table and
limited the results with the LIMIT
clause. However, the rows
were listed in whatever order they were found in the table. We’ve decided
to see only a tiny subset of the birds in the
Charadriidae family, so ordering can make a
difference. If we want to put the results in alphabetical order based on
the values of the common_name
column, we add an ORDER
clause like this:
BY
SELECT common_name, scientific_name
FROM birds WHERE family_id = 103
ORDER BY common_name
LIMIT 3;
+-----------------------+----------------------+
| common_name | scientific_name |
+-----------------------+----------------------+
| Black-bellied Plover | Pluvialis squatarola |
| Mountain Plover | Charadrius montanus |
| Pacific Golden Plover | Pluvialis fulva |
+-----------------------+----------------------+
Notice that the ORDER BY
clause is located after
the WHERE
clause and before the
LIMIT
clause. Not only will this statement display the rows
in order by common_name
, but it will retrieve only the first
three rows based on the ordering. That is to say, MySQL will first
retrieve all of the rows based on the WHERE
clause, store
those results in a temporary table behind the scenes, order the data based
on the ORDER BY
clause, and then return to the mysql client the first three rows found in that
temporary table based on the LIMIT
clause. This activity is
the reason for the positioning of each clause.
By default, the ORDER BY
clause uses ascending order,
which means from A to Z for an alphabetic column. If you want to display
data in descending order, add the DESC
option, as in ORDER BY
. There’s also a contrasting
DESC ASC
option, but you
probably won’t need to use it because ascending order is the
default.
To order by more than one column, give all the columns in the
ORDER BY
clause in a comma-separated list. Each column can be
sorted in ascending or descending order. The clause sorts all the data by
the first column you specify, and then within that order by the second
column, etc. To illustrate this, we’ll select another column from the
birds
table, family_id
, and we’ll get birds from
a few more families. We’ll select some other types of shore birds:
Oystercatchers (i.e., Haematopodidae), Stilts (e.g.,
Recurvirostridae), and Sandpipers (e.g.,
Scolopacidae). First, we need the
family_id
for each of these families. Execute the following
on your server:
SELECT * FROM bird_families
WHERE scientific_name
IN('Charadriidae','Haematopodidae','Recurvirostridae','Scolopacidae');
+-----------+------------------+------------------------------+----------+
| family_id | scientific_name | brief_description | order_id |
+-----------+------------------+------------------------------+----------+
| 103 | Charadriidae | Plovers, Dotterels, Lapwings | 102 |
| 160 | Haematopodidae | Oystercatchers | 102 |
| 162 | Recurvirostridae | Stilts and Avocets | 102 |
| 164 | Scolopacidae | Sandpipers and Allies | 102 |
+-----------+------------------+------------------------------+----------+
In this SELECT
statement, we added another item to the
WHERE
clause, the IN
operator. It lists, within parentheses,
the various values we want in the scientific_name
column.
Let’s use the IN
operator again to get a list of birds and
also test the LIMIT
clause:
SELECT common_name, scientific_name, family_id
FROM birds
WHERE family_id IN(103, 160, 162, 164)
ORDER BY common_name
LIMIT 3;
+-------------+--------------------------------+-----------+
| common_name | scientific_name | family_id |
+-------------+--------------------------------+-----------+
| | Charadrius obscurus aquilonius | 103 |
| | Numenius phaeopus phaeopus | 164 |
| | Tringa totanus eurhinus | 164 |
+-------------+--------------------------------+-----------+
Notice that we didn’t put the numeric values in quotes as we did
with the family names in the previous SQL statement. Single or double
quotes are necessary for strings, but they’re optional for numeric values.
However, it’s a better practice to not use quotes around numeric values.
They can affect performance and cause incorrect results if you mix them
with strings.
There is one odd thing about the results here: there aren’t any
common names for the birds returned. That’s not a mistake. About 10,000
birds in the birds
table are true species of birds, and about
20,000 are subspecies. Many subspecies don’t have a unique common name.
With about 30,000 species and subspecies of birds, with all of the minor
nuances between the subspecies bird families, there just aren’t common
names for all of them. Each bird has a scientific name assigned by
ornithologists, but everyday people who use the common names for birds
don’t see the subtle distinctions that ornithologists see. This is why the
scientific_name
column is necessary and why the
common_name
column cannot be a key column in the
table.
Let’s execute that SQL statement again, but add another factor to
the WHERE
clause to show only birds with a value for the
common_name
column:
SELECT common_name, scientific_name, family_id
FROM birds
WHERE family_id IN(103, 160, 162, 164)
AND common_name != ''
ORDER BY common_name
LIMIT 3;
+-----------------------+-----------------------+-----------+
| common_name | scientific_name | family_id |
+-----------------------+-----------------------+-----------+
| African Oystercatcher | Haematopus moquini | 160 |
| African Snipe | Gallinago nigripennis | 164 |
| Amami Woodcock | Scolopax mira | 164 |
+-----------------------+-----------------------+-----------+
In the WHERE
clause, we added the AND
logical operator to specify a
second filter. For a row to match the WHERE
clause, the
family_id
must be one in the list given and the
common_name
must not be equal to a blank value.
Nonprogrammers will have to learn a few conventions to use large
WHERE
clauses. We’ve seen that an equals sign says, “The
column must contain this value,” but the !=
construct
says, “The column must not contain this value.” And in our
statement, we used ''
to refer to an empty string. So we’ll
get the rows where the common name exists.
In this case, we couldn’t ask for non-NULL columns. We could have
set up the table so that birds without common names had NULL in the
common_name
column, but we chose to instead use empty
strings. That’s totally different in meaning: NULL means there is no
value, whereas the empty string is still a string even if there are no
characters in it. We could have used NULL, but having chosen the empty
string, we must use the right value in our WHERE
clause.
Incidentally, !=
is the same as <>
(i.e., less-than sign followed by greater-than sign).
Limiting Results
The birds
table has nearly 30,000 rows, so selecting
data without limits can return more rows than you might want to view at a
time. We’ve already used the LIMIT
clause to resolve
this problem. We limited the results of the SELECT
statement
to three rows, the first three rows based on the WHERE
and
ORDER BY
clauses. If we’d like to see the subsequent rows,
maybe the next two based on the criteria we gave previously, we could
change the LIMIT
clause to show five rows. But an
alternative, which is often a better choice, is to do something like
this:
SELECT common_name, scientific_name, family_id
FROM birds
WHERE family_id IN(103, 160, 162, 164)
AND common_name != ''
ORDER BY common_name
LIMIT 3, 2;
+------------------------+-------------------------+-----------+
| common_name | scientific_name | family_id |
+------------------------+-------------------------+-----------+
| American Avocet | Recurvirostra americana | 162 |
| American Golden-Plover | Pluvialis dominica | 103 |
+------------------------+-------------------------+-----------+
This LIMIT
clause has two values: the point where we
want the results to begin, then the number of rows to display. The result
is to show rows 3 and 4. Incidentally, LIMIT 3
used
previously is the same as LIMIT 0, 3
: the 0 tells MySQL not
to skip any rows.
Combining Tables
So far in this chapter we’ve been working with just one table. Let’s look at some ways to
select data from more than one table. To do this, we will have to tell
MySQL the tables from which we want data and how to join them
together.
For an example, let’s get a list of birds with their family names.
To keep the query simple, we’ll select birds from different families, but
all in the same order of birds. In earlier examples where we got a list of
shore birds, they all had the same order_id
of 102. We’ll use
that value again. Enter this SELECT
statement on your
server:
SELECT common_name AS 'Bird',
bird_families.scientific_name AS 'Family'
FROM birds, bird_families
WHERE birds.family_id = bird_families.family_id
AND order_id = 102
AND common_name != ''
ORDER BY common_name LIMIT 10;
+------------------------+------------------+
| Bird | Family |
+------------------------+------------------+
| African Jacana | Jacanidae |
| African Oystercatcher | Haematopodidae |
| African Skimmer | Laridae |
| African Snipe | Scolopacidae |
| Aleutian Tern | Laridae |
| Amami Woodcock | Scolopacidae |
| American Avocet | Recurvirostridae |
| American Golden-Plover | Charadriidae |
| American Oystercatcher | Haematopodidae |
| American Woodcock | Scolopacidae |
+------------------------+------------------+
This SELECT
statement returns one column from the
birds
table and one from the bird_families
table. This is a hefty SQL statement, but don’t let it fluster you. It’s
like previous statements in this chapter, but with some minor changes and
one significant one. First, let’s focus on the one significant change: how
we’ve drawn data from two tables.
The FROM
clause lists the two tables, separated by a
comma. In the WHERE
clause, we indicated that we want rows in
which the value of family_id
in the two tables is equal.
Otherwise, we would have duplicate rows in the results. Because those
columns have the same name ( family_id
) in both tables, to
prevent ambiguity, we put the table name before the colum name, separated
by a dot (e.g., birds.family_id
). We did the same thing for
the scientific name in the column list
( bird_families.scientific_name
). If we don’t do that, MySQL
would be confused as to whether we want the scientific_name
from the birds
or the bird_families
table. This
would generate an error like this:
ERROR 1052 (23000): Column 'scientific_name' in field list is ambiguous
You may have noticed that another new item was added to the SELECT
statement: the AS
keyword. This specifies a substitute name, or
alias, for the heading in the results set for the
column. Without the AS
keyword for the column containing the
family names, the heading would say
bird_families.scientific_name
. That’s not as attractive. This
is another style factor, but it can have more practical aspects that we’ll
see later. The keyword AS
can also be used to specify a table
name like so:
SELECT common_name AS 'Bird',
families.scientific_name AS 'Family'
FROM birds, bird_families AS families
WHERE birds.family_id = families.family_id
AND order_id = 102
AND common_name != ''
ORDER BY common_name LIMIT 10;
In this example, we provided an alias for the
bird_families
table. We set it to the shorter name
families
. Note that aliases for table names must not be in
quotes.
After setting the alias, we must use it wherever we want to refer to
the table. So we have to change the column selected in the field list from
bird_families.scientific_name
to
families.scientific_name
. We also have to change the column
name bird_families.family_id
in the WHERE
clause
to families.family_id
. If we don’t make this final change,
we’ll get the following error:
ERROR 1054 (42S22):
Unknown column 'bird_families.family_id' in 'where clause'
Let’s add a third table to the previous SQL statement, to get the
name of the order of birds to which the birds belong. You can do that by
entering this SQL statement on your server:
SELECT common_name AS 'Bird',
families.scientific_name AS 'Family',
orders.scientific_name AS 'Order'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND families.order_id = 102
AND common_name != ''
ORDER BY common_name LIMIT 10, 5;
+------------------+------------------+-----------------+
| Bird | Family | Order |
+------------------+------------------+-----------------+
| Ancient Murrelet | Alcidae | Charadriiformes |
| Andean Avocet | Recurvirostridae | Charadriiformes |
| Andean Gull | Laridae | Charadriiformes |
| Andean Lapwing | Charadriidae | Charadriiformes |
| Andean Snipe | Scolopacidae | Charadriiformes |
+------------------+------------------+-----------------+
Let’s look at the changes from the previous statement to this one.
We added the third table to the FROM
clause and gave it an alias of orders
. To properly
connect the third table, we had to add another evaluator to the
WHERE
clause: families.order_id =
. This allows the
orders.order_id SELECT
to retrieve
the right rows containing the scientific names of the orders that
correspond to the rows we select from the families. We also added a column
to the field list to display the name of the order. Because the families
we’ve selected are all from the same order, that field seems a little
pointless in these results but can be useful as we search more orders in
the future. We gave a starting point for the LIMIT
clause so
that we could see the next five birds in the results.
Note
It’s not necessary to put the field alias name for a column in
quotes if the alias is only one word. However, if you use a reserved
word (e.g., Order), you will need to use
quotes.
Expressions and the Like
Let’s change the latest SELECT
statement to include birds from
multiple orders. To do this, we’ll focus in on the operator in the WHERE
clause for the common_name
:
AND common_name != ''
We’ll change the simple comparison here (i.e., the LIKE
operator, which we saw in Chapter 6) to
select multiple names that are similar. Among many families of birds,
there are often bird species that are similar but have different sizes.
The smallest is sometimes referred to as the least in
the common name. So let’s search the database for birds with
Least in their name:
SELECT common_name AS 'Bird',
families.scientific_name AS 'Family',
orders.scientific_name AS 'Order'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND common_name LIKE 'Least%'
ORDER BY orders.scientific_name, families.scientific_name, common_name
LIMIT 10;
+------------------+---------------+------------------+
| Bird | Family | Order |
+------------------+---------------+------------------+
| Least Nighthawk | Caprimulgidae | Caprimulgiformes |
| Least Pauraque | Caprimulgidae | Caprimulgiformes |
| Least Auklet | Alcidae | Charadriiformes |
| Least Tern | Laridae | Charadriiformes |
| Least Sandpiper | Scolopacidae | Charadriiformes |
| Least Seedsnipe | Thinocoridae | Charadriiformes |
| Least Flycatcher | Tyrannidae | Passeriformes |
| Least Bittern | Ardeidae | Pelecaniformes |
| Least Honeyguide | Indicatoridae | Piciformes |
| Least Grebe | Podicipedidae | Podicipediformes |
+------------------+---------------+------------------+
In the preceding example, using the LIKE
operator,
MySQL selected rows in which the common_name
starts with
Least and ends with anything (i.e., the wildcard,
%
). We also removed the families.order_id = 102
clause, so that we wouldn’t limit the birds to a single order. The results
now have birds from a few different orders.
We also changed the ORDER BY
clause to have MySQL order the results in the temporary table first by
the bird order’s scientific name, then by the bird family’s scientific
name, and then by the bird’s common name. If you look at the results, you
can see that’s what it did: it sorted the orders first. If you look at the
rows for the Charadriiformes, you can see that the
families for that order are in alphabetical order. The two birds in the
Caprimulgidae family are in alphabetical
order.
Note
You cannot use alias names for columns in the ORDER
clause, but you can use alias table names. In fact, they’re
BY
required if you’ve used the aliases in the FROM
clause.
The previous example used the LIKE
operator, which has
limited pattern matching abilities. As an alternative, you can use
REGEXP
, which has many pattern matching characters and
classes. Let’s look at a simpler example, of the previous SELECT
statement, but using
REGEXP
. In the previous example we searched for small birds,
birds with a common name starting with the word
Least. The largest bird in a family is typically
called Great. To add these birds, enter the following
SQL statement on your server:
SELECT common_name AS 'Birds Great and Small'
FROM birds
WHERE common_name REGEXP 'Great|Least'
ORDER BY family_id LIMIT 10;
+-----------------------------+
| Birds Great and Small |
+-----------------------------+
| Great Northern Loon |
| Greater Scaup |
| Greater White-fronted Goose |
| Greater Sand-Plover |
| Great Crested Tern |
| Least Tern |
| Great Black-backed Gull |
| Least Nighthawk |
| Least Pauraque |
| Great Slaty Woodpecker |
+-----------------------------+
The expression we’re giving with REGEXP
, within the
quote marks, contains two string values: Great and
Least. By default, MySQL assumes the text given for
REGEXP
is meant to be for the start of the string. To be
emphatic, you can insert a carat (i.e., ^
) at the start of these string
values, but it’s unnecessary. The vertical bar (i.e., |
) between the two
expressions signifies that either value is acceptable—it means
or.
In the results, you can see some common bird names starting with
Greater, not just Great. If we
don’t want to include the Greater birds, we can
exclude them with the NOT REGEXP
operator. Enter the following on
your server:
SELECT common_name AS 'Birds Great and Small'
FROM birds
WHERE common_name REGEXP 'Great|Least'
AND common_name NOT REGEXP 'Greater'
ORDER BY family_id LIMIT 10;
+--------------------------+
| Birds Great and Small |
+--------------------------+
| Great Northern Loon |
| Least Tern |
| Great Black-backed Gull |
| Great Crested Tern |
| Least Nighthawk |
| Least Pauraque |
| Great Slaty Woodpecker |
| Great Spotted Woodpecker |
| Great Black-Hawk |
| Least Flycatcher |
+--------------------------+
Using NOT REGEXP
eliminated all of the
Greater birds. Notice that it was included with
AND
, and not as part of the REGEXP
.
Incidentally, we’re ordering here by family_id
to keep
similar birds together in the list and to have a good mix of
Great and Least birds. The
results may seem awkward, though, as the names of the birds are not
ordered. We could add another column to the ORDER BY
clause
to alphabetize them within each family.
REGEXP
and NOT REGEXP
are case
insensitive. If we want an expression to be case sensitive, we’ll need to
add the BINARY
option. Let’s get another list of birds to see this. This time we’ll
search for Hawks, with the first letter in uppercase. This is because we
want only Hawks and not other birds that have the word,
hawk in their name, but are not a Hawk. For instance,
we don’t want Nighthawks and we don’t want Hawk-Owls. The way the data is
in the birds
table, each word of a common name starts with an
uppercase letter—the names are in title case. So we’ll eliminate birds
such as Nighthawks by using the BINARY
option to require that
“Hawk” be spelled with an uppercase H and the other
letters in lowercase. We’ll use NOT REGEXP
to not allow
Hawk-Owls. Try the following on your server:
SELECT common_name AS 'Hawks'
FROM birds
WHERE common_name REGEXP BINARY 'Hawk'
AND common_name NOT REGEXP 'Hawk-Owl'
ORDER BY family_id LIMIT 10;
+-------------------+
| Hawks |
+-------------------+
| Red-tailed Hawk |
| Bicolored Hawk |
| Common Black-Hawk |
| Cuban Black-Hawk |
| Rufous Crab Hawk |
| Great Black-Hawk |
| Black-faced Hawk |
| White-browed Hawk |
| Ridgway's Hawk |
| Broad-winged Hawk |
+-------------------+
I stated that REGEXP
and NOT REGEXP
are
case insensitive, unless you add the BINARY
option as we did to stipulate the collating method as binary (e.g., the
letter H has a different binary value fromn the
letter h). For the common_name
column,
though, we didn’t need to add the BINARY
option because the
column has a binary collation setting. We did this unknowingly when we
created the rookery
database near the beginning of Chapter 4. See how we created the database by
entering this from the mysql
client:
SHOW CREATE DATABASE rookery \G
*************************** 1. row ***************************
Database: rookery
Create Database: CREATE DATABASE `rookery` /*!40100 DEFAULT
CHARACTER SET latin1 COLLATE latin1_bin */
The COLLATE
clause is set to latin1_bin
, meaning Latin1 binary. Any columns that we create in tables in the
rookery
database, unless we specify otherwise, will be
collated using latin1_bin
. Execute the following statement to
see how the common_name
column in the birds
table is set:
SHOW FULL COLUMNS
FROM birds LIKE 'common_name' \G
*************************** 1. row ***************************
Field: common_name
Type: varchar(255)
Collation: latin1_bin
Null: YES
Key:
Default: NULL
Extra:
Privileges: select,insert,update,references
Comment:
This shows information just on the common_name
column.
Notice that the Collation is latin1_bin
.
Because of that, regular expressions using REGEXP
are case
sensitive without having to add the BINARY
option.
Looking through the birds
table, we discover some
common names for birds that contain the words, “Hawk Owls,” without the
hyphen in between. We didn’t allow for that in the expression we gave. We
discover also that there are birds in which the word “Hawk” is not in
title case—so we can’t count on looking for the uppercase letter,
H. Our previous regular expression left those birds
out of the results. So we’ll have to change the expression and try a
different method. Enter this on your server:
SELECT common_name AS 'Hawks'
FROM birds
WHERE common_name REGEXP '[[:space:]]Hawk|[[.hyphen.]]Hawk'
AND common_name NOT REGEXP 'Hawk-Owl|Hawk Owl'
ORDER BY family_id;
This first, rather long REGEXP
expression uses a character class and a
character name. The format of character classes and
character names is to put the type of character between two sets of double
brackets. A character class is given between a pair of colons (e.g.,
[[:alpha:]]
for alphabetic characters). A character name is
given between two dots (e.g., [[.hyphen.]]
for a hyphen).
Looking at the first expression, you can deduce that we want rows in which
the common_name
contains either “Hawk” or “-Hawk”—that is to
say, Hawk preceded by a space or a hyphen. This won’t
allow for Hawk preceded by a letter (e.g.,
Nighthawk). The second expression excludes
Hawk-Owl and Hawk Owl.
Pattern matching in regular expressions in MySQL tends to be more
verbose than they are in other languages like Perl or PHP. But they do
work for basic requirements. For elaborate regular expressions, you’ll
have to use an API like the Perl DBI to process the data outside of MySQL.
Because that may be a performance hit, it’s better to try to accomplish
such tasks within MySQL using REGEXP
.
Counting and Grouping Results
In many of our examples, we displayed only a few rows of data because the
results could potentially contain thousands of rows. Suppose we’d like to
know how many are contained in the table. We can do that by adding a
function to the statement. In this case, we want
the COUNT()
function. Let’s see how that would work:
SELECT COUNT(*) FROM birds;
+----------+
| COUNT(*) |
+----------+
| 28891 |
+----------+
We put an asterisk within the parentheses of the function to indicate that we want all of the rows. We could
put a column name instead of an asterisk to count only rows that have
data. Using a column prevents MySQL from counting rows that have a NULL
value in that column. But it will count rows that have a blank or empty
value (i.e., ''
).
It’s nice to know how many rows are in the birds
table,
but suppose we’d like to break apart that count. Let’s use
COUNT()
to count the number of rows for a particular family
of birds, the Pelecanidae—those are Pelicans. Enter
this SQL statement in the mysql client
on your server:
SELECT families.scientific_name AS 'Family',
COUNT(*) AS 'Number of Birds'
FROM birds, bird_families AS families
WHERE birds.family_id = families.family_id
AND families.scientific_name = 'Pelecanidae'
+-------------+-----------------+
| Family | Number of Birds |
+-------------+-----------------+
| Pelecanidae | 10 |
+-------------+-----------------+
As you can see, there are 10 bird species recorded for the
Pelecanidae family in the birds
table.
In this example, we used the WHERE
clause to limit the
results to the Pelecanidae family. Suppose we want to
know the number of birds for other bird families in the same order to
which Pelicans belong, to the order called
Pelecaniformes. To do this, we’ll add the
bird_orders
table to the previous SELECT
statement. Enter the following from the mysql client:
SELECT orders.scientific_name AS 'Order',
families.scientific_name AS 'Family',
COUNT(*) AS 'Number of Birds'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND orders.scientific_name = 'Pelecaniformes';
+----------------+-------------+-----------------+
| Order | Family | Number of Birds |
+----------------+-------------+-----------------+
| Pelecaniformes | Pelecanidae | 224 |
+----------------+-------------+-----------------++
This tells us that there are 224 birds in the birds
table that belong to Pelecaniformes. There are five
families in that order of birds, but it returned only the first family
name found. If we want to know the name of each family and the number of
birds in each family, we need to get MySQL to group the results. To do
this, we have to tell it the column by which to group. This is where the GROUP BY
clause comes in. This clause
tells MySQL to group the results based on the columns given with the
clause. Let’s see how that might look. Enter the following on your
server:
SELECT orders.scientific_name AS 'Order',
families.scientific_name AS 'Family',
COUNT(*) AS 'Number of Birds'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND orders.scientific_name = 'Pelecaniformes'
GROUP BY Family;
+----------------+-------------------+-----------------+
| Order | Family | Number of Birds |
+----------------+-------------------+-----------------+
| Pelecaniformes | Ardeidae | 157 |
| Pelecaniformes | Balaenicipitidae | 1 |
| Pelecaniformes | Pelecanidae | 10 |
| Pelecaniformes | Scopidae | 3 |
| Pelecaniformes | Threskiornithidae | 53 |
+----------------+-------------------+-----------------+
We gave the GROUP BY
clause the Family
alias, which is the scientific_name
column from the
bird_families
table. MySQL returns one results set for all
five families, for one SELECT
statement.
The GROUP BY
clause is very useful. You’ll use it
often, so learn it well. This clause and related functions are covered in
greater detail in Chapter 12.
Summary
The SELECT
statement offers quite a number of
parameters and possibilities that I had to skip to keep this chapter from
becoming too lengthy and too advanced for a learning book. For instance,
there are several options for caching results and a clause for exporting a
results set to a text file. You can learn about these from other sources
if you need them.
At this point, make sure you’re comfortable with the
SELECT
statement and its main components: choosing columns
and using field aliases; choosing multiple tables in the FROM
clause; how to construct a WHERE
clause, including the basics
of regular expressions; using the ORDER BY
and the
GROUP BY
clauses; and limiting results with the
LIMIT
clause. It will take time and practice to become very
comfortable with all of these components. Before moving on to Chapter 8, make sure to complete the exercises in
the next section.
Exercises
The following exercises will help cement your understanding of the
SELECT
statement. The act of typing SQL statements,
especially ones that you will use often like SELECT
, helps
you to learn, memorize, and know them well.
-
Construct a
SELECT
statement to select the common
names of birds from thebirds
table. Use the
LIKE
operator to select only Pigeons from the table.
Order the table by thecommon_name
column, but give it a
field alias ofBird'
. Don’t limit the results; let MySQL
retrieve all of the rows that match. Execute the statement on your
server and look over the results.Next, use the same
SELECT
statement, but add a
LIMIT
clause. Limit the results to the first ten rows and
execute it. Compare the results to the previousSELECT
statement to make sure the results show the 1st through 10th row. Then
modify theSELECT
statement again to display the next 10
rows. Compare these results to the results from the first
SELECT
statement to make sure you retrieved the 11th
through 20th row. If you didn’t, find your mistake and correct it
until you get it right. -
In this exercise, you’ll begin with a simple
SELECT
statement and then make it more complicated. To start, construct a
SELECT
statement in which you select the
scientific_name
and thebrief_description
from thebird_orders
table. Give the field for the
scientific_name
an alias of
Order—and don’t forget to put quotes around it
because it’s a reserved word. Use an alias of Types of Birds
in Order forbrief_description
. Don’t limit
the results. When you think that you have theSELECT
statement constructed properly, execute it. If you have errors, try to
determine the problem and fix the statement until you get it
right.Construct another
SELECT
statement in which you
retrieve data from thebirds
table. Select the
common_name
and thescientific_name
columns.
Give them field aliases: Common Name of Bird and
Scientific Name of Bird. Exclude rows in which
thecommon_name
column is blank. Order the data by the
common_name
column. Limit the results to 25 rows of data.
Execute the statement until it works without an error.Merge the first and second
SELECT
statements
together to form oneSELECT
statement that retrieves the
same four columns with the same alias from the same two tables (this
was covered in Combining Tables). It
involves giving more than one table in theFROM
clause
and providing value pairs in theWHERE
clause for
temporarily connecting the tables to each other. This one may seem
tricky. So take your time and don’t get frustrated. If necessary,
refer back to Combining Tables.Limit the results to 25 rows. If you do it right, you should get
the same 25 birds from the secondSELECT
of this
exercise, but with two more fields of data. Be sure to exclude rows in
which thecommon_name
column is blank. -
Use the
SELECT
statement in conjunction with
REGEXP
in theWHERE
clause to get a list of
birds from thebirds
table in which the
common_name
contains the word “Pigeon” or “Dove” (this
was covered in Expressions and the Like). Give
the field for thecommon_name
column the alias
>Type of Columbidae—that’s the name of the
family to which Doves and Pigeons belong.