loading...

MySQL – MariaDB – Selecting Data

Getting Started with the IIS Manager in IIS

Previous chapters discussed the important topics of organizing your
tables well and getting data in to them. In this chapter, we will cover a
key objective that makes the others pay off: retrieving the data stored in a
database. This is commonly called a database query.

The simplest way to retrieve data from a MySQL or MariaDB
database— to select data—is to use the SQL statement,
SELECT. We used this SQL statement a few times in previous
chapters. In this chapter, we will cover it in greater detail. It’s not
necessary to know or use all of the may options, but some techniques such as
joining tables together are basic to using relational databases.

We’ll begin this chapter by reviewing the basics of the
SELECT statement, and then progress to more involved variants.
When you finish this chapter, you will hopefully have a good understanding
of how to use SELECT for most of your needs as you start out as
a database developer, as well as be prepared for the many possibilities and
special situations that may arise over the years of developing databases
with MySQL and MariaDB.

In previous chapters, especially in the exercises, you were asked to
enter data into the tables that we created and altered in the chapters of
the previous part of this book. Entering data on your own was good for
training purposes, but we now need much more data in our database to better
appreciate the examples in this chapter. If you haven’t done so already, go
to this book’s website
and download the dump files that contain tables of data.

Download rookery.sql to get the
whole rookery database, with plenty of data for use in our
explorations. Once you have the dump file on your system (let’s assume you
put it in /tmp/rookery.sql), enter the
following from the command line:

mysql --user='your_name' -p \
rookery < /tmp/rookery.sql

The command prompts for your password, logs you in using the username
assigned to you, and runs the statements in the rookery.sql file on the rookery
database. If everything goes well, there should be no message in response,
just the command-line prompt when it’s finished.

Basic Selection

The basic elements of the syntax for the SELECT statement are the
SELECT keyword, the column you want to select, and the table
from which to retrieve the data:

SELECT column FROM table;

If you want to select more than one column, list them separated by commas. If you want to select all of the columns in a table,
you can use the asterisk as a wildcard instead of listing all of the
columns. Let’s use the rookery database you just loaded with
data to see a practical example of this basic syntax. Enter the following
SQL statement in mysql to get a list of
all of the columns and rows in the birds table:

USE rookery;

SELECT * FROM birds;

This is the most minimal SELECT statement that you can
execute successfully. It tells MySQL to retrieve all of the data contained
in the birds table. It displays the columns in the order you
defined them in the table’s CREATE or ALTER
statements, and displays rows in the order they are found in the table,
which is usually the order that the data was entered into the
table.

To select only certain columns, do something like this:

SELECT bird_id, scientific_name, common_name
FROM birds;

This SELECT statement selects only three columns from
each row found in the birds table. There are also many ways
to choose particular rows, change the order in which they are displayed,
and limit the number shown. These are covered in the following sections of
this chapter.

Selecting by a Criteria

Suppose that we want to select only birds of a certain family, say the
Charadriidae (i.e., Plovers). Looking in the
bird_families table, we find that its family_id
is 103. Using a WHERE clause with the SELECT
statement, we can retrieve a list of birds from the birds
table for this particular family of birds like so:

SELECT common_name, scientific_name
FROM birds WHERE family_id = 103
LIMIT 3;

+----------------------+-------------------------+
| common_name          | scientific_name         |
+----------------------+-------------------------+
| Mountain Plover      | Charadrius montanus     |
| Snowy Plover         | Charadrius alexandrinus |
| Black-bellied Plover | Pluvialis squatarola    |
+----------------------+-------------------------+

This SELECT statement requests two columns, in a
different order from the way the data is listed in the table—in the table
itself, scientific_name precedes common_name. I
also added the LIMIT clause to keep the results down to
the first three rows in the table. We’ll talk more about the
LIMIT clause in a little while.

Note

Because we separated families into a separate table, you had to
look at the bird_families table to get the right ID before
selecting birds from the birds table. That seems
round-about. There is a streamlined way to ask for a family name such as
Charadriidae instead of a number. They’re called
joins. We’ll cover them later.

This is all fairly straightforward and in line with what we’ve seen
in several other examples in previous chapters. Let’s move on and take a
look at how to change the order of the results.

Ordering Results

The previous example selected specific columns from the birds table and
limited the results with the LIMIT clause. However, the rows
were listed in whatever order they were found in the table. We’ve decided
to see only a tiny subset of the birds in the
Charadriidae family, so ordering can make a
difference. If we want to put the results in alphabetical order based on
the values of the common_name column, we add an ORDER
BY
clause like this:

SELECT common_name, scientific_name
FROM birds WHERE family_id = 103
ORDER BY common_name
LIMIT 3;

+-----------------------+----------------------+
| common_name           | scientific_name      |
+-----------------------+----------------------+
| Black-bellied Plover  | Pluvialis squatarola |
| Mountain Plover       | Charadrius montanus  |
| Pacific Golden Plover | Pluvialis fulva      |
+-----------------------+----------------------+

Notice that the ORDER BY clause is located after
the WHERE clause and before the
LIMIT clause. Not only will this statement display the rows
in order by common_name, but it will retrieve only the first
three rows based on the ordering. That is to say, MySQL will first
retrieve all of the rows based on the WHERE clause, store
those results in a temporary table behind the scenes, order the data based
on the ORDER BY clause, and then return to the mysql client the first three rows found in that
temporary table based on the LIMIT clause. This activity is
the reason for the positioning of each clause.

By default, the ORDER BY clause uses ascending order,
which means from A to Z for an alphabetic column. If you want to display
data in descending order, add the DESC option, as in ORDER BY
DESC
. There’s also a contrasting ASC option, but you
probably won’t need to use it because ascending order is the
default.

To order by more than one column, give all the columns in the
ORDER BY clause in a comma-separated list. Each column can be
sorted in ascending or descending order. The clause sorts all the data by
the first column you specify, and then within that order by the second
column, etc. To illustrate this, we’ll select another column from the
birds table, family_id, and we’ll get birds from
a few more families. We’ll select some other types of shore birds:
Oystercatchers (i.e., Haematopodidae), Stilts (e.g.,
Recurvirostridae), and Sandpipers (e.g.,
Scolopacidae). First, we need the
family_id for each of these families. Execute the following
on your server:

SELECT * FROM bird_families
WHERE scientific_name
IN('Charadriidae','Haematopodidae','Recurvirostridae','Scolopacidae');

+-----------+------------------+------------------------------+----------+
| family_id | scientific_name  | brief_description            | order_id |
+-----------+------------------+------------------------------+----------+
|       103 | Charadriidae     | Plovers, Dotterels, Lapwings |      102 |
|       160 | Haematopodidae   | Oystercatchers               |      102 |
|       162 | Recurvirostridae | Stilts and Avocets           |      102 |
|       164 | Scolopacidae     | Sandpipers and Allies        |      102 |
+-----------+------------------+------------------------------+----------+

In this SELECT statement, we added another item to the
WHERE clause, the IN operator. It lists, within parentheses,
the various values we want in the scientific_name column.
Let’s use the IN operator again to get a list of birds and
also test the LIMIT clause:

SELECT common_name, scientific_name, family_id
FROM birds
WHERE family_id IN(103, 160, 162, 164)
ORDER BY common_name
LIMIT 3;
+-------------+--------------------------------+-----------+
| common_name | scientific_name                | family_id |
+-------------+--------------------------------+-----------+
|             | Charadrius obscurus aquilonius |       103 |
|             | Numenius phaeopus phaeopus     |       164 |
|             | Tringa totanus eurhinus        |       164 |
+-------------+--------------------------------+-----------+

Notice that we didn’t put the numeric values in quotes as we did
with the family names in the previous SQL statement. Single or double
quotes are necessary for strings, but they’re optional for numeric values.
However, it’s a better practice to not use quotes around numeric values.
They can affect performance and cause incorrect results if you mix them
with strings.

There is one odd thing about the results here: there aren’t any
common names for the birds returned. That’s not a mistake. About 10,000
birds in the birds table are true species of birds, and about
20,000 are subspecies. Many subspecies don’t have a unique common name.
With about 30,000 species and subspecies of birds, with all of the minor
nuances between the subspecies bird families, there just aren’t common
names for all of them. Each bird has a scientific name assigned by
ornithologists, but everyday people who use the common names for birds
don’t see the subtle distinctions that ornithologists see. This is why the
scientific_name column is necessary and why the
common_name column cannot be a key column in the
table.

Let’s execute that SQL statement again, but add another factor to
the WHERE clause to show only birds with a value for the
common_name column:

SELECT common_name, scientific_name, family_id
FROM birds
WHERE family_id IN(103, 160, 162, 164)
AND common_name != ''
ORDER BY common_name
LIMIT 3;

+-----------------------+-----------------------+-----------+
| common_name           | scientific_name       | family_id |
+-----------------------+-----------------------+-----------+
| African Oystercatcher | Haematopus moquini    |       160 |
| African Snipe         | Gallinago nigripennis |       164 |
| Amami Woodcock        | Scolopax mira         |       164 |
+-----------------------+-----------------------+-----------+

In the WHERE clause, we added the AND logical operator to specify a
second filter. For a row to match the WHERE clause, the
family_id must be one in the list given and the
common_name must not be equal to a blank value.

Nonprogrammers will have to learn a few conventions to use large
WHERE clauses. We’ve seen that an equals sign says, “The
column must contain this value,” but the != construct
says, “The column must not contain this value.” And in our
statement, we used '' to refer to an empty string. So we’ll
get the rows where the common name exists.

In this case, we couldn’t ask for non-NULL columns. We could have
set up the table so that birds without common names had NULL in the
common_name column, but we chose to instead use empty
strings. That’s totally different in meaning: NULL means there is no
value, whereas the empty string is still a string even if there are no
characters in it. We could have used NULL, but having chosen the empty
string, we must use the right value in our WHERE
clause.

Incidentally, != is the same as <>
(i.e., less-than sign followed by greater-than sign).

Limiting Results

The birds table has nearly 30,000 rows, so selecting
data without limits can return more rows than you might want to view at a
time. We’ve already used the LIMIT clause to resolve
this problem. We limited the results of the SELECT statement
to three rows, the first three rows based on the WHERE and
ORDER BY clauses. If we’d like to see the subsequent rows,
maybe the next two based on the criteria we gave previously, we could
change the LIMIT clause to show five rows. But an
alternative, which is often a better choice, is to do something like
this:

SELECT common_name, scientific_name, family_id
FROM birds
WHERE family_id IN(103, 160, 162, 164)
AND common_name != ''
ORDER BY common_name
LIMIT 3, 2;

+------------------------+-------------------------+-----------+
| common_name            | scientific_name         | family_id |
+------------------------+-------------------------+-----------+
| American Avocet        | Recurvirostra americana |       162 |
| American Golden-Plover | Pluvialis dominica      |       103 |
+------------------------+-------------------------+-----------+

This LIMIT clause has two values: the point where we
want the results to begin, then the number of rows to display. The result
is to show rows 3 and 4. Incidentally, LIMIT 3 used
previously is the same as LIMIT 0, 3: the 0 tells MySQL not
to skip any rows.

Combining Tables

So far in this chapter we’ve been working with just one table. Let’s look at some ways to
select data from more than one table. To do this, we will have to tell
MySQL the tables from which we want data and how to join them
together.

For an example, let’s get a list of birds with their family names.
To keep the query simple, we’ll select birds from different families, but
all in the same order of birds. In earlier examples where we got a list of
shore birds, they all had the same order_id of 102. We’ll use
that value again. Enter this SELECT statement on your
server:

SELECT common_name AS 'Bird',
bird_families.scientific_name AS 'Family'
FROM birds, bird_families
WHERE birds.family_id = bird_families.family_id
AND order_id = 102
AND common_name != ''
ORDER BY common_name LIMIT 10;

+------------------------+------------------+
| Bird                   | Family           |
+------------------------+------------------+
| African Jacana         | Jacanidae        |
| African Oystercatcher  | Haematopodidae   |
| African Skimmer        | Laridae          |
| African Snipe          | Scolopacidae     |
| Aleutian Tern          | Laridae          |
| Amami Woodcock         | Scolopacidae     |
| American Avocet        | Recurvirostridae |
| American Golden-Plover | Charadriidae     |
| American Oystercatcher | Haematopodidae   |
| American Woodcock      | Scolopacidae     |
+------------------------+------------------+

This SELECT statement returns one column from the
birds table and one from the bird_families
table. This is a hefty SQL statement, but don’t let it fluster you. It’s
like previous statements in this chapter, but with some minor changes and
one significant one. First, let’s focus on the one significant change: how
we’ve drawn data from two tables.

The FROM clause lists the two tables, separated by a
comma. In the WHERE clause, we indicated that we want rows in
which the value of family_id in the two tables is equal.
Otherwise, we would have duplicate rows in the results. Because those
columns have the same name ( family_id) in both tables, to
prevent ambiguity, we put the table name before the colum name, separated
by a dot (e.g., birds.family_id). We did the same thing for
the scientific name in the column list
( bird_families.scientific_name). If we don’t do that, MySQL
would be confused as to whether we want the scientific_name
from the birds or the bird_families table. This
would generate an error like this:

ERROR 1052 (23000): Column 'scientific_name' in field list is ambiguous

You may have noticed that another new item was added to the SELECT statement: the AS
keyword. This specifies a substitute name, or
alias, for the heading in the results set for the
column. Without the AS keyword for the column containing the
family names, the heading would say
bird_families.scientific_name. That’s not as attractive. This
is another style factor, but it can have more practical aspects that we’ll
see later. The keyword AS can also be used to specify a table
name like so:

SELECT common_name AS 'Bird',
families.scientific_name AS 'Family'
FROM birds, bird_families AS families
WHERE birds.family_id = families.family_id
AND order_id = 102
AND common_name != ''
ORDER BY common_name LIMIT 10;

In this example, we provided an alias for the
bird_families table. We set it to the shorter name
families. Note that aliases for table names must not be in
quotes.

After setting the alias, we must use it wherever we want to refer to
the table. So we have to change the column selected in the field list from
bird_families.scientific_name to
families.scientific_name. We also have to change the column
name bird_families.family_id in the WHERE clause
to families.family_id. If we don’t make this final change,
we’ll get the following error:

ERROR 1054 (42S22):
Unknown column 'bird_families.family_id' in 'where clause'

Let’s add a third table to the previous SQL statement, to get the
name of the order of birds to which the birds belong. You can do that by
entering this SQL statement on your server:

SELECT common_name AS 'Bird',
families.scientific_name AS 'Family',
orders.scientific_name AS 'Order'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND families.order_id = 102
AND common_name != ''
ORDER BY common_name LIMIT 10, 5;

+------------------+------------------+-----------------+
| Bird             | Family           | Order           |
+------------------+------------------+-----------------+
| Ancient Murrelet | Alcidae          | Charadriiformes |
| Andean Avocet    | Recurvirostridae | Charadriiformes |
| Andean Gull      | Laridae          | Charadriiformes |
| Andean Lapwing   | Charadriidae     | Charadriiformes |
| Andean Snipe     | Scolopacidae     | Charadriiformes |
+------------------+------------------+-----------------+

Let’s look at the changes from the previous statement to this one.
We added the third table to the FROM clause and gave it an alias of orders. To properly
connect the third table, we had to add another evaluator to the
WHERE clause: families.order_id =
orders.order_id
. This allows the SELECT to retrieve
the right rows containing the scientific names of the orders that
correspond to the rows we select from the families. We also added a column
to the field list to display the name of the order. Because the families
we’ve selected are all from the same order, that field seems a little
pointless in these results but can be useful as we search more orders in
the future. We gave a starting point for the LIMIT clause so
that we could see the next five birds in the results.

Note

It’s not necessary to put the field alias name for a column in
quotes if the alias is only one word. However, if you use a reserved
word (e.g., Order), you will need to use
quotes.

Expressions and the Like

Let’s change the latest SELECT statement to include birds from
multiple orders. To do this, we’ll focus in on the operator in the WHERE clause for the common_name:

AND common_name != ''

We’ll change the simple comparison here (i.e., the LIKE
operator, which we saw in Chapter 6) to
select multiple names that are similar. Among many families of birds,
there are often bird species that are similar but have different sizes.
The smallest is sometimes referred to as the least in
the common name. So let’s search the database for birds with
Least in their name:

SELECT common_name AS 'Bird',
families.scientific_name AS 'Family',
orders.scientific_name AS 'Order'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND common_name LIKE 'Least%'
ORDER BY orders.scientific_name, families.scientific_name, common_name
LIMIT 10;
+------------------+---------------+------------------+
| Bird             | Family        | Order            |
+------------------+---------------+------------------+
| Least Nighthawk  | Caprimulgidae | Caprimulgiformes |
| Least Pauraque   | Caprimulgidae | Caprimulgiformes |
| Least Auklet     | Alcidae       | Charadriiformes  |
| Least Tern       | Laridae       | Charadriiformes  |
| Least Sandpiper  | Scolopacidae  | Charadriiformes  |
| Least Seedsnipe  | Thinocoridae  | Charadriiformes  |
| Least Flycatcher | Tyrannidae    | Passeriformes    |
| Least Bittern    | Ardeidae      | Pelecaniformes   |
| Least Honeyguide | Indicatoridae | Piciformes       |
| Least Grebe      | Podicipedidae | Podicipediformes |
+------------------+---------------+------------------+

In the preceding example, using the LIKE operator,
MySQL selected rows in which the common_name starts with
Least and ends with anything (i.e., the wildcard,
%). We also removed the families.order_id = 102
clause, so that we wouldn’t limit the birds to a single order. The results
now have birds from a few different orders.

We also changed the ORDER BY clause to have MySQL order the results in the temporary table first by
the bird order’s scientific name, then by the bird family’s scientific
name, and then by the bird’s common name. If you look at the results, you
can see that’s what it did: it sorted the orders first. If you look at the
rows for the Charadriiformes, you can see that the
families for that order are in alphabetical order. The two birds in the
Caprimulgidae family are in alphabetical
order.

Note

You cannot use alias names for columns in the ORDER
BY
clause, but you can use alias table names. In fact, they’re
required if you’ve used the aliases in the FROM clause.

The previous example used the LIKE operator, which has
limited pattern matching abilities. As an alternative, you can use
REGEXP, which has many pattern matching characters and
classes. Let’s look at a simpler example, of the previous SELECT statement, but using
REGEXP. In the previous example we searched for small birds,
birds with a common name starting with the word
Least. The largest bird in a family is typically
called Great. To add these birds, enter the following
SQL statement on your server:

SELECT common_name AS 'Birds Great and Small'
FROM birds
WHERE common_name REGEXP 'Great|Least'
ORDER BY family_id LIMIT 10;
+-----------------------------+
| Birds Great and Small       |
+-----------------------------+
| Great Northern Loon         |
| Greater Scaup               |
| Greater White-fronted Goose |
| Greater Sand-Plover         |
| Great Crested Tern          |
| Least Tern                  |
| Great Black-backed Gull     |
| Least Nighthawk             |
| Least Pauraque              |
| Great Slaty Woodpecker      |
+-----------------------------+

The expression we’re giving with REGEXP, within the
quote marks, contains two string values: Great and
Least. By default, MySQL assumes the text given for
REGEXP is meant to be for the start of the string. To be
emphatic, you can insert a carat (i.e., ^) at the start of these string
values, but it’s unnecessary. The vertical bar (i.e., |) between the two
expressions signifies that either value is acceptable—it means
or.

In the results, you can see some common bird names starting with
Greater, not just Great. If we
don’t want to include the Greater birds, we can
exclude them with the NOT REGEXP operator. Enter the following on
your server:

SELECT common_name AS 'Birds Great and Small'
FROM birds
WHERE common_name REGEXP 'Great|Least'
AND common_name NOT REGEXP 'Greater'
ORDER BY family_id LIMIT 10;

+--------------------------+
| Birds Great and Small    |
+--------------------------+
| Great Northern Loon      |
| Least Tern               |
| Great Black-backed Gull  |
| Great Crested Tern       |
| Least Nighthawk          |
| Least Pauraque           |
| Great Slaty Woodpecker   |
| Great Spotted Woodpecker |
| Great Black-Hawk         |
| Least Flycatcher         |
+--------------------------+

Using NOT REGEXP eliminated all of the
Greater birds. Notice that it was included with
AND, and not as part of the REGEXP.

Incidentally, we’re ordering here by family_id to keep
similar birds together in the list and to have a good mix of
Great and Least birds. The
results may seem awkward, though, as the names of the birds are not
ordered. We could add another column to the ORDER BY clause
to alphabetize them within each family.

REGEXP and NOT REGEXP are case
insensitive. If we want an expression to be case sensitive, we’ll need to
add the BINARY option. Let’s get another list of birds to see this. This time we’ll
search for Hawks, with the first letter in uppercase. This is because we
want only Hawks and not other birds that have the word,
hawk in their name, but are not a Hawk. For instance,
we don’t want Nighthawks and we don’t want Hawk-Owls. The way the data is
in the birds table, each word of a common name starts with an
uppercase letter—the names are in title case. So we’ll eliminate birds
such as Nighthawks by using the BINARY option to require that
“Hawk” be spelled with an uppercase H and the other
letters in lowercase. We’ll use NOT REGEXP to not allow
Hawk-Owls. Try the following on your server:

SELECT common_name AS 'Hawks'
FROM birds
WHERE common_name REGEXP BINARY 'Hawk'
AND common_name NOT REGEXP 'Hawk-Owl'
ORDER BY family_id LIMIT 10;

+-------------------+
| Hawks             |
+-------------------+
| Red-tailed Hawk   |
| Bicolored Hawk    |
| Common Black-Hawk |
| Cuban Black-Hawk  |
| Rufous Crab Hawk  |
| Great Black-Hawk  |
| Black-faced Hawk  |
| White-browed Hawk |
| Ridgway's Hawk    |
| Broad-winged Hawk |
+-------------------+

I stated that REGEXP and NOT REGEXP are
case insensitive, unless you add the BINARY
option as we did to stipulate the collating method as binary (e.g., the
letter H has a different binary value fromn the
letter h). For the common_name column,
though, we didn’t need to add the BINARY option because the
column has a binary collation setting. We did this unknowingly when we
created the rookery database near the beginning of Chapter 4. See how we created the database by
entering this from the mysql
client:

SHOW CREATE DATABASE rookery \G

*************************** 1. row ***************************
       Database: rookery
Create Database: CREATE DATABASE `rookery` /*!40100 DEFAULT
                 CHARACTER SET latin1 COLLATE latin1_bin */

The COLLATE clause is set to latin1_bin, meaning Latin1 binary. Any columns that we create in tables in the
rookery database, unless we specify otherwise, will be
collated using latin1_bin. Execute the following statement to
see how the common_name column in the birds
table is set:

SHOW FULL COLUMNS
FROM birds LIKE 'common_name' \G

*************************** 1. row ***************************
     Field: common_name
      Type: varchar(255)
 Collation: latin1_bin
      Null: YES
       Key:
   Default: NULL
     Extra:
Privileges: select,insert,update,references
   Comment: 

This shows information just on the common_name column.
Notice that the Collation is latin1_bin.
Because of that, regular expressions using REGEXP are case
sensitive without having to add the BINARY option.

Looking through the birds table, we discover some
common names for birds that contain the words, “Hawk Owls,” without the
hyphen in between. We didn’t allow for that in the expression we gave. We
discover also that there are birds in which the word “Hawk” is not in
title case—so we can’t count on looking for the uppercase letter,
H. Our previous regular expression left those birds
out of the results. So we’ll have to change the expression and try a
different method. Enter this on your server:

SELECT common_name AS 'Hawks'
FROM birds
WHERE common_name REGEXP '[[:space:]]Hawk|[[.hyphen.]]Hawk'
AND common_name NOT REGEXP 'Hawk-Owl|Hawk Owl'
ORDER BY family_id;

This first, rather long REGEXP expression uses a character class and a
character name. The format of character classes and
character names is to put the type of character between two sets of double
brackets. A character class is given between a pair of colons (e.g.,
[[:alpha:]] for alphabetic characters). A character name is
given between two dots (e.g., [[.hyphen.]] for a hyphen).
Looking at the first expression, you can deduce that we want rows in which
the common_name contains either “Hawk” or “-Hawk”—that is to
say, Hawk preceded by a space or a hyphen. This won’t
allow for Hawk preceded by a letter (e.g.,
Nighthawk). The second expression excludes
Hawk-Owl and Hawk Owl.

Pattern matching in regular expressions in MySQL tends to be more
verbose than they are in other languages like Perl or PHP. But they do
work for basic requirements. For elaborate regular expressions, you’ll
have to use an API like the Perl DBI to process the data outside of MySQL.
Because that may be a performance hit, it’s better to try to accomplish
such tasks within MySQL using REGEXP.

Counting and Grouping Results

In many of our examples, we displayed only a few rows of data because the
results could potentially contain thousands of rows. Suppose we’d like to
know how many are contained in the table. We can do that by adding a
function to the statement. In this case, we want
the COUNT() function. Let’s see how that would work:

SELECT COUNT(*) FROM birds;

+----------+
| COUNT(*) |
+----------+
|    28891 |
+----------+

We put an asterisk within the parentheses of the function to indicate that we want all of the rows. We could
put a column name instead of an asterisk to count only rows that have
data. Using a column prevents MySQL from counting rows that have a NULL
value in that column. But it will count rows that have a blank or empty
value (i.e., '').

It’s nice to know how many rows are in the birds table,
but suppose we’d like to break apart that count. Let’s use
COUNT() to count the number of rows for a particular family
of birds, the Pelecanidae—those are Pelicans. Enter
this SQL statement in the mysql client
on your server:

SELECT families.scientific_name AS 'Family',
COUNT(*) AS 'Number of Birds'
FROM birds, bird_families AS families
WHERE birds.family_id = families.family_id
AND families.scientific_name = 'Pelecanidae'

+-------------+-----------------+
| Family      | Number of Birds |
+-------------+-----------------+
| Pelecanidae |              10 |
+-------------+-----------------+

As you can see, there are 10 bird species recorded for the
Pelecanidae family in the birds table.
In this example, we used the WHERE clause to limit the
results to the Pelecanidae family. Suppose we want to
know the number of birds for other bird families in the same order to
which Pelicans belong, to the order called
Pelecaniformes. To do this, we’ll add the
bird_orders table to the previous SELECT
statement. Enter the following from the mysql client:

SELECT orders.scientific_name AS 'Order',
families.scientific_name AS 'Family',
COUNT(*) AS 'Number of Birds'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND orders.scientific_name = 'Pelecaniformes';

+----------------+-------------+-----------------+
| Order          | Family      | Number of Birds |
+----------------+-------------+-----------------+
| Pelecaniformes | Pelecanidae |             224 |
+----------------+-------------+-----------------++

This tells us that there are 224 birds in the birds
table that belong to Pelecaniformes. There are five
families in that order of birds, but it returned only the first family
name found. If we want to know the name of each family and the number of
birds in each family, we need to get MySQL to group the results. To do
this, we have to tell it the column by which to group. This is where the GROUP BY clause comes in. This clause
tells MySQL to group the results based on the columns given with the
clause. Let’s see how that might look. Enter the following on your
server:

SELECT orders.scientific_name AS 'Order',
families.scientific_name AS 'Family',
COUNT(*) AS 'Number of Birds'
FROM birds, bird_families AS families, bird_orders AS orders
WHERE birds.family_id = families.family_id
AND families.order_id = orders.order_id
AND orders.scientific_name = 'Pelecaniformes'
GROUP BY Family;

+----------------+-------------------+-----------------+
| Order          | Family            | Number of Birds |
+----------------+-------------------+-----------------+
| Pelecaniformes | Ardeidae          |             157 |
| Pelecaniformes | Balaenicipitidae  |               1 |
| Pelecaniformes | Pelecanidae       |              10 |
| Pelecaniformes | Scopidae          |               3 |
| Pelecaniformes | Threskiornithidae |              53 |
+----------------+-------------------+-----------------+

We gave the GROUP BY clause the Family
alias, which is the scientific_name column from the
bird_families table. MySQL returns one results set for all
five families, for one SELECT statement.

The GROUP BY clause is very useful. You’ll use it
often, so learn it well. This clause and related functions are covered in
greater detail in Chapter 12.

Summary

The SELECT statement offers quite a number of
parameters and possibilities that I had to skip to keep this chapter from
becoming too lengthy and too advanced for a learning book. For instance,
there are several options for caching results and a clause for exporting a
results set to a text file. You can learn about these from other sources
if you need them.

At this point, make sure you’re comfortable with the
SELECT statement and its main components: choosing columns
and using field aliases; choosing multiple tables in the FROM
clause; how to construct a WHERE clause, including the basics
of regular expressions; using the ORDER BY and the
GROUP BY clauses; and limiting results with the
LIMIT clause. It will take time and practice to become very
comfortable with all of these components. Before moving on to Chapter 8, make sure to complete the exercises in
the next section.

Exercises

The following exercises will help cement your understanding of the
SELECT statement. The act of typing SQL statements,
especially ones that you will use often like SELECT, helps
you to learn, memorize, and know them well.

  1. Construct a SELECT statement to select the common
    names of birds from the birds table. Use the
    LIKE operator to select only Pigeons from the table.
    Order the table by the common_name column, but give it a
    field alias of Bird'. Don’t limit the results; let MySQL
    retrieve all of the rows that match. Execute the statement on your
    server and look over the results.

    Next, use the same SELECT statement, but add a
    LIMIT clause. Limit the results to the first ten rows and
    execute it. Compare the results to the previous SELECT
    statement to make sure the results show the 1st through 10th row. Then
    modify the SELECT statement again to display the next 10
    rows. Compare these results to the results from the first
    SELECT statement to make sure you retrieved the 11th
    through 20th row. If you didn’t, find your mistake and correct it
    until you get it right.

  2. In this exercise, you’ll begin with a simple SELECT
    statement and then make it more complicated. To start, construct a
    SELECT statement in which you select the
    scientific_name and the brief_description
    from the bird_orders table. Give the field for the
    scientific_name an alias of
    Order—and don’t forget to put quotes around it
    because it’s a reserved word. Use an alias of Types of Birds
    in Order
    for brief_description. Don’t limit
    the results. When you think that you have the SELECT
    statement constructed properly, execute it. If you have errors, try to
    determine the problem and fix the statement until you get it
    right.

    Construct another SELECT statement in which you
    retrieve data from the birds table. Select the
    common_name and the scientific_name columns.
    Give them field aliases: Common Name of Bird and
    Scientific Name of Bird. Exclude rows in which
    the common_name column is blank. Order the data by the
    common_name column. Limit the results to 25 rows of data.
    Execute the statement until it works without an error.

    Merge the first and second SELECT statements
    together to form one SELECT statement that retrieves the
    same four columns with the same alias from the same two tables (this
    was covered in Combining Tables). It
    involves giving more than one table in the FROM clause
    and providing value pairs in the WHERE clause for
    temporarily connecting the tables to each other. This one may seem
    tricky. So take your time and don’t get frustrated. If necessary,
    refer back to Combining Tables.

    Limit the results to 25 rows. If you do it right, you should get
    the same 25 birds from the second SELECT of this
    exercise, but with two more fields of data. Be sure to exclude rows in
    which the common_name column is blank.

  3. Use the SELECT statement in conjunction with
    REGEXP in the WHERE clause to get a list of
    birds from the birds table in which the
    common_name contains the word “Pigeon” or “Dove” (this
    was covered in Expressions and the Like). Give
    the field for the common_name column the alias
    >Type of Columbidae—that’s the name of the
    family to which Doves and Pigeons belong.

Comments are closed.

loading...