loading...

MySQL – MariaDB – Joining and Subquerying Data

Getting Started with the IIS Manager in IIS

Most of the examples used in this book thus far have intentionally
involved one table per SQL statement in order to allow you to focus on the
basic syntax of each SQL statement. When developing a MySQL or MariaDB
database, though, you will often query multiple tables. There are a few
methods by which you may do that—you’ve seen some simple examples of them in
previous chapters. This chapter covers how to merge results from multiple
SQL statements, how to join tables, and how to use subqueries to achieve
similar results.

Unifying Results

Let’s start this chapter by looking at a simple method of unifying results from multiple
SQL statements. There may be times when you just want the unified results
of two SELECT statements that don’t interact with each other.
In this situation, you can use the UNION operator, which
merges two SELECT statements to form a unified results set.
You can merge many SELECT statements together simply by
placing the UNION between them in a chain. Let’s look at an
example.

In Counting and Grouping Results, we queried the
birds table to get a count of the number of birds in the
Pelecanidae family (i.e., Pelicans). Suppose we want
to also know how many birds are in the Ardeidae
family (i.e., Herons). That’s easy to do: we’d use a copy of the same
SELECT, but change the value in the WHERE
clause. Suppose further that we want to merge the results of the
SELECT statement counting Pelicans with the results of a
SELECT counting Herons. We’ll do this with a
UNION operator, so we can enter two complete
SELECT statements and unite them into one results set. Enter
the following in the mysql
client:

  SELECT 'Pelecanidae' AS 'Family',
    COUNT(*) AS 'Species'
    FROM birds, bird_families AS families
    WHERE birds.family_id = families.family_id
    AND families.scientific_name = 'Pelecanidae'
UNION
  SELECT 'Ardeidae',
    COUNT(*)
    FROM birds, bird_families AS families
    WHERE birds.family_id = families.family_id
    AND families.scientific_name = 'Ardeidae';

+-------------+---------+
| Family      | Species |
+-------------+---------+
| Pelecanidae |      10 |
| Ardeidae    |     157 |
+-------------+---------+

First notice that the column headings in the results is taken only
from the first SELECT statement. Next notice that for the
first fields in both SELECT statements, we didn’t reference a
column. Instead, we gave plain text within quotes:
'Pelecanidae' and 'Ardeidae'. That’s an
acceptable choice in MySQL and MariaDB. It works well when you want to
fill a field with text like this. Notice that we gave field aliases for
the columns in the first SELECT statement, but not in the
second one. MySQL uses the first ones it’s given for the column headings
of the results set when using the UNION operator. It ignores
any field aliases in subsequent SELECT statements, so they’re
not needed. If you don’t give aliases, it uses the column names of the
first SQL statement of the UNION.

The reason a UNION was somewhat necessary in the
preceding example is because we’re using an aggregate function,
COUNT() with GROUP BY. We can group by multiple
columns, but to get results like this which show separate counts for two
specific values of the same column, a UNION or some other
method is necessary.

There are a few minor things to know about using a
UNION. It’s used only with SELECT statements.
The SELECT statements can select columns from different
tables. Duplicate rows are combined into a single column in the results
set.

You can use the ORDER BY clause to order the unified
results. If you want to order the results of a SELECT
statements, independently of the unified results, you have to put that
SELECT statement within parentheses and add an ORDER
BY
clause to it. When specifying the columns in the ORDER
BY
clauses, you cannot preface column names with the table names
(e.g., families.scientific_name). If using the column names
would be ambiguous, you should instead use column aliases. Let’s expand
our previous example to better illustrate how to use the ORDER
BY
clause with UNION. Let’s get a count for each bird
family within two orders: Pelecaniformes and
Suliformes. Enter the following:

  SELECT families.scientific_name AS 'Family',
    COUNT(*) AS 'Species'
    FROM birds, bird_families AS families, bird_orders AS orders
    WHERE birds.family_id = families.family_id
    AND families.order_id = orders.order_id
    AND orders.scientific_name = 'Pelecaniformes'
    GROUP BY families.family_id
UNION
  SELECT families.scientific_name, COUNT(*)
    FROM birds, bird_families AS families, bird_orders AS orders
    WHERE birds.family_id = families.family_id
    AND families.order_id = orders.order_id
    AND orders.scientific_name = 'Suliformes'
    GROUP BY families.family_id;

+-------------------+---------+
| Family            | Species |
+-------------------+---------+
| Pelecanidae       |      10 |
| Balaenicipitidae  |       1 |
| Scopidae          |       3 |
| Ardeidae          |     157 |
| Threskiornithidae |      53 |
| Fregatidae        |      13 |
| Sulidae           |      16 |
| Phalacrocoracidae |      61 |
| Anhingidae        |       8 |
+-------------------+---------+

The first five rows are are Pelecaniformes and
the remaining rows are Suliformes. The results are
not in alphabetical order, but in the order of each SELECT
statement and the order that server found the rows for each
SELECT statement based on the family_id. If we
want to order the results alphabetically by the family name, we have to
use an ORDER BY clause, but after the unified results are
generated. To do this, we’ll wrap the results set in parentheses to tell
MySQL to treat it as a table. Then we’ll select all of the columns and
rows of that results set and use the ORDER BY clause to order
them based on the family name. To avoid confusion, we’ll add the name of
the order to the results. Enter the following:

  SELECT * FROM
(
    SELECT families.scientific_name AS 'Family',
    COUNT(*) AS 'Species',
    orders.scientific_name AS 'Order'
    FROM birds, bird_families AS families, bird_orders AS orders
    WHERE birds.family_id = families.family_id
    AND families.order_id = orders.order_id
    AND orders.scientific_name = 'Pelecaniformes'
    GROUP BY families.family_id
UNION
  SELECT families.scientific_name, COUNT(*), orders.scientific_name
    FROM birds, bird_families AS families, bird_orders AS orders
    WHERE birds.family_id = families.family_id
    AND families.order_id = orders.order_id
    AND orders.scientific_name = 'Suliformes'
    GROUP BY families.family_id ) AS derived_1
ORDER BY Family;

+-------------------+---------+----------------+
| Family            | Species | Order          |
+-------------------+---------+----------------+
| Anhingidae        |       8 | Suliformes     |
| Ardeidae          |     157 | Pelecaniformes |
| Balaenicipitidae  |       1 | Pelecaniformes |
| Fregatidae        |      13 | Suliformes     |
| Pelecanidae       |      10 | Pelecaniformes |
| Phalacrocoracidae |      61 | Suliformes     |
| Scopidae          |       3 | Pelecaniformes |
| Sulidae           |      16 | Suliformes     |
| Threskiornithidae |      53 | Pelecaniformes |
+-------------------+---------+----------------+

In these examples, it may seem to be a lot of typing to achieve very
little. But there are times—albeit rare times—when UNION is
the best or simplest choice. It’s more useful when you retrieve data from
very distinct, separate sources or other situations that would require
contortions to fit into a single SELECT statement and are
executed more easily as separate ones, still giving you a unified results
set.

You can get the same results as the previous examples, though, with
less effort by using a subquery. Actually, when we put the
UNION within parentheses, that became a subquery, just not
much of one. We’ll cover subqueries later in this chapter. For now, let’s
consider how to join multiple tables in one SQL statement.

Joining Tables

The JOIN clause links two tables together in a SELECT,
UPDATE, or DELETE statement. JOIN
links tables based on columns with common data for purposes of selecting,
updating, or deleting data. In A Little Complexity, for instance, we joined two
tables named books and status_names, taking
advantage of the design that put identical values in the
status column of books and the
status_id column of status_names. That way, we
could show data from each table about the same book:

SELECT book_id, title, status_name
FROM books JOIN status_names
WHERE status = status_id;

Let’s review the way a join works, using this example. The
status and status_id fields both contain numbers
that refer to a status. In the books table, the numbers have
no intrinsic meaning. But the status_names table associates
the numbers with meaningful text. Thus, by joining the tables, you can
associate a book with its status.

Sometimes there are alternatives to the JOIN clause.
For instance, when constructing an SQL statement that includes multiple
tables, a simple method is to list the tables in a comma-separated list in
the appropriate position of the SQL statement—for a SELECT
statement, you would list them in the FROM clause—and to
provide pairing of columns in the WHERE clause
on which the tables will be joined. This is the method we have used
several times in the previous chapters. Although this method works fine
and would seem fairly straightforward, a more agreeable method is to use a
JOIN clause to join both tables and to specify the join point
columns. When you have an error with an SQL statement, keeping these items
together and not having part of them in the WHERE clause
makes troubleshooting SQL statements easier.

With JOIN, tables are linked together based on columns
with common data for purposes of selecting, updating, or deleting data.
The JOIN clause is entered in the relevant statement where
tables referenced are specified usually. This precludes the need to join
the tables based on key columns in the WHERE clause. The
ON operator is used to indicate the pair of columns by which the tables
are to be joined (indicated with the equals-sign operator). If needed, you
may specify multiple pairs of columns, separated by AND. If
the column names by which the two tables are joined are the same in both
tables, as an alternative method, the USING operator may be
given along with a comma-separated list of columns that both tables have
in common, contained within parentheses. The columns must be contained in
each table that is joined. To improve performance, join to a column that
is indexed.

Here is how the first of these two syntax looks using a
JOIN:

SELECT book_id, title, status_name
FROM books
JOIN status_names ON(status = status_id);

This is the same example as before, but without the
WHERE clause. It doesn’t need it, because it uses
ON instead to indicate the join point. If we were to alter
the books table to modify the name of the status
column to be status_id, so that the names of both columns on
which we join these two tables are the same, we could do the join like
this:

SELECT book_id, title, status_name
FROM books
JOIN status_names USING(status_id);

Here we use the keyword USING in the JOIN clause
to indicate the identical column by which to join.

These syntaxes are only two of a few possible with the
JOIN. They show how you might construct a SELECT
statement using a JOIN. It’s basically the same for the
UPDATE and DELETE statements. In the next
subsections, we’ll consider the methods for using JOIN with
each of these three SQL statements, and look at some examples for
each.

Selecting a Basic Join

Suppose we want to get a list of species of Geese whose existence is
Threatened—that’s a category of conservation states. We
will need to construct a SELECT statement that takes data
from the birds table and the
conservation_status table. The shared data in the
birds and the conservation_status tables is
the conservation_status_id column of each table. We didn’t
have to give the column the same name in each table, but doing so makes
it easier to know where to join them.

Enter the following in the mysql client:

SELECT common_name, conservation_state
FROM birds
JOIN conservation_status
ON(birds.conservation_status_id = conservation_status.conservation_status_id)
WHERE conservation_category = 'Threatened'
AND common_name LIKE '%Goose%';

+----------------------------+--------------------+
| common_name                | conservation_state |
+----------------------------+--------------------+
| Swan Goose                 | Vulnerable         |
| Lesser White-fronted Goose | Vulnerable         |
| Hawaiian Goose             | Vulnerable         |
| Red-breasted Goose         | Endangered         |
| Blue-winged Goose          | Vulnerable         |
+----------------------------+--------------------+

The ON operator specifies the
conservation_status_id columns from each table as the
common item on which to join the tables. MySQL knows the proper table in
which to find the conservation_category and
common_name columns, and pulls the rows that match.

That works fine, but it’s a lot to type. Let’s modify this
statement to use the USING operator, specifing
conservation_status_id just once to make the join. MySQL
will understand what to do. Here’s that same SQL statement, but with the
USING operator:

SELECT common_name, conservation_state
FROM birds
JOIN conservation_status
USING(conservation_status_id)
WHERE conservation_category = 'Threatened'
AND common_name LIKE '%Goose%';

Now let’s modify the SQL statement to include the bird family. To
do that, we’ll have to add another table, the
bird_families. Let’s also include Ducks in the list. Try
executing the following:

SELECT common_name AS 'Bird',
bird_families.scientific_name AS 'Family', conservation_state AS 'Status'
FROM birds
JOIN conservation_status USING(conservation_status_id)
JOIN bird_families USING(family_id)
WHERE conservation_category = 'Threatened'
AND common_name REGEXP 'Goose|Duck'
ORDER BY Status, Bird;

+----------------------------+----------+-----------------------+
| Bird                       | Family   | Status                |
+----------------------------+----------+-----------------------+
| Laysan Duck                | Anatidae | Critically Endangered |
| Pink-headed Duck           | Anatidae | Critically Endangered |
| Blue Duck                  | Anatidae | Endangered            |
| Hawaiian Duck              | Anatidae | Endangered            |
| Meller's Duck              | Anatidae | Endangered            |
| Red-breasted Goose         | Anatidae | Endangered            |
| White-headed Duck          | Anatidae | Endangered            |
| White-winged Duck          | Anatidae | Endangered            |
| Blue-winged Goose          | Anatidae | Vulnerable            |
| Hawaiian Goose             | Anatidae | Vulnerable            |
| Lesser White-fronted Goose | Anatidae | Vulnerable            |
| Long-tailed Duck           | Anatidae | Vulnerable            |
| Philippine Duck            | Anatidae | Vulnerable            |
| Swan Goose                 | Anatidae | Vulnerable            |
| West Indian Whistling-Duck | Anatidae | Vulnerable            |
| White-headed Steamer-Duck  | Anatidae | Vulnerable            |
+----------------------------+----------+-----------------------+

We gave two JOIN clauses in this SQL statement. It
doesn’t usually matter which table is listed where. For instance,
although bird_families is listed just after the join for
the conservation_statustable, MySQL determined that
bird_families is to be joined to the birds
table. Without using JOIN, we would have to be more
emphatic in specifying the join points, and we would have to list them
in the WHERE clause. It would have to be entered
like this:

SELECT common_name AS 'Bird',
bird_families.scientific_name AS 'Family', conservation_state AS 'Status'
FROM birds, conservation_status, bird_families
WHERE birds.conservation_status_id = conservation_status.conservation_status_id
AND birds.family_id = bird_families.family_id
AND conservation_category = 'Threatened'
AND common_name REGEXP 'Goose|Duck'
ORDER BY Status, Bird;

That’s a very cluttered WHERE clause, making it
difficult to see clearly the conditions by which we’re selecting data
from the tables. Using JOIN clauses is much tidier.

Incidentally, the SQL statement with two JOIN clauses
used a regular expression—the REGEXP operator in the WHERE
clause—to specify that the clause find either Goose or
Duck. We also added an ORDER BY clause to order first by
Status, then by Bird name.

In this example, though, there’s little point in listing the bird
family name, because the birds are all of the same family. Plus, there
may be similar birds that we might like to have in the list, but that
don’t have the words Goose or Duck in their
name. So let’s change that in the SQL statement. Let’s also order the
results differently and list birds from the least endangered to the most
endangered. Enter the following:

SELECT common_name AS 'Bird from Anatidae',
conservation_state AS 'Conservation Status'
FROM birds
JOIN conservation_status AS states USING(conservation_status_id)
JOIN bird_families USING(family_id)
WHERE conservation_category = 'Threatened'
AND bird_families.scientific_name = 'Anatidae'
ORDER BY states.conservation_status_id DESC, common_name ASC;

+----------------------------+-----------------------+
| Bird from Anatidae         | Conservation Status   |
+----------------------------+-----------------------+
| Auckland Islands Teal      | Vulnerable            |
| Blue-winged Goose          | Vulnerable            |
| Eaton's Pintail            | Vulnerable            |
| Hawaiian Goose             | Vulnerable            |
| Lesser White-fronted Goose | Vulnerable            |
| Long-tailed Duck           | Vulnerable            |
| Marbled Teal               | Vulnerable            |
| Philippine Duck            | Vulnerable            |
| Salvadori's Teal           | Vulnerable            |
| Steller's Eider            | Vulnerable            |
| Swan Goose                 | Vulnerable            |
| West Indian Whistling-Duck | Vulnerable            |
| White-headed Steamer-Duck  | Vulnerable            |
| Bernier's Teal             | Endangered            |
| Blue Duck                  | Endangered            |
| Brown Teal                 | Endangered            |
| Campbell Islands Teal      | Endangered            |
| Hawaiian Duck              | Endangered            |
| Meller's Duck              | Endangered            |
| Red-breasted Goose         | Endangered            |
| Scaly-sided Merganser      | Endangered            |
| White-headed Duck          | Endangered            |
| White-winged Duck          | Endangered            |
| White-winged Scoter        | Endangered            |
| Baer's Pochard             | Critically Endangered |
| Brazilian Merganser        | Critically Endangered |
| Crested Shelduck           | Critically Endangered |
| Laysan Duck                | Critically Endangered |
| Madagascar Pochard         | Critically Endangered |
| Pink-headed Duck           | Critically Endangered |
+----------------------------+-----------------------+

An obvious change to this example is the elimination of
bird_families.scientific_name from the list of selected
columns, so only two columns appear in the output. Another change, which
is cosmetic, is to provide the alias states to the
conservation_status table so we could refer to the short
alias later instead of the long name.

Finally, the ORDER BY clause orders the output by
conservation_status_id, because that value happens to be in
the order of severity in the conservation_status table. We
want to override the default order, which puts the most threatened
species first, so we add the DESC option to put the least
threatened first. We’re still ordering results secondarily by the common
name of the birds, but using the actual column name this time instead of
an alias. This is because we changed the alias for the
common_name column from Birds to Birds
from Anatidae
, because all the results are in that family. We
could have used 'Birds from Anatidae' in the ORDER
BY
clause, but that’s bothersome to type.

Let’s look at one more basic example of a JOIN.
Suppose we wanted to get a list of members located in Russia (i.e.,
where country_id has a value of ru) who have
reported sighting a bird from the Scolopacidae family
(shore and wader birds like Sandpipers and Curlews). Information on bird
sightings is stored in the bird_sightings table. It
includes GPS coordinates recorded from a bird list application on the
member’s mobile phone when they note the sighting. Enter this SQL
statement:

SELECT CONCAT(name_first, ' ', name_last) AS Birder,
common_name AS Bird, location_gps AS 'Location of Sighting'
FROM birdwatchers.humans
JOIN birdwatchers.bird_sightings USING(human_id)
JOIN rookery.birds USING(bird_id)
JOIN rookery.bird_families USING(family_id)
WHERE country_id = 'ru'
AND bird_families.scientific_name = 'Scolopacidae'
ORDER BY Birder;

+-------------------+-------------------+---------------------------+
| Birder            | Bird              | Location of Sighting      |
+-------------------+-------------------+---------------------------+
| Anahit Vanetsyan  | Bar-tailed Godwit | 42.81958072; 133.02246094 |
| Elena Bokova      | Eurasian Curlew   | 51.70469364; 58.63746643  |
| Elena Bokova      | Eskimo Curlew     | 66.16051056; -162.7734375 |
| Katerina Smirnova | Eurasian Curlew   | 42.69096856; 130.78185081 |
+-------------------+-------------------+---------------------------+

This SQL statement joins together four tables, two from the
birdwatchers database and two from the birds
database. Look closely at this SQL statement and consider the purpose of
including each of those four tables. All of them were needed to assemble
the results shown. Incidentally, we used the CONCAT()
function to concatenate together the member’s first and last name for
the Birder field in the results.

There are other types of joins besides a plain JOIN.
Let’s do another SELECT using another type of
JOIN. For an example of this, we’ll get a list of Egrets
and their conservation status. Enter the following SQL statement:

SELECT common_name AS 'Bird',
conservation_state AS 'Status'
FROM birds
LEFT JOIN conservation_status USING(conservation_status_id)
WHERE common_name LIKE '%Egret%'
ORDER BY Status, Bird;

+--------------------+-----------------+
| Bird               | Status          |
+--------------------+-----------------+
| Great Egret        | NULL            |
| Cattle Egret       | Least Concern   |
| Intermediate Egret | Least Concern   |
| Little Egret       | Least Concern   |
| Snowy Egret        | Least Concern   |
| Reddish Egret      | Near Threatened |
| Chinese Egret      | Vulnerable      |
| Slaty Egret        | Vulnerable      |
+--------------------+-----------------+

This SELECT statement is like the previous examples,
except that instead of using a JOIN, we’re using a
LEFT JOIN. This type of join selects rows in the table on the left (i.e.,
birds) regardless of whether there is a matching row in the
table on the right (i.e., conservation_status). Because
there is no match on the right, MySQL returns a NULL value for columns
it cannot reconcile from the table on the right. You can see this in the
results. The Great Egret has a value of NULL for its
Status. This is because no value was entered in the
conservation_status_id column of the row related to that
bird species. It would return NULL if the value of that column is NULL,
blank if the column was set to empty (e.g., ''), or any
value that does not match in the right table.

Because of the LEFT JOIN, the results show all birds
with the word Egret in the common name even if we don’t
know their conservation status. It also indicates which Egrets need to
set the value of conservation_status_id. We’ll need to
update that row and others like it. An UPDATE statement with this same LEFT
JOIN
can easily do that. We’ll show a couple in the next
section.

Updating Joined Tables

If you want to use the UPDATE statement to change the data in multiple tables, or change data in a
table based on criteria from multiple tables, you can use the
JOIN clause. The syntax of the JOIN clause for
UPDATE is the same as it is for SELECT. So
let’s go straight to some practical examples. We’ll start with the
example at the end of the previous subsection.

Let’s use UPDATE with LEFT JOIN to
locate rows in the birds table that don’t have a value in
conservation_status_id. We could update all of the rows,
but let’s do only rows for one bird family,
Ardeidae (i.e., Herons, Egrets, and Bitterns).
First, execute this SELECT statement to
test our joins and WHERE clause:

SELECT common_name,
conservation_state
FROM birds
LEFT JOIN conservation_status USING(conservation_status_id)
JOIN bird_families USING(family_id)
WHERE bird_families.scientific_name = 'Ardeidae';

If you’re working from the data from the MySQL Resources site, you
should have over 150 rows in the results. You’ll notice that many of the
rows have nothing in the common_name field. That’s because
there are many bird species for which there are scientific names, but no
common names. Those rows also have no value for the
conservation_status_id. There are also a few rows for bird
species that do have common names.

Let’s add another row to the conservation_status, one
for an unknown state. We’ll set these unknown rows to that state. Enter
these two SQL statements:

INSERT INTO conservation_status (conservation_state)
VALUES('Unknown');

SELECT LAST_INSERT_ID();

+------------------+
| LAST_INSERT_ID() |
+------------------+
|                9 |
+------------------+

In the first SQL statement here we entered only a value for
conservation_state. The defaults for the other columns are
fine. We’ll use the UPDATE statement to set the rows for
the birds in Ardeidae to this new state, so we want
to know the conservation_status_id for it. To get that
value, we issue a SELECT statement with the LAST_INSERT_ID() function. It returns
the identifier generated from the previous SQL statement entered, which
added a row for the current client connection (i.e., just us). Let’s use
that number to set the conservation_status_id in the
birds table for bird species in
Ardeidae. If your identification number is
different, use what you received in the following SQL statement:

UPDATE birds
LEFT JOIN conservation_status USING(conservation_status_id)
JOIN bird_families USING(family_id)
SET birds.conservation_status_id = 9
WHERE bird_families.scientific_name = 'Ardeidae'
AND conservation_status.conservation_status_id IS NULL;

This UPDATE statement should have changed almost 100
rows on your server. The joins here are the same as we used in the
previous SELECT statement, in which we discovered that we
did not have a conservation status set for the Great Egret. Notice in
the WHERE clause here that one of the conditions is that
conservation_status.conservation_status_id has a value of
NULL. We could have removed the LEFT JOIN to the
conservation_status table and then updated simply all of
the rows for the Ardeidae birds that have a NULL
value in the conservation_status_id column. But that would
not have included any rows that might have other nonmatching values
(e.g., a blank column). By including this LEFT JOIN, we
updated all of these possibilities. However, it requires the condition
that the conservation_status.conservation_status_id is
NULL, the column from the right table—it will be assumed NULL if not
matched.

Because the method of joining tables is the same for both the
SELECT statement and the UPDATE statement, you
can easily test the JOIN clauses and WHERE
clause using a SELECT first. When
that’s successful, you can then execute an UPDATE statement
with the same JOIN and WHERE clauses. That’s
the best procedure to follow to ensure proper updating of data when
joining multiple tables.

Deleting Within Joined Tables

Having used JOIN with SELECT and
UPDATE statements, let’s look at some practical examples using
DELETE. In Deleting in Multiple Tables,
we saw an example of DELETE with a JOIN. In
that example, we wanted to delete the rows where the member Elena Bokova
has a yahoo.com email address from both the
humans and the prize_winners tables from the
birdwatchers database. For that purpose, we constructed a
DELETE statement that worked fine, but there was
potentially a problem with it. Here is that SQL statement again:

DELETE FROM humans, prize_winners
USING humans JOIN prize_winners
WHERE name_first = 'Elena'
AND name_last = 'Bokova'
AND email_address LIKE '%yahoo.com'
AND humans.human_id = prize_winners.human_id;

Compared to the JOIN clauses we’ve been using, the
syntax here may look strange. This is how it works with a
DELETE statement. Tables from which data is deleted are
listed in the FROM clause, while tables used in the
WHERE clause to provide filters to determine which rows to
delete are listed in a USING clause. The clause
USING humans JOIN prize_winners” just tells the server
that those two tables provide the columns in the WHERE
clause.

Note

Don’t confuse a USING clause, which has JOIN subclauses, with the USING operator, which can be used
in a JOIN clause.

As the preceding DELETE SQL statement is constructed,
if MySQL finds a row in the humans table where the name and
email information match, there has to be a matching row in the
prize_winners table for the human_id. If
there’s not a row in both, MySQL won’t delete the row in the
humans table and no error will be returned—you might not
realize it failed. To allow for this possibility, we could use a LEFT JOIN like so:

DELETE FROM humans, prize_winners
USING humans LEFT JOIN prize_winners
ON humans.human_id = prize_winners.human_id
WHERE name_first = 'Elena'
AND name_last = 'Bokova'
AND email_address LIKE '%yahoo.com';

Notice that for this syntax we moved the valuation of the
human_id columns to the USING clause, adding a LEFT
JOIN
and an ON operator to replace that condition in the WHERE clause. That’s necessary because if
there’s not a match in the other table, the WHERE clause
won’t include that row in the results to be deleted. With the LEFT
JOIN
, all of the rows in both the humans and the
prize_winners tables that match the criteria given to it
will be deleted, and any rows found in the humans table for
which there isn’t a match in the prize_winners table, but
which match the criteria of the WHERE clause will be
deleted also. This prevents what are known as orphaned rows.

For general maintenance, we should check occasionally to see if
there are rows in the prize_winners table that don’t have
matching rows in the humans table, and then delete them.
Someone might have had us delete their account, but we may have
forgotten to remove entries for them in related tables. To handle that
possibility, we could use RIGHT JOIN instead of LEFT
JOIN
. We could enter something like this:

DELETE FROM prize_winners
USING humans RIGHT JOIN prize_winners
ON humans.human_id = prize_winners.human_id
WHERE humans.human_id IS NULL;

In this DELETE statement, we listed only the
prize_winners table in the FROM clause because
that’s the only one from which we want to delete rows. It’s a good
policy not to list tables that are not to be affected in the FROM clause of a DELETE
statement, even if you think there’s no possible way that there is a row
that would be deleted in the other tables.

Because we put the humans table first in the
USING clause and the prize_winners table
second, we’re doing a RIGHT JOIN so that columns from the
table on the right ( prize_winners) will be deleted even if
there is no value in the table on the left. If we reversed the order of
the tables, we would then need a LEFT JOIN for this
task.

It’s worth focusing for a moment on the final clause of the
previous DELETE statement, a WHERE clause
checking for NULLs in one column. As we saw earlier, a LEFT
JOIN
or RIGHT JOIN can return rows where there was
nothing in the column you’re doing the join on. The results contain NULL
for the missing value. So in the WHERE clause here, we’re
using that as the condition for finding the orphaned rows in the
prize_winners table.

There are many contortions to the JOIN clause. The
basic JOIN syntaxes that we covered in Selecting a Basic Join are worth learning well; they
will be the ones you will use primarily. You will sometimes have a need
for using a LEFT JOIN or a RIGHT JOIN. Let’s
move on to a related topic that can be valuable in many situations:
subqueries.

Subqueries

A subquery is a query within another query, a SELECT statement
within another SQL statement. A subquery returns a single value, a row of
data, a single column from several rows, or several columns from several
rows. These are known respectively as scalar, column, row, and table
subqueries. I’ll refer to these distinctions later in this chapter.

Although the same results can be accomplished by using the
JOIN clause and sometimes the UNION, depending
on the situation, subqueries are a cleaner approach. They make a complex
query more modular, which makes it easier to create and to troubleshoot
problems. Here are two generic examples of subqueries (we also used a few
subqueries in Chapter 8):

UPDATE table_1
SET col_5 = 1
WHERE col_id =
  SELECT col_id
  FROM table_2
  WHERE col_1 = value;

SELECT column_a, column_1
FROM table_1
JOIN
  (SELECT column_1, column_2
   FROM table_2
   WHERE column_2 = value) AS derived_table
USING(col_id);

In the first example, the SELECT statement is an inner query. The UPDATE statement is
referred to as the main or outer query. In the second example, the
SELECT within parentheses is the inner query and the
SELECT outside of the parentheses is the outer query. An
outer query containing a subquery can be a SELECT,
INSERT, UPDATE, DELETE,
DO, or even a SET statement. There are some
limitations, though. An outer query cannot generally select data or modify
data from the same table of an inner query. This doesn’t apply
though if the subquery is part of a FROM clause.

These generic examples may be confusing. Generic examples aren’t
usually easy to follow. I’d rather present first the syntax for
subqueries, but there is no syntax per se for the use of subqueries—other
than the syntax inherent in the SQL statements used for the inner and
outer queries. Subqueries are rather a method of constructing combinations
of SQL statements. As such, you need only to make sure of two
basic factors with subqueries.

The first factor of which you need to be mindful is how a subquery
is contained within an outer query, where you position it. For instance,
if you construct an outer query which is an UPDATE statement,
you could place a subquery in the WHERE clause to provide a set of values to
which a column is equal (e.g., as in the first generic example). Or you
might locate a subquery in the FROM clause of an outer,
SELECT statement (e.g., as in the second generic example).
These are where subqueries may be positioned. You can have multiple
subqueries within an outer query, but they will be positioned generally
within the FROM clause or the WHERE
clause.

The second factor is whether the results returned from a subquery
are in keeping with the expectations of the outer query. For instance, in
the first generic example, the UPDATE clause has a
WHERE clause that expects a single value from the subquery.
If the subquery returns several values, a row of columns, or a table of
results, it will confuse MySQL and cause an error. So you need to be sure
that the subquery you construct will return the type of values required by
the outer query as you constructed it.

You’ll better understand these factors as we look at examples of
them. As mentioned at the start of this section, the different types of
subqueries are scalar, column, row, and table subqueries. In the following
subsections, we’ll look at each of these types, along with examples of
them.

Scalar Subqueries

The most basic subquery is one that returns a single value, a scalar value. This
type of subquery is particularly useful in a WHERE clause in conjunction with an
= operator, or in other instances where a single value from
an expression is permitted. Let’s look at simple example of this. Let’s
get a list of bird families that are members of the
Galliformes bird order (i.e., Grouse, Partridges,
Quails, and Turkeys). This can be done easily with a JOIN in which we join the
birds and bird_families tables together based
on the order_id for Galliformes. We’ll
use instead a scalar subquery to get the order_id we need.
Enter this in mysql:

SELECT scientific_name AS Family
FROM bird_families
WHERE order_id =
  (SELECT order_id
   FROM bird_orders
   WHERE scientific_name = 'Galliformes');
+----------------+
| Family         |
+----------------+
| Megapodiidae   |
| Cracidae       |
| Numididae      |
| Odontophoridae |
| Phasianidae    |
+----------------+

The inner query (i.e., the subquery here) returns one value, the
order_id. That’s used to complete the WHERE
clause of the outer query. That was pretty simple. Let’s look at another
example of a scalar subquery.

We had an example earlier in this chapter, in the section related
to using a JOIN, in which we selected members from Russia
who had sighted birds of the family Scolopacidae.
To thank members in Russia for using our telephone application for
recording sightings, we’re going to give a one-year premium membership
to one of those members. Enter this hefty SQL statement in mysql:

UPDATE humans
SET membership_type = 'premium',
membership_expiration = DATE_ADD(IFNULL(membership_expiration, 
  CURDATE()), INTERVAL 1 YEAR)
WHERE human_id =
  (SELECT human_id
   FROM
     (SELECT human_id, COUNT(*) AS sightings, join_date
      FROM birdwatchers.bird_sightings
      JOIN birdwatchers.humans USING(human_id)
      JOIN rookery.birds USING(bird_id)
      JOIN rookery.bird_families USING(family_id)
      WHERE country_id = 'ru'
      AND bird_families.scientific_name = 'Scolopacidae'
      GROUP BY human_id) AS derived_1
   WHERE sightings > 5
   ORDER BY join_date ASC
   LIMIT 1);

The most inner query here is basically the same as the one in the
example mentioned earlier. The difference is that here we’re not
selecting the names involved. Instead, we’re selecting the
human_id and the join_date (i.e., the date
that the member joined). With the GROUP BY clause, we’re grouping members
based on the human_id to get a count with the
COUNT() function. Put another way, we’re counting the
number of entries of each human_id in the
bird_sightings table for the bird family and member country
we specified. That subquery will return a table of results; it’s a table
subquery. We’ll talk more about that type of subquery later in this
chapter.

The query wrapped around the most inner query, which is also a
subquery, selects only rows where the number of sightings is more than
five. It orders the rows with newer members first based on the date the
members joined—we want the newest Russian member reporting several
Curlews and the like to be awarded a year of premium membership. This
subquery is limited to one row with one column. It’s a scalar
query.

The main query in the preceding example is using the single value
from the scalar query to determine which member to give one year of
premium membership. If we hadn’t added the LIMIT to the scalar query, it would have
returned more than one value—it then wouldn’t have been a scalar query.
Based on the operator in the WHERE clause of its outer
query, MySQL would have returned an error message like this:

ERROR 1242 (ER_SUBSELECT_NO_1_ROW)
SQLSTATE = 21000
Message = "Subquery returns more than 1 row"

As with all subqueries, there’s always a way to get the same
results without a subquery, using JOIN or some other method
to bring results together in complex ways. To some extent, it’s a matter
of style which method you decide to use. I generally prefer subqueries,
especially when using them in applications I develop in PHP or Perl.
They’re easier for me to decipher months or years later when I want to
make changes to a program I’ve written.

Column Subqueries

In the preceding subsection, we discussed instances in which one scalar
value was obtained in a WHERE clause. However,
there are times when you may want to match multiple values. For those
situations, you will need to use the subquery in conjunction with an
operator such as IN, which is used to specify a
comma-separated list of values. Let’s look at an example of this.

In one of the examples in the previous subsection, we used a
scalar subquery to get a list of bird families for the bird order
Galliformes. Suppose that we also want the common
name of one bird species from each family in the order; we want to
randomly select a bird name from each. To do this, we will create a
subquery that will select a list of bird family names for the order.
Enter the following SQL statement:

SELECT * FROM
  (SELECT common_name AS 'Bird',
   families.scientific_name AS 'Family'
   FROM birds
   JOIN bird_families AS families USING(family_id)
   JOIN bird_orders AS orders USING(order_id)
   WHERE common_name != ''
   AND families.scientific_name IN
     (SELECT DISTINCT families.scientific_name AS 'Family'
      FROM bird_families AS families
      JOIN bird_orders AS orders USING(order_id)
      WHERE orders.scientific_name = 'Galliformes'
      ORDER BY Family)
   ORDER BY RAND()) AS derived_1
GROUP BY (Family);

+------------------------+----------------+
| Bird                   | Family         |
+------------------------+----------------+
| White-crested Guan     | Cracidae       |
| Forsten's Scrubfowl    | Megapodiidae   |
| Helmeted Guineafowl    | Numididae      |
| Mountain Quail         | Odontophoridae |
| Gray-striped Francolin | Phasianidae    |
+------------------------+----------------+

In this example, we have two subqueries, a subquery within a
subquery, within an outer query. The most inner subquery is known as a
nested subquery. The subqueries here are executed before the outer
query, so the results will be available before the WHERE
clause of the outer query is executed. In that vein, the nested subquery
will be executed before the subquery in which it is contained. In this
example, the nested query is contained within the parentheses of the
IN operator—the most indented query. That SQL statement
selects the bird family name where the name of the order is
Galliformes. The DISTINCT flag
by the alias Family instructs MySQL to return
only one entry for each distinct family name. If we had manually entered
that information, it would look like this:
(‘Cracidae’,’Megapodiidae’,’Numididae’,’Odontophoridae’,’Phasianidae’).
This subquery is a multiple-field or column subquery.

The inner subquery in the preceding example is a table subquery.
It selects a list of all birds that are in the list of bird families
provided by its subquery. We could just select one bird for each family
at this level using a GROUP BY clause to group by the Family name to get one bird
species per family. But that would select the first rows found and the
results would be the same every time. We want to select randomly each
time this SQL statement is executed. To do that, we’re selecting all of
the birds for each bird family and then using ORDER BY RAND() to randomly order the
rows of the results table. Then we’re wrapping that in another query,
the outer query to GROUP BY the bird family. That will give
us one entry for each bird family.

Row Subqueries

Row subqueries retrieve a single row of data that is then used by the
outer query. It’s used in a WHERE clause to compare one row of columns
to one row of columns selected in the subquery. Let’s consider an
example of this and then we’ll discuss it more. Suppose another
bird-watcher site closes, this one in Eastern Europe. They send us their
database, which contains a table with the names of their members, and
another table with information members provided related to birds they
spotted. We put both of these tables in the birdwatchers
database to import into our tables. In the process of importing these
members into our humans table, we discover people who are
already members of our site. That’s OK: we know how to avoid importing
the duplicates. Now we want to import the table of birds spottings.
Because there were duplicate members, maybe those members have logged
information on birds they saw in the wild on this Eastern European site.
So we want to check that each entry is not a duplicate and then import
it. Look at this SQL statement:

INSERT INTO bird_sightings
(bird_id, human_id, time_seen, location_gps)
VALUES
  (SELECT birds.bird_id, humans.human_id,
   date_spotted, gps_coordinates
   FROM
     (SELECT personal_name, family_name, science_name, date_spotted,
      CONCAT(latitude, '; ', longitude) AS gps_coordinates
      FROM eastern_birders
      JOIN eastern_birders_spottings USING(birder_id)
      WHERE
          (personal_name, family_name,
           science_name, CONCAT(latitude, '; ', longitude) )
        NOT IN
          (SELECT name_first, name_last, scientific_name, location_gps
           FROM humans
           JOIN bird_sightings USING(human_id)
           JOIN rookery.birds USING(bird_id) ) ) AS derived_1
   JOIN humans
   ON(personal_name = name_first
      AND family_name = name_last)
   JOIN rookery.birds
   ON(scientific_name = science_name) );

This looks very complicated and can be difficult to understand or
construct correctly. Let’s discern the major elements here. Look first
at the subquery in parentheses, the nested subquery. We’re selecting
data from tables in our database: the names of each person, the bird
species and where the member sighted it. This nested subquery is
contained within the WHERE clause of another subquery, a
row subquery. Notice that a list of columns from the tables of the row
subquery is given in parentheses. So the condition of the
WHERE clause is that the values of those columns for each
row of the joined tables are compared to the values of the columns for
each row from joined tables in its subquery. The outer query inserts the
relevant values into the bird_sightings table.

The preceding example is certainly an odd one and seemingly,
unnecessarily complex. But there are times when a row query like this
can be useful. To put our example more simply, if there’s a row with the
same human name who spotted the same bird species at the exact same map
coordinates, don’t import it. If all of those values are not the same,
then insert it into the bird_sightings table. There are
other ways, though, you can accomplish this task. For instance, you
might do this in stages with multiple SQL statements and a temporary
table. You could also do it in stages within a program using one of the
languages like Perl and an API like the Perl DBI. But it’s good to know
you have the option of doing it within one SQL statement if that’s what
you want.

Table Subqueries

A subquery can be used to generate a results set, a table from which an
outer query can select data. That is to say, a subquery can be
used in a FROM clause as if it were another
table in a database. It is said to be a derived table.

There are a few rules related to table subqueries. Each derived
table must be assigned an alias—any unique name is fine. You can use the
keyword AS for assigning an alias. Each column in a
subquery that is in part of a FROM clause must have a
unique name. For instance, if you select the same column twice in a
subquery, you have to assign at least one of them an alias that is
unique. A subquery contained in a FROM clause cannot
generally be a correlated subquery; it cannot reference the same table
as the outer query.

For an example of a table subquery, let’s use the example near the
beginning of this chapter that used a UNION. In that example, we had two SELECT statements which counted the
number of rows for birds in two bird families:
Pelecanidae and Ardeidae. With
a UNION, the results were merged into one results set. That
was a bulky method. We can do better with a table subquery. The subquery
we’ll use will select just the bird family name for each bird of the two
families that we wanted to count. That may seem silly, to list the bird
family name multiple times, especially when we already know the name of
the bird families we want to count. But that’s how we can count them and
use the name for our results set. MySQL won’t display the names multiple
times—that will go on behind the scenes. It will display only one entry
per family because of the GROUP BY clause. Enter the
following:

SELECT family AS 'Bird Family',
COUNT(*) AS 'Number of Birds'
FROM
  (SELECT families.scientific_name AS family
   FROM birds
   JOIN bird_families AS families USING(family_id)
   WHERE families.scientific_name IN('Pelecanidae','Ardeidae')) AS derived_1
GROUP BY family;

+-------------+-----------------+
| Bird Family | Number of Birds |
+-------------+-----------------+
| Ardeidae    |             157 |
| Pelecanidae |              10 |
+-------------+-----------------+

This a much better way to form this unified results set than using
a UNION. We could add more bird family names to the WHERE clause in the subquery to get more
rows in the results set, instead of having to copy the
SELECT statement for each family we add.

You can see in this example that a table subquery is the same as a
table in the FROM clause. We can even give it an alias
(e.g., derived_1) as we can with a normal table. The
subquery returns a table of results (i.e., the bird family names). The
GROUP BY clause tells MySQL to group the results based on
the family field, the alias in the subquery for the
scientific_name column of the bird_families
table. We used that same alias to select that field in the column list
of the outer query. When a column in a subquery is set to an alias, you
have to use the alias; the column name becomes inaccessible outside the
subquery when an alias is given.

Performance Considerations with Subqueries

Performance problems can occur with subqueries if they are not well
constructed. There can be a performance drain when a subquery is placed
within an IN() operator as part of a
WHERE clause of the outer query. It’s generally better to
use instead the = operator, along with AND for
each column= value
pair. For situations in which you suspect poor performance with a
subquery, try reconstructing the SQL statement with JOIN
and compare the differences between the two SQL statements using
the BENCHMARK() function. For ideas on
improving subquery performance, Oracle has tips on their site for Optimizing
Subqueries.

Summary

Many developers prefer subqueries—I do. They’re easier to construct
and decipher when you have problems later. If you work on a database that
is very large and has a huge amount of activity, subqueries may not be a
good choice because they can sometimes affect performance. For small
databases, though, they’re fine. You should learn to use subqueries and
learn how to work without them (i.e, use JOIN) so you can
handle any situation presented to you. You cannot be sure which method
your next employer and team of developers may being using. It’s best to be
versatile.

As for learning to use JOIN, that’s hardly optional.
Very few developers don’t use JOIN. Even if you prefer
subqueries, they still call for JOIN. You can see this in
almost all of the examples of subqueries in this chapter. You may rarely
use UNION. But there’s not much to learn there. However, you
should be proficient in using JOIN. So don’t avoid them;
practice manually entering SQL statements that use them. The act of typing
them helps.

Exercises

The goal of the following exercises is to give you practice
assembling tables using JOIN and creating subqueries. In the
process of doing these exercises, think about how tables and data come
together. Try to envision each table as a separate piece of paper with a
list of data on it, and how you might place them on a desk to find
information on them in relation to each other. In such a scenario, you
might tend to place your left index finger at one point on a page on the
left and your right index finger on a point on another page on your right.
That’s a join. Where you point on each are the join points. As you type
the SQL statements in these exercises, think of this scene and say aloud
what you’re doing, what you’re telling MySQL to do. It helps to better
understand the joining of tables and creating of subqueries.

  1. In the birdwatchers database, there is a table
    called bird_sightings in which there are records of birds
    that members have seen in the wild. Suppose we have a contest in which
    we will award a prize based on the most sightings of birds from the
    order Galliformes. A member gets one point for
    each sighting of birds in this order.

    Construct an SQL statement to count the number of entries from
    each member. There should be two fields in the results set: one
    containing the human_id with Birder as the
    alias; and the second field containing the number of entries with
    Entries as its alias. To accomplish this, join the
    bird_sightings table to birds,
    bird_families, and bird_orders. Remember
    that these tables are in a different database. You will have to use
    the COUNT() function and a GROUP BY clause.
    Do all of this with JOIN and not with subqueries. Your
    results should look like the following:

    +--------+---------+
    | Birder | Entries |
    +--------+---------+
    |     19 |       1 |
    |     28 |       5 |
    +--------+---------+

    When you have successfully constructed this SQL statement,
    modify it to join in the humans table. In the column
    list, replace the field for human_id with the first and
    last name of the member. Use the CONCAT() function to
    put them together into a single field (with a space in between the
    names), with the same alias. Once you make the needed changes and
    execute it, the results should look like this, but the number of names
    and points may be different:

    +--------------+--------+
    | Birder       | Points |
    +--------------+--------+
    | Elena Bokova |      4 |
    | Marie Dyer   |      8 |
    +--------------+--------+
  2. In the preceding exercises, you were asked to count the number
    of bird species the members sighted from the
    Galliformes. So that the contest is more fun,
    instead of giving one point for each bird species in that order, give
    a point for only one bird species per bird family in the bird order.
    That means that a member doesn’t get more points for sighting the same
    bird species multiple times. A member also doesn’t get more points for
    spotting several birds in the same family. Instead, the member has to
    look through bird guides to find a species for each species and then
    go looking for one from each in their area. This should make the
    contest more of an adventure for the members.

    To allow for the change to the contest, you will need to modify
    the SQL statement you constructed at the end of the previous exercise.
    First, you will need to add a DISTINCT to the start of
    the column list in the outer query. You’ll need to remove the
    CONCAT() and GROUP BY. When you’ve done
    that, execute the SQL statement to make sure you have no errors. You
    should get a results set that shows multiple entries for some members.
    Next, place the whole SQL statement inside another SQL statement to
    make it a subquery. The new, outer query should include
    CONCAT() and GROUP BY so that it can count
    the single entries from each family for each member. It should return
    results like this:

    +--------------+--------+
    | Birder       | Points |
    +--------------+--------+
    | Elena Bokova |      1 |
    | Marie Dyer   |      5 |
    +--------------+--------+
  3. There are five families in the Galliformes
    bird order. For the contest described in the last two exercises, the
    most points that a member could achieve therefore is 5. Change the SQL
    statement you entered at the end of the previous exercise to list only
    members who have 5 points. To do this, you will need to wrap the
    previous SQL statement inside another, creating a nested query. When
    you execute the full SQL statement, the results should look like
    this:

    +------------+--------+
    | Birder     | Points |
    +------------+--------+
    | Marie Dyer |      5 |
    +------------+--------+

Comments are closed.

loading...