MySQL – MariaDB – String Functions

How to install Chocolatey on Windows 10

A string is a value that can contain alphabetical characters, digits, and other characters
(e.g., the ampersand, the dollar sign). Although a string can contain
numbers, they are not considered numeric values. It’s a matter of context
and perspective. For instance, postal codes in the United States are all
digits, but you shouldn’t store them as integers because the postal code for
02138 would become 2138. You should use a string to store the postal
code.

To make the handling of strings easier, MySQL provides many built-in
functions. You can format text for nicer results, make better expressions in
a WHERE clause, or otherwise extract and manipulate data from a
string or column. Therefore, in this chapter, we’ll go through several
string functions, grouping them by similar features, and provide examples of
how they might be used.

Basic Rules for Using Functions

There are a few things to remember when using functions. String functions
also have some conventions of their own. Some of these rules can be
different depending on how your server is configured:

  • The basic syntax of a function is to a keyword immediately
    followed by arguments in parentheses. You cannot generally have a
    space between the keyword and the opening parenthesis like you can
    with operators in SQL statements (e.g., IN () within a
    WHERE clause).

  • Some functions take no arguments, such as NOW(), which returns
    the current date or time. Other functions accept a particular number
    of arguments. Arguments are generally separated by commas, and some
    arguments can be augmented with keywords.

  • When you pass text as an argument to a string function, put the
    text in single or double quotes.

  • When giving a column as an argument, you generally don’t use
    single quotes around the column name—if you do, MySQL will think you
    mean the literal text given. You can use backticks around the column
    name if the name is a reserved word or contains a character that might
    cause other problems.

  • If by chance a string function tries to return a value that is
    larger (i.e., more characters) than allowed by the system settings
    (set by the max_allowed_packet configuration
    option), MySQL will return NULL instead.

  • Some arguments to string functions represent positions within
    the strings. The first character in a string is numbered 1, not 0.
    Some functions let you count back from the end of the string, using
    negative integers. In these arguments, -1 refers to the last
    character.

  • Some string functions call for a character length as an
    argument. If you give a fractional value to these functions, MySQL
    will round that value to the nearest integer.

Formatting Strings

Several string functions can format or reconstitute text for a better display. They allow
you to store data in columns in a raw form or in separate components and
then create the display you want when you retrieve the data.

For instance, in the humans table, we are able to store
each member’s title, first name, and last name in separate columns because
we can put them together when needed. Breaking apart the names allows us
to sort easily based on last name or first name. You’ll see how this is
done in the next subsection.

Concatenating Strings

The CONCAT() function is very useful for pasting together the contents of different
columns, or adding some other text to the results retrieved from a
column. This is probably the most used string function—we’ve already
used it in several examples in previous chapters. Within the parentheses
of the function, in a comma-separated list, you give the strings,
columns, and other elements that you want to merge together into one
string.

Let’s look at an example of how it might be used within a SELECT statement. Suppose we want to get a
list of a few members and birds that they’ve seen. We could enter an SQL
statement like this:

SELECT CONCAT(formal_title, '. ', name_first, SPACE(1), name_last) AS Birder,
CONCAT(common_name, ' - ', birds.scientific_name) AS Bird,
time_seen AS 'When Spotted'
FROM birdwatchers.bird_sightings
JOIN birdwatchers.humans USING(human_id)
JOIN rookery.birds USING(bird_id)
GROUP BY human_id DESC
LIMIT 4;

+----------------------+----------------------------------+--------------------+
| Birder               | Bird                             | When Spotted       |
+----------------------+----------------------------------+--------------------+
| Ms. Marie Dyer       | Red-billed Curassow - Crax blu...| 2013-10-02 07:39:44|
| Ms. Anahit Vanetsyan | Bar-tailed Godwit - Limosa lap...| 2013-10-01 05:40:00|
| Ms. Katerina Smirnova| Eurasian Curlew - Numenius arq...| 2013-10-01 07:06:46|
| Ms. Elena Bokova     | Eskimo Curlew - Numenius borea...| 2013-10-01 05:09:27|
+----------------------+----------------------------------+--------------------+

The first field displayed by this SQL statement is not a single
column from the table, but a CONCAT() function that merges
the bird-watcher’s title, first name, and last name. We added a period
in quotes after the title, as we’ve decided to store the titles without
a period. We used quote marks to add spaces where needed. For the second
field, we concatenated the common name of each bird species with the
scientific name, and put spaces and a hyphen between them.

Without CONCAT(), we might be tempted to combine
text in one column that really should be separated. For instance, we
might put the common and scientific names of bird species in one column.
Keeping values in separate columns makes a database more efficient and
flexible. String functions like CONCAT() alleviate the
need to do otherwise.

A less common concatenating function is CONCAT_WS(). It puts together columns
with a separator between each. The first argument is the element you
want to use as a separator (e.g., a space) and the rest of the arguments
are the values to be separated. This can be useful when making data
available for other programs.

For instance, suppose we have embroidered patches made with the
name of the Rookery site on them and we want to mail one to each premium
member. To do this, we use an advertising and marketing agency that will
handle the mailing. The agency needs the names and addresses of members,
and would like that data in a text file, with the values of each field
separated by vertical bars. To do this, we’ll run mysql on the command line, passing a single
statement to it:

mysql -p --skip-column-names -e \
"SELECT CONCAT_WS('|', formal_title, name_first, name_last,
street_address, city, state_province, postal_code, country_id)
FROM birdwatchers.humans WHERE membership_type = 'premium'
AND membership_expiration > CURDATE();" > rookery_patch_mailinglist.txt

This example uses mysql with
several options. The --skip-column-names option tells MySQL
not to display the column headings—we want just the data separated by
bars. The -e option says that what follows within quotes is
to be executed. We then put the SQL statement within double quotes. The
first argument to CONCAT_WS() is the vertical bar that the
company wants as a separator. The remaining arguments are the columns to
be strung together. After the closing double quotes, we use
> to redirect the results to a text file that we’ll
email to the agency. There is a potential problem with the SQL statement
we used. If a column has a NULL value, nothing will be exported and no
bar will be put in the file to indicate an empty field. Here’s an
example of how the text file would look:

Ms|Rusty|Osborne|ch
Ms|Elena|Bokova|ru

We have only four fields for these members, although we told MySQL
to export eight fields. If these two records were in the midst of
thousands of records, they would cause errors that might not be obvious
when imported. Although it’s more cumbersome, we should wrap each column
name in an IFNULL() function. Then we can give a value to
display if the column is NULL, such as the word
unknown or a blank space. Here’s the same example
again, but with the IFNULL() function:

mysql -p --skip-column-names -e \
"SELECT CONCAT_WS('|', IFNULL(formal_title, ' '), IFNULL(name_first, ' '),
IFNULL(name_last, ' '), IFNULL(street_address, ' '),
IFNULL(city, ' '), IFNULL(state_province, ' '),
IFNULL(postal_code, ' '), IFNULL(country_id, ' '))
FROM birdwatchers.humans WHERE membership_type = 'premium'
AND membership_expiration > CURDATE();" > rookery_patch_mailinglist.txt

It looks daunting and excessive, but it’s simple to MySQL. The new
contents of the text file follow:

Ms|Rusty|Osborne| | | | |ch
Ms|Elena|Bokova| | | | |ru

That’s a manageable data file. When the results are like this, the
marketing company can import all of the records without errors and then
contact us to try to get the missing information. They can add it to
their system without having to reimport the text file.

Setting Case and Quotes

Occasionally, you might want to convert the text from a column to either all lowercase letters or
all uppercase letters. For these situations, there are LOWER() and UPPER(), which
can also be spelled LCASE() and UCASE(),
respectively. In the example that follows, the output of the first
column is converted to lowercase and the second to uppercase:

SELECT LCASE(common_name) AS Species,
UCASE(bird_families.scientific_name) AS Family
FROM birds
JOIN bird_families USING(family_id)
WHERE common_name LIKE '%Wren%'
ORDER BY Species
LIMIT 5;

+-------------------------+---------------+
| Species                 | Family        |
+-------------------------+---------------+
| apolinar's wren         | TROGLODYTIDAE |
| band-backed wren        | TROGLODYTIDAE |
| banded wren             | TROGLODYTIDAE |
| bar-winged wood-wren    | TROGLODYTIDAE |
| bar-winged wren-babbler | TIMALIIDAE    |
+-------------------------+---------------+

The QUOTE() function takes a string and returns it enclosed in single quotes. But it
does a good deal more: it makes it input-safe by marking certain
characters that could cause trouble in SQL statements or other
programming languages. These characters are single quotes, backslashes,
null (zero) bytes, and Ctrl-Z characters. The QUOTE()
function precedes each of these with a backslash so that they won’t be
interpreted in some way or (in the case of a single quote) cause SQL to
prematurely terminate the string.

In the following example, we’re selecting a list of bird species
named for a Prince or
Princess:

SELECT QUOTE(common_name)
FROM birds
WHERE common_name LIKE "%Prince%"
ORDER BY common_name;

+----------------------------------+
| QUOTE(common_name)               |
+----------------------------------+
| 'Prince Henry\'s Laughingthrush' |
| 'Prince Ruspoli\'s Turaco'       |
| 'Princess Parrot'                |
+----------------------------------+

Notice in the results that because of the QUOTE()
function, the strings returned are enclosed in single quotes, and any
single quotes within the strings are escaped with a backslash. This can
prevent errors if the value is passed to another program.

Trimming and Padding Strings

One of the problems with allowing the public to enter data into a website is
that they’re not always careful. They do things like adding spaces
before and after the text. There are a few functions for trimming any
leading or trailing spaces from the values of a column.
The LTRIM() function eliminates any leading
spaces to the left. For columns with spaces on the right, RTRIM() will remove them. A more
versatile trimming function, though, is TRIM(). With it,
you can trim both left and right spaces.

These trim functions can be useful for cleaning data with the
UPDATE statement. Let’s look at an example of their use. In
these SQL statements, we’ll use LTRIM() and
RTRIM() to eliminate both leading and trailing
spaces:

UPDATE humans
SET name_first = LTRIM(name_first),
name_last = LTRIM(name_last);

UPDATE humans
SET name_first = RTRIM(name_first),
name_last = RTRIM(name_last);

In this example, we trimmed the leading spaces with the first UPDATE and the trailing spaces with the
second one. Notice that we set the value of the columns to the same
values, but with the strings trimmed. We can combine these functions
into one SQL statement like so:

UPDATE humans
SET name_first = LTRIM( RTRIM(name_last) ),
name_last = LTRIM( RTRIM(name_last) );

You can always combine functions like this for a more dynamic
result. In this case, though, the TRIM() function is
a better alternative. Here’s the same SQL statement using
it:

UPDATE humans
SET name_first = TRIM(name_first),
name_last = TRIM(name_last);

The TRIM() function also offers more options. You
can specify something other than spaces to remove. For instance, suppose
we receive a small table with bird sightings from another bird-watcher
club, as we did in Row Subqueries.
However, in this table, the scientific names of bird species are within
double quotes. If we wanted to insert that data into our
bird_sightings table, we could use the same SQL query as we
did before, with the addition of the TRIM() function. Here
is the relevant excerpt, the last lines on which we join their table to
our birds table:

…
JOIN rookery.birds
ON(scientific_name = TRIM(BOTH '"' FROM science_name) ) );

It may be difficult to see, but we’re enclosing the character that
we want trimmed—a double quote—within single quotes. The keyword
BOTH isn’t actually necessary because it’s the
default—that’s why we didn’t specify it in the previous example. If you
don’t want to remove the string given from one end or the other, you can
specify LEADING or TRAILING, thus making
TRIM() work like LTRIM() or
RTRIM(). The default string to trim is a space, as we have
seen.

When displaying data in web forms and other such settings, it’s sometimes
useful to pad the data displayed with dots or some other filler. This
can be necessary when dealing with VARCHAR columns where
the width varies. Padding the results of a column selected can help the
user to see the column limits. There are two functions that may be used
for padding: LPAD() and RPAD().
There is also SPACE(), which pads the string with spaces:

SELECT CONCAT(RPAD(common_name, 20, '.' ),
RPAD(Families.scientific_name, 15, '.'),
Orders.scientific_name) AS Birds
FROM birds
JOIN bird_families AS Families USING(family_id)
JOIN bird_orders AS Orders
WHERE common_name != ''
AND Orders.scientific_name = 'Ciconiiformes'
ORDER BY common_name LIMIT 3;

+--------------------------------------------------+
| Birds                                            |
+--------------------------------------------------+
| Abbott's Babbler....Pellorneidae...Ciconiiformes |
| Abbott's Booby......Sulidae........Ciconiiformes |
| Abbott's Starling...Sturnidae......Ciconiiformes |
+--------------------------------------------------+

Notice how all the bird families and orders are aligned
vertically. This is because we padded each value out to its maximum
width using RPAD(). The first argument was the column to
read, the second was the total size of the resulting string we want, and
the third was a period so that periods apear for columns that have less
text. This happens to work because MySQL uses a fixed-width font. We
could uses spaces instead of dots for a similar effect. For web display,
we might use   as padding element for non-breaking spaces.

Extracting Text

There are a few functions for extracting a piece of text from a string. You
indicate the point from which to start selecting text and how much text
you want. There are four such functions: LEFT(),
MID(), RIGHT(), and SUBSTRING().
The SUBSTRING_INDEX() function is also related. We’ll look
at each one here.

Let’s look at the LEFT(), MID(), and
RIGHT() functions first. Suppose our marketing agency
acquires a table called prospects containing a list of people
who are known to be bird-watchers. Each person’s title and first and last
name is stored in a column called prospect_name, with email
addresses in another column. The prospect_name column is a
fixed character length data type, CHAR(54). The marketing
agency tells us that the title is contained in the first four characters,
the first name in the next 25, and the last name in the remaining 25. For
the titles, they’re using only Mr. and
Ms. with a space after each—hence the first four
characters—but we will extract just the first two characters for our
tables. Let’s see how that column looks by executing a simple
SELECT to retrieve four names:

SELECT prospect_name
FROM prospects LIMIT 4;

+--------------------------------------------------------+
| prospect_name                                          |
+--------------------------------------------------------+
| Ms. Caryn-Amy                Rose                      |
| Mr. Colin                    Charles                   |
| Mr. Kenneth                  Dyer                      |
| Ms. Sveta                    Smirnova                  |
+--------------------------------------------------------+

As you can see, the data is a fixed width for each element.
Normally, with a CHAR column, MySQL would not store the
trailing spaces. Whoever created this table enforced the rigid format (4,
25, and 25 characters) by executing SET sql_mode =
'PAD_CHAR_TO_FULL_LENGTH';
before inserting data into the
column.

With an INSERT INTO…SELECT statement and a few
functions, we can extract and separate the data we need and put these
prospects in a new table we created that we call
membership_prospects. Let’s execute the SELECT
first to test our organization of the functions before we insert the
data:

SELECT LEFT(prospect_name, 2) AS title,
MID(prospect_name, 5, 25) AS first_name,
RIGHT(prospect_name, 25) AS last_name
FROM prospects LIMIT 4;

+-------+---------------------------+---------------------------+
| title | first_name                | last_name                 |
+-------+---------------------------+---------------------------+
| Ms    | Caryn-Amy                 | Rose                      |
| Mr    | Kenneth                   | Dyer                      |
| Mr    | Colin                     | Charles                   |
| Ms    | Sveta                     | Smirnova                  |
+-------+---------------------------+---------------------------+

In the example’s LEFT() function, the starting point
for extracting data is the first character. The number we gave as an
argument (i.e., 2), is the number of characters we want to extract
starting from the first. The RIGHT() function is similar,
but it starts from the last character on the right, counting left. The
MID() function is a little different. With it, you can
specify the starting point (i.e., the fifth character in our example) and
how many characters you want (i.e., 25 characters).

The SUBSTRING() function is synonymous with MID() and their syntax is
the same. By default, if the number of characters to capture isn’t
specified, it’s assumed that all the remaining ones are to be extracted.
This makes these functions work like the LEFT() function. If
the second argument to SUBSTRING() or MID() is
a negative number, the function will start from the end of the string,
making it like the RIGHT() function.

Because the SUBSTRING() function is so versatile, we
can use it to accomplish all the text extraction in the previous example.
The equivalent SELECT would look like this:

SELECT SUBSTRING(prospect_name, 1, 2) AS title,
SUBSTRING(prospect_name FROM 5 FOR 25) AS first_name,
SUBSTRING(prospect_name, -25) AS last_name
FROM prospects LIMIT 3;

This example shows three ways to use
SUBSTRING():

SUBSTRING(prospect_name, 1, 2) AS
title

This has the same syntax we have used for other functions in
this section: three arguments to specify the column with the text,
the starting point for extracting text, and the number of characters
to extract.

SUBSTRING(prospect_name FROM 5 FOR 25)
AS first_name

This shows a different, wordier syntax. The starting point
here is 5 and the number of characters to extract is 25.

SUBSTRING(prospect_name, -25) AS
last_name

This specifies a starting point of −25 characters. Because it
doesn’t specify how many to extract, MySQL takes the remaining
characters from that starting point.

You can use whatever style you prefer.

The SUBSTRING_INDEX() is similar to the previous functions, but looks for elements that
separate data within a string. For example, suppose the
prospect_name column was constructed differently. Suppose
that instead of having fixed width for the title and names, the text had
vertical bars between them. This would be odd for data in a column, but it
is possible. Here’s how we could separate the same column containing the
vertical bar character as the separator (the first and third third lines
using SUBSTRING_INDEX() are fairly understandable, but the
second one is more complex):

SELECT SUBSTRING_INDEX(prospect_name, '|', 1) AS title,
SUBSTRING_INDEX( SUBSTRING_INDEX(prospect_name, '|', 2), '|', -1) AS first_name,
SUBSTRING_INDEX(prospect_name, '|', -1) AS last_name
FROM prospects WHERE prospect_id = 7;

The second argument to SUBSTRING_INDEX() tells MySQL
how to break the string into the pieces of text we want. In our example,
we use '|' to specify the vertical bar. The number in the
third argument tells how many elements to take. So in the first line here
we’re saying to get the first element. In the third line, because it has a
negative sign in front of the number, we’re saying to count from the end
and get one element there. In the second line, we’re using
SUBSTRING_INDEX() twice, one call embedded inside the other.
The inner call extracts the first two elements. Using those results, we
then use an outer call to extract its first element starting from the
end.

Using SUBSTRING() is much nicer, but you need to know
the starting point and how many characters to take. In our vertical bar
example, we’d need to know exactly where the vertical bars are in each
name. To do that, you will need to use other functions to search strings.
Those are covered in the next section.

Searching Strings and Using Lengths

MySQL and MariaDB do not have comprehensive functions for searching string based
on patterns. Yes, there’s the REGEXP operator that permits some pattern matching. But this isn’t as robust
and isn’t fine tuned as easily as the capabilities offered by programming
languages like PHP and Perl. But there are a few functions that assist in
searching strings. We’ll look at some of them in this section.

Locating Text Within a String

MySQL and MariaDB have a few built-in functions that can find characters within a
string. These functions return the location where the search parameter
was found.

The LOCATE() function returns the numeric starting point just left of the first
occurrence of a given substring in a given string. It does not search
beyond this point. Let’s look at an example. Suppose we want a list of
Avocet birds—they’re a type of shore birds that is part of the
Recurvirostridae family. We could enter something
like this:

SELECT common_name AS 'Avocet'
FROM birds
JOIN bird_families USING(family_id)
WHERE bird_families.scientific_name = 'Recurvirostridae'
AND birds.common_name LIKE '%Avocet%';

+-------------------+
| Avocet            |
+-------------------+
| Pied Avocet       |
| Red-necked Avocet |
| Andean Avocet     |
| American Avocet   |
+-------------------+

Now suppose we want to eliminate the word
Avocet from the names returned. There are a few
ways we might do that: one way is to use the LOCATE()
function to find the word Avocet, and extract all
text before it with the SUBSTRING() function:

SELECT
SUBSTRING(common_name, 1, LOCATE(' Avocet', common_name) ) AS 'Avocet'
FROM birds
JOIN bird_families USING(family_id)
WHERE bird_families.scientific_name = 'Recurvirostridae'
AND birds.common_name LIKE '%Avocet%';

+-------------+
| Avocet      |
+-------------+
| Pied        |
| Red-necked  |
| Andean      |
| American    |
+-------------+

That’s a cumbersome example, but it shows you how you can use
LOCATE() in conjunction with other functions to get what
you want from a string. Let’s look at another example.

Earlier in this chapter, in Trimming and Padding Strings, we had some examples
involving merging data from another bird-watcher group. That included
using the TRIM() function to remove quotes from
around the scientific names of birds spotted by people in that group.
Let’s use that column again, but assume that it doesn’t have quotes.
Instead, the bird species is given with its bird family in this format:
bird species - bird family. For this, we can
use the LOCATE() function to locate the hyphen and then
the SUBSTRING() to get the family name for the
JOIN clause in that earlier example. Here’s just the excerpt from the
JOIN clause:

…
JOIN rookery.birds
ON(scientific_name = SUBSTRING(science_name, LOCATE(' - ', science_name) + 3 ) );

Let’s pull this apart to understand it better. First, let’s focus
on the inner function, the LOCATE(). The search parameter
it’s given is a hyphen surrounded by spaces. The
science_name column is the string to search. This function
will return the position in the string where the search parameter is
found. We’re adding 3 to that because the search parameter is three
characters long—in other words, LOCATE() gives us the
point before the separator and we want to get the
substring after the end of the separator. So the
results of LOCATE() + 3 is given as the starting point for
the SUBSTRING() function. Because we’re not specifying how
many characters we want, MySQL will extract the remaining characters.
That will give us the scientific name of the bird in the table we’re
joining to birds.

The POSITION() function works like LOCATE(), except that it takes
the keyword IN instead of a comma between the substring
you’re searching for and the containing string:

POSITION(' - ' IN science_name)

In addition, LOCATE() accepts an optional argument
to indicate the starting point for the search, which is not available in POSITION().

Another function for searching a string is FIND_IN_SET(). If you have a string that
contains several pieces of data separated by commas, this function tells
you which element in that set of data contains the search pattern you
give it. To understand this better, suppose that we want to get a list
of members from Russia, but ordered by the date when the members joined.
We would enter this:

SELECT human_id,
CONCAT(name_first, SPACE(1), name_last) AS Name,
join_date
FROM humans
WHERE country_id = 'ru'
ORDER BY join_date;

+----------+-------------------+------------+
| human_id | Name              | join_date  |
+----------+-------------------+------------+
|       19 | Elena Bokova      | 2011-05-21 |
|       27 | Anahit Vanetsyan  | 2011-10-01 |
|       26 | Katerina Smirnova | 2012-02-01 |
+----------+-------------------+------------+

Now suppose that we want to know the position of the member
Anahit Vanetsyan in the list of Russian members. We
can see easily from the results just shown that she is the third member
from Russia to join. That’s because there are very few results here.
Imagine if the results contained hundreds of names. We could use
FIND_IN_SET() with a subquery to determine this:

SELECT FIND_IN_SET('Anahit Vanetsyan', Names) AS Position
  FROM
    (SELECT GROUP_CONCAT(Name ORDER BY join_date) AS Names
     FROM
       ( SELECT CONCAT(name_first, SPACE(1), name_last) AS Name,
         join_date
         FROM humans
         WHERE country_id = 'ru')
       AS derived_1 )
    AS derived_2;

+----------+
| Position |
+----------+
|        2 |
+----------+

This is a pretty complex SQL statement. The innermost
SELECT is essentially the query we saw earlier, but
returning just the full name and join date for each Russian person.
These results are fed to GROUP_CONCAT, which produces a single huge string containing all the names.
The outermost SELECT finds the name we want and returns its
position.

Note

When you put a SELECT statement inside parentheses
and derive a table from it that you will use with an outer statement,
you must give that derived table a name using AS. For
naming simplicity, we’ve named the derived tables in this chapter
derived_1 and derived_2. Almost any unique
name is fine.

The statement can be useful if we associate it with a user profile
page on the Rookery website. We might want to use it to show members
where they rank in different lists, such as most sightings of birds or
most sightings of birds in a particular category.

FIND_IN_SET() returns 0 if the string is not found
in the set or if the string list is empty. It returns NULL if the value
of either argument is NULL.

String Lengths

There will be times you want to know how long a string is. There are a
few functions that return the character length of a string. This can be
useful when adjusting formatting or making other decisions related to a
string, and they are commonly used with functions like
LOCATE() and SUBSTRING().

The CHAR_LENGTH() or
CHARACTER_LENGTH() function returns the number of characters in a string. This could
be helpful when different rows have different-length strings in a
particular column.

For instance, suppose we want to display on the Rookery website a
list of the birds most recently sighted by members, as recorded in the
bird_sightings table. We’ll include the common and
scientific name and other information about the bird species. Suppose
that we want to also include the comments that the member
entered when they recorded the sighting. Because this column can contain
a lot of text, we want to know how many characters it contains when
displaying it. If there’s too much (i.e., more than 100 characters),
we’ll limit the text and include a link on the web page to view all of
the text. To check the length, we could construct an SQL statement like
this that would be part of a program:

SELECT IF(CHAR_LENGTH(comments) > 100), 'long', 'short')
FROM bird_sightings
WHERE sighting_id = 2;

Here we’re using CHAR_LENGTH() to count the number
of characters in the comments column for the row selected.
We’re using the IF() function to determine whether the
character length of the comments is greater than 100 characters. If it
is, the function will return the word long. If not, it will
return short. If this SQL statement was used in an API
script, the value in the WHERE clause for the
sighting_id could be dynamically replaced for each bird
sighting.

CHAR_LENGTH() understands the character set in
current use, as we touched on in Creating a Database. Characters that take up
multiple bytes—usually present in Asian languages—are still considered
one character. In contrast, the LENGTH() function returns the number of
bytes in a given string. Note that there are eight bits to a byte and
that Western languages normally use one byte for each letter. If you
want to count the number of bits, use the BIT_LENGTH()
function.

As an example, suppose we notice that the comments
column of the bird_sightings table contains some odd binary
characters. They have been entered into the column through the mobile
application we provide to members. To narrow the list of rows that have
these odd characters so that we can remove them, we can execute the
following SQL statement:

SELECT sighting_id
FROM bird_sightings
WHERE CHARACTER_LENGTH(comments) != LENGTH(comments);

This will give us the sighting_id for the rows in
which the number of characters does not equal the number of bytes
in the comments column.

Comparing and Searching Strings

The previous subsection used the output of CHAR_LENGTH()
as input to an IF() statement so that we had a choice of
what to return. In this subsection, we’ll look at some functions that
compare strings, which can also be handy when used with a logical
function such as IF() or in a WHERE
clause.

Let’s consider a situation where we might use one of these
functions—specifically, the STRCMP() function. The name of the
function, in the manner much loved by computer programmers, is a
compressed version of “string compare.”

Email addresses are critical for communicating with members so we
decide to require new members to enter their email address twice during
the registration process to ensure accuracy. However, in case the
connection is lost in the process or the joining member does not correct
a problem with their email address, we want to keep both addresses until
they do. So we’ll add a row to the humans table to store
whatever information they give us, and then store both email addresses
in another table to compare them. For that comparison, we could use the
STRCMP() function in an SQL statement.

This scenario is the kind of situation that you would automate
with an API program, a program you would create to interface with MySQL
or MariaDB. It would store the SQL statements needed for processing the
information the new member enters from the website. To start the process
related to checking the email, we might create a table that will store
the member’s identification number and the two email addresses. We could
do that like so:

CREATE TABLE possible_duplicate_email
(human_id INT,
email_address1 VARCHAR(255),
email_address2 VARCHAR(255),
entry_date datetime );

Now when new members register, after their information has been
stored in the humans table, our web interface can store
conditionally the two email addresses provided in the
possible_duplicate_email table. It might look like
this:

INSERT IGNORE INTO possible_duplicate_email
(human_id, email_address_1, email_address_2, entry_date)
VALUES(LAST_INSERT_ID(), 'bobyfischer@mymail.com', 'bobbyfischer@mymail.com')
WHERE ABS( STRCMP('bobbyrobin@mymail.com', 'bobyrobin@mymail.com') ) = 1 ;

For the email addresses, I’ve displayed the plain text. But in a
more realistic example, this SQL statement might be embedded in a PHP
script and would refer to variables (e.g., $email_1 and
$email_2) where the email addresses are here.

Using the STRCMP() in the WHERE clause,
if the email addresses match, STRCMP() returns
0. If the addresses don’t match, it will return 1 or -1. It returns -1
if the first value is alphabetically before the second. To allow for
that possibility, we put it inside of ABS(), which changes the value to the absolute value—it makes
negative values positive. So, if the two email addresses don’t match,
the statement will insert the addresses into the
possible_duplicate_email table for an administrator to
review. Incidentally, that would normally return an error message, but
IGNORE flag tells MySQL to ignore errors.

Another comparison function is MATCH() AGAINST(), which searches for a
string and returns matching rows from the table. It even ranks the rows
by relevance, but that is beyond the scope of this chapter. Among the
complications of MATCH() AGAINST(), it works only on
columns that have been indexed with a special FULLTEXT index. To test this
function, we’ll first add a FULLTEXT index to the
bird_sightings table, basing it on the
comments column because that’s a TEXT
column:

CREATE FULLTEXT INDEX comment_index
ON bird_sightings (comments);

Now you can use MATCH() AGAINST(). It is commonly
found in WHERE clauses as a condition to find
columns containing a given string. Text in the given string, which is
delimited by spaces or quotes, is parsed into separate words. Small
words (i.e., three characters or fewer) are generally ignored. Here is
an example:

SELECT CONCAT(name_first, SPACE(1), name_last) AS Name,
common_name AS Bird,
SUBSTRING(comments, 1, 25) AS Comments
FROM birdwatchers.bird_sightings
JOIN birdwatchers.humans USING(human_id)
JOIN rookery.birds USING(bird_id)
WHERE MATCH (comments) AGAINST ('beautiful');

+-------------------+-----------------+---------------------------+
| Name              | Bird            | Comments                  |
+-------------------+-----------------+---------------------------+
| Elena Bokova      | Eskimo Curlew   | It was a major effort get |
| Katerina Smirnova | Eurasian Curlew | Such a beautiful bird. I  |
+-------------------+-----------------+---------------------------+

In the WHERE clause, we’re able now to match the
comments column against the string beautiful.
The comments column from the
birdwatchers.bird_sightings is combined in the results with
three other columns: common_name from
rookery.birds and name_first and
name_last from birdwatchers.humans.

We’re using the SUBSTRING function to limit the
amount of text displayed. This cuts off the text abruptly. You could use
the CONCAT() function to append ellipses to indicate there is more text. You
might also use the IF() function to determine whether there is more text before appending
ellipses. There are other functions you can use for locating the
beautiful within the column so that you can display only
the text around it. We’ll cover that kind of function later in this
chapter.

Replacing and Inserting into Strings

If you want to insert or replace certain text from a column (but not all of its
contents), you could use the INSERT() function. Don’t
confuse this with the INSERT statement. The syntax of this
function consists of the string or column into which you want to insert
text, followed by the position in which to insert text. You may specify
also how much text to delete from that point, if you want. Finally, you
give the text to insert. Let’s look at some examples of this
function.

We’ll start with a simple example. Suppose that on a page of the
Rookery site, we are thinking of adding some text to the common names of
bird species with the word Least in their name. We
want to explain that it means Smallest, so that
uninformed birders don’t think it means these birds are the least
important. To test this, we enter this SQL statement:

SELECT INSERT(common_name, 6, 0, ' (i.e., Smallest)')
AS 'Smallest Birds'
FROM birds
WHERE common_name LIKE 'Least %' LIMIT 1;

+------------------------------+
| Smallest Birds               |
+------------------------------+
| Least (i.e., Smallest) Grebe |
+------------------------------+

The first argument is the column containing the string we’re
manipulating. The second argument is the starting point for inserting
text. Based on the WHERE clause, we’re looking for common
names that start with Least. That’s 5 characters.
We add 1 to that because the starting point for INSERT is
1. The third argument specifies how many characters after the starting
point should be replaced. In this case, we’re just inserting text, not
replacing any.

The SQL statement uses INSERT() to change the
results set, not the data in the table. So we could use the
INSERT() function to display the common names like this to
new members for the first month who have identified themselves as new to
bird-watching. We would have to construct a more complex SQL statement
to check who is new, but this example shows you how to insert text
within a string. Let’s look now at an example in which we will replace
data using INSERT().

Suppose we discover that parts of some of the common bird species
names are abbreviated in the birds table (e.g.,
Great is abbreviated as Gt.).
We prefer not to have any abbreviations for the common names. Before
changing the data, we’ll execute a SELECT statement to test
our use of the INSERT() function:

SELECT common_name AS Original,
INSERT(common_name, LOCATE('Gt.', common_name), 3, 'Great') AS Adjusted
FROM birds
WHERE common_name REGEXP 'Gt.' LIMIT 1;

+------------------+--------------------+
| Original         | Adjusted           |
+------------------+--------------------+
| Gt. Reed-Warbler | Great Reed-Warbler |
+------------------+--------------------+

We’ve already reviewed the arguments of the INSERT()
function in the previous example. The extra twist here is in the second
argument, which contains the LOCATE(). We’re using that function to
determine the position in the string where text is to be replaced. In
the previous example, we assumed that the common name would start with
the string we wanted to modify. In this case, we’re not assuming the
position of the string within the column. Instead, we’re letting MySQL
find it for us.

Another difference in this example is the third element: we’re
telling the function to replace three characters (i.e., the length of
Gt.) from the starting point with the text given
for the fourth argument (i.e., Great). Although the
text we’re adding is more than three characters, it’s fine because when
we update the table later, we’re updating a column with plenty of space
to hold the results.

If LOCATE() does not find the string we give it, it
returns 0. A value of 0 for the position in the INSERT()
function negates it and returns the value of
common_name unchanged. So with this usage of
INSERT(), because of the inclusion of
LOCATE() for the starting location, the WHERE
clause is unnecessary—except to see that it works where we want it
to.

Now that we’ve verified that our combination of functions works
correctly, we can update the data by entering the following SQL
statement:

UPDATE birds
SET common_name = INSERT(common_name, LOCATE('Gt.', common_name), 3, 'Great')
WHERE common_name REGEXP 'Gt.';

There is an alternative to using INSERT() for
replacing text in a string. In the previous example, we had to use the
LOCATE() function to determine the location of the text
where we wanted to insert text and we had to tell it how many characters
to replace. A simpler function for replacing text is
REPLACE(). We could use this function to replace all occurrences of
Gt. with Great in the
common_name column. Let’s test that with a SELECT statement like so:

SELECT common_name AS Original,
REPLACE(common_name, 'Gt.', 'Great') AS Replaced
FROM birds
WHERE common_name REGEXP 'Gt.' LIMIT 1;

+------------------+--------------------+
| Original         | Replaced           |
+------------------+--------------------+
| Gt. Reed-Warbler | Great Reed-Warbler |
+------------------+--------------------+

This works much better. We can use the REPLACE()
with the arguments we have here and enter the following
UPDATE to change the data in the table:

UPDATE birds
SET common_name = REPLACE(common_name, 'Gt.', 'Great');

Query OK, 8 rows affected (0.23 sec)
Rows matched: 28891  Changed: 8  Warnings: 0

Notice that we didn’t include the WHERE clause, but
the results message says that only eight rows were changed. This is
because there were only eight rows that contained
Gt. in the common_name column.
Updating data in a table with that many rows is intimidating and
dangerous without a WHERE clause. That’s why it’s good to
use them and to test the parameters with a SELECT statement first.

Converting String Types

There may be times when you will have to work with tables created by
people who might not have made the best choices for column data types.
Sometimes you can alter the tables, but sometimes you may not be allowed
to do so. For manipulating data from such tables or for importing data
from them, you can use the CAST() or CONVERT()
functions to change the data type of columns. The effect just takes place
within your SQL statement, not the database itself. Let’s look at some
examples of how and why you might use these two functions, which are
basically synonymous except for a minor syntax difference.

Suppose we’re given a table containing images of birds in a
particular area, showing female, male, and juvenile color patterns. One of
the columns contains numbers for ordering birds based loosely on the type
of bird and the date when usually seen in the area. This column isn’t a
numeric data type like INT, but is CHAR. When we
sort the data based on this column, MySQL will sort the rows lexically,
not numerically. Here’s an example of how that might look:

SELECT sorting_id, bird_name, bird_image
FROM bird_images
ORDER BY sorting_id
LIMIT 5;

+------------+-----------------+----------------------------+
| sorting_id | bird_name       | bird_image                 |
+------------+-----------------+----------------------------+
| 11         | Arctic Loon     | artic_loon_male.jpg        |
| 111        | Wilson's Plover | wilson_plover_male.jpg     |
| 112        | Wilson's Plover | wilson_plover_female.jpg   |
| 113        | Wilson's Plover | wilson_plover_juvenile.jpg |
| 12         | Pacific Loon    | pacific_loon_male.jpg      |
+------------+-----------------+----------------------------+

Notice that the rows with a sorting_id starting with
11 n are listed before one with the value of 12.
That’s because MySQL is reading the data as characters and not numbers.
The two Loons should be together, before the Plovers are listed.

We can use the CAST() function to cast the values
taken from sorting_id into the INT data
type:

SELECT sorting_id, bird_name, bird_image
FROM bird_images ORDER BY CAST(sorting_id AS INT) LIMIT 5;

+------------+-----------------+----------------------------+
| sorting_id | bird_name       | bird_image                 |
+------------+-----------------+----------------------------+
| 11         | Arctic Loon     | artic_loon_male.jpg        |
| 12         | Pacific Loon    | pacific_loon_male.jpg      |
| 111        | Wilson's Plover | wilson_plover_male.jpg     |
| 112        | Wilson's Plover | wilson_plover_female.jpg   |
| 113        | Wilson's Plover | wilson_plover_juvenile.jpg |
+------------+-----------------+----------------------------+

That worked correctly. Let’s suppose now that we don’t want to use
sorting_id, but instead the gender_age column.
This is an ENUM column specifying that the image file is for a male,
female, or a juvenile. The color patterns of
most birds deviate based on these factors. Let’s see how the results will
look if we sort based on this column:

SELECT bird_name, gender_age, bird_image
FROM bird_images
WHERE bird_name LIKE '%Plover%'
ORDER BY gender_age
LIMIT 5;

+-----------------+------------+----------------------------+
| bird_name       | gender_age | bird_image                 |
+-----------------+------------+----------------------------+
| Wilson's Plover | male       | wilson_plover_male.jpg     |
| Snowy Plover    | male       | snowy_plover_male.jpg      |
| Wilson's Plover | female     | wilson_plover_female.jpg   |
| Snowy Plover    | female     | snowy_plover_female.jpg    |
| Wilson's Plover | juvenile   | wilson_plover_juvenile.jpg |
+-----------------+------------+----------------------------+

Notice that the rows are grouped together based on the
gender_age column, but those values are not in alphabetical
order (i.e., female rows should be before
male rows). This is because of how the enumerated
values are listed in the gender_age column:

SHOW COLUMNS FROM bird_images LIKE 'gender_age' \G

*************************** 1. row ***************************
  Field: gender_age
   Type: enum('male','female','juvenile')
   Null: YES
    Key:
Default: NULL
  Extra:

To MySQL, the value of male for the
gender_age column is stored as 1, and female as
2. This controls the order of the display, even though the values are
rendered as text. If we use though the CAST() or the
CONVERT() function in the ORDER BY clause, MySQL will sort the
results based on their rendered values and not their column values. Here’s
how that would look:

SELECT bird_name, gender_age, bird_image
FROM bird_images
WHERE bird_name LIKE '%Plover%'
ORDER BY CONVERT(gender_age, CHAR)
LIMIT 5;

+-----------------+------------+----------------------------+
| bird_name       | gender_age | bird_image                 |
+-----------------+------------+----------------------------+
| Wilson's Plover | female     | wilson_plover_female.jpg   |
| Snowy Plover    | female     | snowy_plover_female.jpg    |
| Wilson's Plover | juvenile   | wilson_plover_juvenile.jpg |
| Snowy Plover    | juvenile   | snowy_plover_juvenile.jpg  |
| Wilson's Plover | male       | wilson_plover_male.jpg     |
+-----------------+------------+----------------------------+

Notice that for the CONVERT() function, a comma is
used to separate the string given from the data type instead of the
AS keyword. The data type given as the second argument can
be BINARY, CHAR,
DATE, DATETIME, SIGNED [INTEGER],
TIME, or UNSIGNED [INTEGER]. BINARY
converts a string to a binary string. You can add also CHARACTER
SET
to use a different character set from the default for the value
given. To convert the character set of a given string to another, you have
to use the USING option, like so:

SELECT bird_name, gender_age, bird_image
FROM bird_images
WHERE bird_name LIKE '%Plover%'
ORDER BY CONVERT(gender_age USING utf8)
LIMIT 5;

Compressing Strings

Some column data types allow large amounts of data. For instance,
the BLOB column can store plenty. To reduce the
size of tables that use this column data type, you can compress the data
it contains when inserting the data. The COMPRESS() function
compresses a string and the UNCOMPRESS()
function decompresses a compressed string. If you want to use them,
MySQL has to have been compiled with a compression library (i.e.,
zlib). If it wasn’t, a NULL value will be returned when using
COMPRESS(). Let’s look at some examples of their use.

The humans table has a column for
birding_background which is a BLOB. Members can
write as much as they like about themselves, which could result in pages
of information on their experiences and education as bird-watchers. This
could potentially slow down queries and updates if many members do this.
So we decide to use COMPRESS() to compress the member’s
background when inserting it into the humans table. Here’s
how that might look:

INSERT INTO humans
(formal_title, name_first, name_last, join_date, birding_background)
VALUES('Ms', 'Melissa', 'Lee', CURDATE(), COMPRESS("lengthy background..."));

This SQL statement inserts a new member’s information into the
humans table—it has more columns than shown here, but we’re
trying to keep this example simple. The statement uses the
COMPRESS() function to compress the background information
given (which isn’t much for this simple example). You would normally get
such data from an API variable using something like PHP to store text
entered by the user through a web page. So instead of the text shown here,
you would use a variable (e.g.,
$birding_background).

To see how the data looks in the compressed form, we could do
this:

SELECT birding_background AS Background
FROM humans
WHERE name_first = 'Melissa' AND name_last = 'Lee' \G

*************************** 1. row ***************************
Background:    x#####/ɨTHJL##/######## Z######

Notice that the results are not normal text. The mysql client substitutes a hash sign
( #) for binary values. In order to see the text contained in
this compressed format, we would use UNCOMPRESS(). It returns
NULL if the string is not compressed or if MySQL wasn’t compiled with
zlib:

SELECT UNCOMPRESS(birding_background) AS Background
FROM humans
WHERE name_first = 'Melissa' AND name_last = 'Lee' \G

*************************** 1. row ***************************
Background: lengthy background...

For small amounts of text like this, compression takes more space
than the plain text. But for large amounts of text, it will save plenty of
space. So use it sparingly and where appropriate.

Summary

There are more string functions available in MySQL and MariaDB. A
few of the functions mentioned here have aliases or close alternatives.
There are also functions for converting between ASCII, binary,
hexadecimal, and octal strings. And there are also string functions
related to text encryption and decryption that were not mentioned.
However, I think this chapter has given you a good collection of common
string functions that will assist you in building more powerful SQL
statements and formatting results to be more attractive.

Exercises

String functions are very necessary to developing databases in MySQL
and MariaDB. You need to know them well. To become an expert, you need to
practice using them, so be sure to complete all of the following
exercises.

  1. One of the most commonly used string functions is
    CONCAT(). Construct a SELECT statement to
    query the humans table. Use the CONCAT()
    function to merge together values from the name_first
    column with the name_last column. Use the
    SPACE() function to put a space between them in the
    results. Give that field an alias of Full Name—and
    remember to put quotes around this alias, as it contains a space.
    Limit the results to four people. Execute it to be sure it has no
    errors.

    Add a WHERE clause to that SELECT
    statement. For the condition of the WHERE clause, copy
    the CONCAT() you just assembled. List rows where the
    name is in a set of the following names: Lexi Hollar, Michael
    Zabalaoui, and Rusty Johnson.

    After you successfully execute the SELECT with that
    WHERE clause, add an ORDER BY clause to sort
    the data based on the concatenated name. Do it without using
    CONCAT().

  2. Construct a SELECT statement that selects, from the
    birds table, the common_name and the
    scientific_name. Use a string function to change the
    scientific_name to all lowercase letters. Use the
    CONCAT() function to put them into one field, with a
    space after the common name, followed by the scientific name in
    parentheses—for example, African Desert Warbler (sylvia
    deserti)
    . Don’t use the SPACE() function.
    Instead, put the spaces and parentheses within single quote marks
    within the CONCAT(). Give the
    resulting field an alias of Bird Species. Limit the
    results to 10 rows.

    After you’ve successfully executed that SQL statement, modify
    that statement to join in the bird_families and the
    bird_orders tables. The JOIN statement was
    covered extensively in Unifying Results.
    Then add the scientific_name columns from both of these
    tables to the fields returned.

    Execute this modified statement to make sure your joins are
    correct. When they are, move the scientific_name columns
    for the two additional tables into the CONCAT(). Using
    the RPAD() function, put dots after the bird species
    name, before the bird family and the bird order names. The results for
    a field will look like this:

    Speckled Warbler (pyrrholaemus sagittatus)...Acanthizidae...Passeriformes

    This will probably require you to use CONCAT()
    twice. Use a WHERE clause to list only Warblers. Limit
    the results to 10 rows.

  3. Construct another SELECT statement to list all of
    the common names of bird species from the birds table,
    where the common name contains the word Shrike.
    When you execute that statement you should see some names with a
    hyphen after the word Shrike. Add the
    REPLACE() function to the SELECT statement
    to replace those hyphens with a space in the results, and then execute
    the SQL statement again.

  4. Some of the names of the birds in the results from the
    SELECT statement in the previous exercise have more than
    one hyphen (e.g., Yellow-browed Shrike-Vireo).
    Redo that SQL statement to replace only the hyphens after the word
    Shrike (e.g., to look like this:
    Yellow-browed Shrike Vireo). In order to do this,
    use LOCATE() with REPLACE(). You will need
    to use LOCATE() twice: one within another.

  5. True Shrikes are of the Laniidae family.
    Construct another SELECT to select the common bird names
    with the word Shrike, but belonging to
    Laniidae. This will require a join to the
    bird_families table. Use one of the substring functions
    like SUBSTRING() to extract the words before
    Shrike. To do this, you will need to use
    LOCATE() or a similar function. Then use
    CONCAT() to display that value extracted after
    Shrike with a comma and space in between. The
    results for each field should look like this: Shrike,
    Rufous-tailed
    . Give the field an alias of
    Shrikes.

  6. The humans table contains entries in which the member used
    either all lowercase letters or all uppercase letters to enter their
    first and last names (e.g., andy oram and MICHAEL
    STONE
    ). Use UPDATE to change the names to title
    case (i.e., the first letter capital and the rest lowercase). First
    experiment with SELECT to make sure you have the
    functions organized properly. Use the UCASE() and
    LCASE() functions to set the cases. You will need to use
    SUBSTRING() or a similar function a few times, and
    CONCAT() a couple of times.

Comments are closed.