JavaScript – String Methods for Pattern Matching

Until now, this chapter has discussed the grammar used to create
regular expressions, but it hasn’t examined how those regular
expressions can actually be used in JavaScript code. This section
discusses methods of the String object that use regular expressions to
perform pattern matching and search-and-replace operations. The
sections that follow this one continue the discussion of pattern
matching with JavaScript regular expressions by discussing the RegExp
object and its methods and properties. Note that the discussion that
follows is merely an overview of the various methods and properties
related to regular expressions. As usual, complete details can be
found in Part III.

Strings support four methods that use regular expressions. The
simplest is search(). This method
takes a regular-expression argument and returns either the character
position of the start of the first matching substring or −1 if there
is no match. For example, the following call returns 4:

"JavaScript".search(/script/i);

If the argument to search()
is not a regular expression, it is first converted to one by passing
it to the RegExp constructor.
search() does not support global
searches; it ignores the
g flag of its regular expression
argument.
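The following short sketch illustrates these rules. The sample strings and variable names are made up for illustration. It shows the −1 result, the conversion of a string argument to a RegExp (so "." matches any character), and the g flag being ignored.

```javascript
// search() returns the index of the first match, or -1 when nothing matches.
var s = "JavaScript";
var found = s.search(/script/i);    // 4
var missing = s.search(/python/i);  // -1

// A string argument is converted to a RegExp first, so "." is a
// metacharacter here and matches any character, starting at index 0.
var dotAsRegExp = "x.y".search(".");  // 0
var dotLiteral = "x.y".indexOf(".");  // 1, for comparison

// The g flag is ignored: repeated calls always report the first match,
// and the regular expression's lastIndex property is left as it was.
var re = /c/g;
var first = "abcabc".search(re);  // 2
var again = "abcabc".search(re);  // 2
```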

The replace() method performs
a search-and-replace operation. It takes a regular expression as its
first argument and a replacement string as its second argument. It
searches the string on which it is called for matches with the
specified pattern. If the regular expression has the g flag set, the replace() method replaces all matches in the
string with the replacement string; otherwise, it replaces only the
first match it finds. If the first argument to replace() is a string rather than a regular
expression, the method searches for that string literally, rather than
converting it to a regular expression with the RegExp() constructor as search() does, and it replaces only the first occurrence. As an example, you can use
replace() as follows to provide
uniform capitalization of the word “JavaScript” throughout a string of
text:

// No matter how it is capitalized, replace it with the correct capitalization.
// replace() returns a new string, so assign the result back to text.
text = text.replace(/javascript/gi, "JavaScript");
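The following sketch uses made-up sample text to contrast a global regular expression, a nonglobal one, and a plain string pattern. Keep in mind that strings are immutable: replace() returns a new string and leaves the original untouched.

```javascript
var text = "javascript, JAVASCRIPT, and Javascript";

// With a regular expression and the g flag, every match is replaced.
var fixed = text.replace(/javascript/gi, "JavaScript");
// "JavaScript, JavaScript, and JavaScript"

// Without g, only the first match is replaced.
var firstOnly = text.replace(/javascript/i, "JavaScript");
// "JavaScript, JAVASCRIPT, and Javascript"

// A string pattern is matched literally ("." is just a period here),
// and only its first occurrence is replaced.
var dots = "a.b.c".replace(".", "!");  // "a!b.c"

// text itself still holds "javascript, JAVASCRIPT, and Javascript".
```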

replace() is more powerful
than this, however. Recall that parenthesized subexpressions of a
regular expression are numbered from left to right and that the
regular expression remembers the text that each subexpression matches.
If a $ followed by a digit appears
in the replacement string, replace() replaces those two characters with
the text that matches the specified subexpression. This is a very
useful feature. You can use it, for example, to replace straight
quotes in a string with curly quotes:

// A quote is a quotation mark, followed by any number of
// nonquotation-mark characters (which we remember), followed
// by another quotation mark.
var quote = /"([^"]*)"/g;
// Replace the straight quotation marks with curly quotes,
// leaving the quoted text (stored in $1) unchanged.
text = text.replace(quote, '“$1”');
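Here is the quotation example applied to a sample sentence, plus a second illustrative use of $n that reorders two captured words. The strings are invented for the example.

```javascript
var text = 'He said "hello" and then "goodbye".';
var quote = /"([^"]*)"/g;

// $1 is replaced by whatever the parenthesized subexpression matched.
var curly = text.replace(quote, "“$1”");
// 'He said “hello” and then “goodbye”.'

// Groups can also be reordered: turn "Last, First" into "First Last".
var name = "Doe, John".replace(/(\w+)\s*,\s*(\w+)/, "$2 $1");  // "John Doe"
```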

The replace() method has
other important features as well, which are described in the String.replace() reference page in Part III. Most notably, the second argument to replace() can be a function that dynamically
computes the replacement string.
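As a rough sketch of the function form: the function is called once per match with the matched text, then one argument per parenthesized subexpression, then the offset of the match and the whole string, and whatever it returns is used as the replacement. The sample conversions below are illustrative only.

```javascript
// Capitalize the first letter of every word; the function receives the match.
var title = "the definitive guide".replace(/\b\w/g, function (ch) {
    return ch.toUpperCase();
});
// "The Definitive Guide"

// The second argument is the text matched by the first subexpression.
var temps = "Highs of 20C today and 25C tomorrow";
var converted = temps.replace(/(\d+)C/g, function (match, celsius) {
    var f = Number(celsius) * 9 / 5 + 32;
    return f + "F";
});
// "Highs of 68F today and 77F tomorrow"
```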

The match() method is the
most general of the String regular-expression methods. It takes a
regular expression as its only argument (or converts its argument to a
regular expression by passing it to the RegExp() constructor) and returns an array
that contains the results of the match. If the regular expression has
the g flag set, the method returns
an array of all matches that appear in the string. For
example:

"1 plus 2 equals 3".match(/\d+/g)  // returns ["1", "2", "3"]
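One detail is worth guarding against: when there is no match at all, match() returns null rather than an empty array. A minimal illustration with made-up input:

```javascript
var nums = "1 plus 2 equals 3".match(/\d+/g);  // ["1", "2", "3"]

// No match: the result is null, so guard before using it.
var none = "no digits here".match(/\d+/g);     // null
var count = (none || []).length;               // 0
```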

If the regular expression does not have the g flag set, match() does not do a global search; it
simply searches for the first match. However, match() returns an array even when it does
not perform a global search. In this case, the first element of the
array is the matching string, and any remaining elements are the
parenthesized subexpressions of the regular expression. Thus, if
match() returns an array a, a[0]
contains the complete match, a[1]
contains the substring that matched the first parenthesized
expression, and so on. To draw a parallel with the replace() method, a[n] holds the contents of $n.

For example, consider parsing a URL with the following
code:

var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit my blog at http://www.example.com/~david";
var result = text.match(url);
if (result != null) {
    var fullurl = result[0];   // Contains "http://www.example.com/~david"
    var protocol = result[1];  // Contains "http"
    var host = result[2];      // Contains "www.example.com"
    var path = result[3];      // Contains "~david"
}
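To make the parallel with replace() concrete, here is a sketch with a made-up date string: the same regular expression yields a[1] through a[3] from match() and $1 through $3 in a replacement string. It also shows that adding the g flag makes match() drop the subexpressions.

```javascript
var date = "2024-05-17";
var re = /(\d+)-(\d+)-(\d+)/;

// Nonglobal match: a[0] is the whole match, a[1]..a[3] are the groups.
var a = date.match(re);
// a[0] is "2024-05-17", a[1] is "2024", a[2] is "05", a[3] is "17"

// The same groups are available as $1, $2, $3 in a replacement string.
var reordered = date.replace(re, "$3/$2/$1");  // "17/05/2024"

// With the g flag, match() returns only the complete matches.
var all = date.match(/(\d+)-(\d+)-(\d+)/g);    // ["2024-05-17"]
```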

It is worth noting that passing a nonglobal regular expression
to the match() method of a string
is actually the same as passing the string to the exec() method of the regular expression: the
returned array has index and
input properties, as described for
the exec() method below.
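For instance (the sample string is arbitrary), both calls below return an array whose index property is the offset of the match and whose input property is the string that was searched:

```javascript
var s = "Visit my blog";

// Nonglobal match() on the string...
var m = s.match(/blog/);
// m[0] is "blog", m.index is 9, m.input is "Visit my blog"

// ...gives the same result as exec() on the regular expression.
var e = /blog/.exec(s);
// e[0] is "blog", e.index is 9, e.input is "Visit my blog"
```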

The last of the regular-expression methods of the String object
is split(). This method breaks the
string on which it is called into an array of substrings, using the
argument as a separator. For example:

"123,456,789".split(",");  // Returns ["123","456","789"]
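A few more string-separator cases, as a quick sketch with illustrative inputs. A string separator is taken literally, so "." means a period, and an empty-string separator splits the string into individual characters.

```javascript
var parts = "123,456,789".split(",");  // ["123", "456", "789"]

// "." is not a metacharacter when the separator is a string.
var version = "1.2.3".split(".");      // ["1", "2", "3"]

// An empty-string separator yields one element per character.
var chars = "abc".split("");           // ["a", "b", "c"]
```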

The split() method can also
take a regular expression as its argument, which makes the
method considerably more powerful. For example, you can specify a separator
that allows an arbitrary amount of whitespace on either
side of a comma:

"1, 2, 3, 4, 5".split(/\s*,\s*/); // Returns ["1","2","3","4","5"]
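Because the separator is a full regular expression, it can also absorb uneven spacing or accept several different separators at once through a character class. The sample inputs are illustrative:

```javascript
// Whitespace on either side of each comma is consumed by the separator.
var list = "1, 2 ,3 ,  4,5".split(/\s*,\s*/);  // ["1", "2", "3", "4", "5"]

// A character class lets commas and semicolons share one pattern.
var mixed = "a, b;c ; d".split(/\s*[,;]\s*/);  // ["a", "b", "c", "d"]
```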

The split() method has other
features as well. See the String.split() entry in Part III for complete details.
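Two of those features, sketched briefly with illustrative inputs: an optional second argument limits the length of the returned array, and any parenthesized subexpressions in a separator regular expression have their matched text included in the result.

```javascript
// Limit the result to at most two elements.
var firstTwo = "123,456,789".split(",", 2);  // ["123", "456"]

// The captured separators appear between the pieces.
var withSeps = "1+2-3".split(/([+-])/);      // ["1", "+", "2", "-", "3"]
```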
