
JavaScript – The RegExp Object

As mentioned at the beginning of this chapter, regular
expressions are represented as RegExp objects. In addition to the
RegExp() constructor, RegExp
objects support three methods and a number of properties. RegExp
pattern-matching methods and properties are described in the next two
sections.

The RegExp() constructor
takes one or two string arguments and creates a new RegExp object. The
first argument to this constructor is a string that contains the body
of the regular expression—the text that would appear within slashes in
a regular-expression literal. Note that both string literals and
regular expressions use the \
character for escape sequences, so when you pass a regular expression
to RegExp() as a string literal,
you must replace each \ character
with \\. The second argument to
RegExp() is optional. If supplied,
it indicates the regular-expression flags. It should be g, i,
m, or a combination of those
letters.

For example:

// Find all five-digit numbers in a string. Note the double \\ in this case.
var zipcode = new RegExp("\\d{5}", "g");
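To illustrate the escaping rule described above, the following sketch compares a literal, a correctly escaped string, and a string with a single backslash. The variable names are illustrative only and do not come from the text.

```javascript
// The literal form and the doubled-backslash string form describe the same pattern.
var fromLiteral = /\d{5}/g;
var fromString = new RegExp("\\d{5}", "g");

// With a single backslash, the string literal "\d{5}" is just "d{5}",
// so the resulting pattern looks for five letter d's rather than five digits.
var missingEscape = new RegExp("\d{5}", "g");

var sameSource = fromLiteral.source === fromString.source;  // true
var matchesDigits = missingEscape.test("12345");            // false
var matchesLetters = missingEscape.test("ddddd");           // true
```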

The RegExp() constructor is
useful when a regular expression is being dynamically created and thus
cannot be represented with the regular-expression literal syntax. For
example, to search for a string entered by the user, a regular
expression must be created at runtime with RegExp().
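A minimal sketch of building a pattern from a user-supplied string follows. The helper name escapeRegExp and the sample input are hypothetical, and the escaping step is an assumption: it is needed only when the input should be matched literally rather than interpreted as regular-expression syntax.

```javascript
// Hypothetical helper: put a backslash before every character that has
// special meaning in a regular expression, so the text is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Stand-in for a string typed by the user.
var userInput = "1+1=2";

// Build a global, case-insensitive pattern at runtime.
var userPattern = new RegExp(escapeRegExp(userInput), "gi");

var found = userPattern.test("We know that 1+1=2.");  // true
```

Without the escaping step, the + in the input would be treated as a quantifier, and the pattern would no longer match the text the user typed.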

RegExp Properties

Each RegExp object has five properties. The source property is a read-only string that
contains the text of the regular expression. The global property is a read-only boolean
value that specifies whether the regular expression has the g flag. The ignoreCase property is a read-only boolean
value that specifies whether the regular expression has the i flag. The multiline property is a read-only boolean
value that specifies whether the regular expression has the m flag. The final property is lastIndex, a read/write integer. For
patterns with the g flag, this
property stores the position in the string at which the next search
is to begin. It is used by the exec() and test() methods, described below.
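As a brief sketch, the five properties can be inspected directly. The pattern and the variable names here are illustrative only.

```javascript
var re = /java/gi;

// The read-only properties reflect the pattern text and its flags.
var info = {
    source: re.source,          // "java"
    global: re.global,          // true
    ignoreCase: re.ignoreCase,  // true
    multiline: re.multiline,    // false
    lastIndex: re.lastIndex     // 0 before any search
};

// After a successful search with a global pattern, lastIndex moves
// to the position just past the match.
re.exec("JavaScript and Java");
var afterFirstMatch = re.lastIndex;  // 4

// lastIndex is writable, so it can be reset by hand.
re.lastIndex = 0;
```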

RegExp Methods

RegExp objects define two methods that perform
pattern-matching operations; they behave similarly to the String
methods described earlier. The main RegExp pattern-matching method
is exec(). It is similar to the
String match() method described
in String Methods for Pattern Matching, except that it is a RegExp
method that takes a string, rather than a String method that takes a
RegExp. The exec() method
executes a regular expression on the specified string. That is, it
searches the string for a match. If it finds none, it returns
null. If it does find one,
however, it returns an array just like the array returned by the
match() method for nonglobal
searches. Element 0 of the array contains the string that matched
the regular expression, and any subsequent array elements contain
the substrings that matched any parenthesized subexpressions.
Furthermore, the index property
contains the character position at which the match occurred, and the
input property refers to the
string that was searched.
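The shape of the array returned by exec() can be sketched as follows. The pattern and the sample string are made up for illustration.

```javascript
// A nonglobal pattern with two parenthesized subexpressions.
var datePattern = /(\d{4})-(\d{2})/;
var m = datePattern.exec("Released 2011-04 in print");

var whole = m[0];    // "2011-04", the full matched text
var year = m[1];     // "2011", the first subexpression
var month = m[2];    // "04", the second subexpression
var where = m.index; // 9, the position at which the match begins
var source = m.input; // the string that was searched

// When there is no match, exec() returns null rather than an empty array.
var none = datePattern.exec("no date here");  // null
```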

Unlike the match() method,
exec() returns the same kind of
array whether or not the regular expression has the global g flag. Recall that match() returns an array of matches when
passed a global regular expression. exec(), by contrast, always returns a
single match and provides complete information about that match.
When exec() is called on a
regular expression that has the g
flag, it sets the lastIndex
property of the regular-expression object to the character position
immediately following the matched substring. When exec() is invoked a second time for the
same regular expression, it begins its search at the character
position indicated by the lastIndex property. If exec() does not find a match, it resets
lastIndex to 0. (You can also set
lastIndex to 0 at any time, which
you should do whenever you quit a search before you find the last
match in one string and begin searching another string with the same
RegExp object.) This special behavior allows you to call exec() repeatedly in order to loop through
all the regular expression matches in a string. For example:

var pattern = /Java/g;
var text = "JavaScript is more fun than Java!";
var result;
while((result = pattern.exec(text)) != null) {
    alert("Matched '" + result[0] + "'" +
          " at position " + result.index +
          "; next search begins at " + pattern.lastIndex);
}
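To contrast the two methods on the same input, this sketch collects the results in arrays instead of calling alert(). The names allMatches and details are illustrative only.

```javascript
var text = "JavaScript is more fun than Java!";

// match() with a global pattern returns only the matched strings.
var allMatches = text.match(/Java/g);  // ["Java", "Java"]

// exec() returns one match per call, with its position, and advances lastIndex.
var pattern = /Java/g;
var details = [];
var result;
while ((result = pattern.exec(text)) !== null) {
    details.push({
        match: result[0],
        index: result.index,
        next: pattern.lastIndex
    });
}
// details[0] is { match: "Java", index: 0, next: 4 }
// details[1] is { match: "Java", index: 28, next: 32 }
```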

The other RegExp method is test(). test() is a much simpler method than
exec(). It takes a string and
returns true if the string
contains a match for the regular expression:

var pattern = /java/i;
pattern.test("JavaScript");  // Returns true

Calling test() is
equivalent to calling exec() and
returning true if the return
value of exec() is not null. Because of this equivalence, the
test() method behaves the same
way as the exec() method when
invoked for a global regular expression: it begins searching the
specified string at the position specified by lastIndex, and if it finds a match, it
sets lastIndex to the position of
the character immediately following the match. Thus, you can loop
through a string using the test()
method just as you can with the exec() method.
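A short sketch of looping with test() on a global pattern follows. Because test() returns only a boolean, the loop records lastIndex after each successful call. The sample string and names are illustrative.

```javascript
var letterA = /a/g;
var word = "banana";
var positionsAfterMatch = [];

// Each successful test() advances lastIndex past the match it found.
while (letterA.test(word)) {
    positionsAfterMatch.push(letterA.lastIndex);
}
// positionsAfterMatch is [2, 4, 6]

// The final, failed call reset lastIndex to 0.
var finalIndex = letterA.lastIndex;  // 0
```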

The String methods search(), replace(), and match() do not use the lastIndex property as exec() and test() do. In fact, match() and replace() simply reset lastIndex to 0 when called with a global pattern, and search() ignores lastIndex altogether. If
you use exec() or test() on a pattern that has the g flag set, and you are searching multiple
strings, you must either find all the matches in each string so that
lastIndex is automatically reset
to zero (this happens when the last search fails), or you must
explicitly set the lastIndex
property to 0 yourself. If you forget to do this, you may start
searching a new string at some arbitrary position within the string
rather than from the beginning. If your RegExp doesn’t have the
g flag set, then you don’t have
to worry about any of this, of course. Keep in mind also that in
ECMAScript 5 each evaluation of a regular expression literal creates
a new RegExp object with its own lastIndex property, and this reduces the
risk of accidentally using a “leftover” lastIndex value.
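The pitfall described above can be sketched with one global pattern reused across two strings. The strings and variable names are made up for illustration.

```javascript
var digits = /\d+/g;

// The first search succeeds and leaves lastIndex at 8, just past "101".
digits.test("room 101");
var leftover = digits.lastIndex;  // 8

// The second string is searched from position 8, so the "42" at the
// start is never examined and the search fails.
var missed = digits.test("42 Main St");  // false

// Repeat the sequence, but reset lastIndex before switching strings.
digits.test("room 101");
digits.lastIndex = 0;
var foundAfterReset = digits.test("42 Main St");  // true
```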
