Two operators are closely related to *. The first is +, which matches one or more occurrences of whatever precedes it. Thus, read+ matches "read" and "readdddd" but not "rea", and file[0-9]+ requires that there be at least one digit after "file". The second is ?, which matches zero or one occurrence of whatever precedes it (i.e., makes it optional). html? matches "htm" or "html", and file[0-9]? matches "file" followed by one optional digit.
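These claims are easy to check by evaluating a few forms in the *scratch* buffer. The sketch below assumes a hypothetical helper, my-whole-match-p (not part of Emacs), that wraps a regexp in the standard \\` and \\' anchors so that the entire string has to match; string-match-p is the built-in predicate that searches without changing the match data.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-whole-match-p (regexp string)
  "Return t if REGEXP matches all of STRING, nil otherwise."
  (let ((case-fold-search nil))         ; match case-sensitively
    ;; \\` and \\' anchor at the start and end of the string;
    ;; \\(?: ... \\) groups without recording a submatch.
    (and (string-match-p (concat "\\`\\(?:" regexp "\\)\\'") string) t)))

(my-whole-match-p "read+" "read")         ; => t
(my-whole-match-p "read+" "readdddd")     ; => t
(my-whole-match-p "read+" "rea")          ; => nil
(my-whole-match-p "file[0-9]+" "file")    ; => nil
(my-whole-match-p "html?" "htm")          ; => t
(my-whole-match-p "file[0-9]?" "file7")   ; => t
(my-whole-match-p "file[0-9]?" "file77")  ; => nil
```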
Before we move on to other operators, a few more comments about character sets and ranges are in order. First, you can specify more than one range within a single character set. The set [A-Za-z] can thus be used to specify all alphabetic characters; this is better than the nonportable [A-z]. Combining ranges with lists of characters in sets is also possible; for example, [A-Za-z_] means all alphabetic characters plus underscore, that is, all characters that can begin an identifier in C (adding 0-9 to the set covers the remaining characters of an identifier). If you give ^ as the first character in a set, it acts as a "not" operator; the set matches all characters that aren't the characters after the ^. For example, [^A-Za-z] matches all nonalphabetic characters.
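The following sketch shows these sets at work. It binds case-fold-search to nil (an assumption made here so that uppercase and lowercase ranges stay distinct) and uses string-match-p, which returns the index of the first match or nil. The last two forms illustrate why [A-z] is a poor substitute: in ASCII, the characters between Z and a include the underscore.

```elisp
(let ((case-fold-search nil))           ; keep A-Z and a-z distinct
  (list
   (string-match-p "[A-Za-z_]" "42 total_count")  ; => 3, the "t"
   (string-match-p "[^A-Za-z]" "abc-def")         ; => 3, the hyphen
   (string-match-p "[^A-Za-z]" "abcdef")          ; => nil, all letters
   (string-match-p "[A-Za-z]" "_")                ; => nil
   (string-match-p "[A-z]" "_")))                 ; => 0, "_" lies between Z and a
;; => (3 3 nil nil 0)
```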
A ^ anywhere other than first in a character set has no special meaning; it's just the caret character. Conversely, - has no special meaning if it is given first in the set; the same is true for ]. Positioning is the only way to put these characters in a set: a backslash is not an escape character inside a set in Emacs regular expressions, so [\\]] is a set containing a backslash, followed by a literal right bracket. Outside of sets, a double backslash preceding a nonspecial character usually means just that character, but watch it! A few letters and punctuation characters are used as regular expression operators, some of which are covered in the following section. Later, we list the "booby trap" characters that become operators when double-backslash-escaped. The ^ character has a different meaning when used outside of character sets, as we'll see soon.
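A few forms make the positional rules concrete. This is a minimal sketch using the built-in string-match-p; the sample strings are made up for illustration. Note that, per the Emacs regexp syntax, a backslash inside a set is an ordinary character, so it is position rather than escaping that makes ^, -, and ] literal there.

```elisp
(string-match-p "[a^]" "x^")     ; => 1, caret is literal when not first
(string-match-p "[-a]" "-")      ; => 0, hyphen is literal when first
(string-match-p "[]a]" "]")      ; => 0, right bracket is literal when first
(string-match-p "[^^]" "^a")     ; => 1, "anything but a caret" finds the "a"

;; The regexp below is a set containing only a backslash,
;; followed by a literal right bracket.
(string-match-p "[\\]]" "]")     ; => nil
(string-match-p "[\\]]" "\\]")   ; => 0, backslash followed by "]"
```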
11.3.2.2 Grouping and alternation
If you want to get *, +, or ? to operate on more than one character, you can use the \\( and \\) operators for grouping. Notice that, in this case (and others to follow), the backslashes are part of the operator. (All of the nonbasic regular expression operators include backslashes so as to avoid making too many characters "special." This is the most profound way in which Emacs regular expressions differ from those used in other environments, like Perl, so it's something to which you'll need to pay careful attention.) As we saw before, these characters need to be double-backslash-escaped so that Emacs decodes them properly. If one of the basic operators immediately follows \\), it works on the entire group inside the \\( and \\). For example, \\(read\\)* matches the empty string, "read", "readread", and so on, and read\\(file\\)? matches "read" or "readfile". Now we can handle Example 1, the first of the examples given at the beginning of this section, with the following Lisp code:
(replace-regexp "read\\(file\\)?" "get")
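The documentation string of replace-regexp notes that it is intended for interactive use and suggests a loop over re-search-forward and replace-match in Lisp code. A minimal sketch of that alternative follows; the function name my-read-to-get is hypothetical, and passing t for both the FIXEDCASE and LITERAL arguments of replace-match is a choice made here so that "get" is inserted exactly as written.

```elisp
;; Hypothetical command name; the regexp is the one from the text.
(defun my-read-to-get ()
  "Replace \"read\" or \"readfile\" with \"get\" from point to end of buffer."
  (interactive)
  ;; A third argument of t makes the search return nil at the end
  ;; of the buffer instead of signaling an error.
  (while (re-search-forward "read\\(file\\)?" nil t)
    (replace-match "get" t t)))

;; Try it on a scratch string without touching a real buffer.
(with-temp-buffer
  (insert "readfile(x); read(y);")
  (goto-char (point-min))
  (my-read-to-get)
  (buffer-string))
;; => "get(x); get(y);"
```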
The alternation operator \\| is a "one or the other" operator; it matches either whatever precedes it or whatever comes after it. \\| treats parenthesized groups differently from the basic operators. Instead of requiring parenthesized groups to work with subexpressions of more than one character, its "power" goes out to the left and right as far as possible, until it reaches the beginning or end of the regexp, a \\(, a \\), or another \\|. Some examples should make this clearer:
• read\\|get matches "read" or "get"
• readfile\\|read\\|get matches "readfile", "read", or "get"
• \\(read\\|get\\)file matches "readfile" or "getfile"
In the first example, the effect of the \\| extends to both ends of the regular expression. In the second, the effect of the first \\| extends to the beginning of the regexp on the left and to the second \\| on the right. In the third, it extends to the backslash-parentheses.
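The reach of \\| can be seen by asking Emacs which text each regexp actually matches. The sketch below assumes a hypothetical helper, my-first-match, built on the standard string-match and match-string functions; the sample strings are invented for illustration.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my-first-match (regexp string)
  "Return the text of the first match of REGEXP in STRING, or nil."
  (when (string-match regexp string)
    (match-string 0 string)))

(my-first-match "read\\|get" "forget it")             ; => "get"
(my-first-match "readfile\\|read\\|get" "readfiles")  ; => "readfile"
(my-first-match "\\(read\\|get\\)file" "getfile")     ; => "getfile"
(my-first-match "\\(read\\|get\\)file" "get")         ; => nil

;; Without the group, "file" belongs only to the second alternative,
;; so a bare "read" still matches.
(my-first-match "read\\|getfile" "read")              ; => "read"
```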
Another important category of regular expression operators has to do with specifying the context of a string, that is, the text around it. In Chapter 3 we saw the word-search commands, which are invoked as options within incremental search. These are special cases of context specification; in this case, the context is word-separation characters, for example, spaces or punctuation, on both sides of the string.
The simplest context operators for regular expressions are ^ and $, two more basic operators that are used at the beginning and end of regular expressions respectively. The ^ operator causes the rest of the regular expression to match only if it is at the beginning of a line; $ causes the regular expression preceding it to match only if it is at the end of a line. In Example 2, we need a function that matches occurrences of one or more asterisks at the beginning of a line; this will do it:
(defun remove-outline-marks ()
  "Remove section header marks created in outline-mode."
  (interactive)
  (replace-regexp "^\\*+" ""))
This function finds lines that begin with one or more asterisks (the \\* is a literal asterisk and the + means "one or more"), and it replaces the asterisk(s) with the empty string "", thus deleting them.
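To see the effect of the ^ anchor without editing a real file, the same regexp can be applied to a string in a temporary buffer. This is a sketch only: the function name my-strip-leading-stars is hypothetical, and it uses a re-search-forward and replace-match loop in place of replace-regexp so that it can be called quietly from Lisp.

```elisp
;; Hypothetical helper; applies the regexp from the text to a string.
(defun my-strip-leading-stars (text)
  "Return TEXT with any run of asterisks at the start of a line removed."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; ^ restricts the match to the beginning of a line, so an
    ;; asterisk in the middle of a line is left alone.
    (while (re-search-forward "^\\*+" nil t)
      (replace-match "" t t))
    (buffer-string)))

(my-strip-leading-stars "* Heading\n** Subheading\nBody with a * in it\n")
;; => " Heading\n Subheading\nBody with a * in it\n"
```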
Note that ^ and $ can't be used in the middle of regular expressions that are intended to match strings that span more than one line. Instead, you can put \n (for Newline) in your regular expressions to match such strings. Another such character you may want to use is \t for Tab. When ^ and $ are used with regular expression searches on strings instead of buffers, they match at the beginning and end of the string, respectively (as well as just after and just before any newline inside the string); the function string-match, described later in this chapter, can be used to do regular expression search on strings.
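The forms below sketch how the anchors and \n behave when string-match searches a string; the sample strings are invented. One detail worth noting is that, according to the Emacs Lisp manual, ^ in a string search also matches just after a newline, whereas the standard \\` operator matches only at the very start of the string.

```elisp
(string-match "^get" "get file")       ; => 0
(string-match "^get" "forget")         ; => nil, "get" is not at the start
(string-match "file$" "readfile")      ; => 4
(string-match "one\ntwo" "one\ntwo")   ; => 0, \n matches the newline itself
(string-match "^two" "one\ntwo")       ; => 4, ^ matches after the newline
(string-match "\\`two" "one\ntwo")     ; => nil, \\` is start of string only
```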
Here is a real-life example of a complex regular expression that covers the operators we have seen so far: sentence-end, a variable Emacs uses to recognize the ends of sentences for sentence motion commands like forward-sentence (M-e). Its value is:
"[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
Let's look at this piece by piece. The first character set, [.?!], matches a period, question mark, or exclamation mark (the first two of these are regular expression operators, but they have no special meaning within character sets). The next part, []\"')}]*, consists of a character set containing right bracket, double quote, single quote, right parenthesis, and right curly brace. A * follows the set, meaning that zero or more occurrences of any of the characters in the set match. So far, then, this regexp matches a sentence-ending punctuation mark followed by zero or more ending quotes, parentheses, or curly braces. Next, there is the group \\($\\|\t\\|  \\), which matches any of the three alternatives $ (end of line), Tab, or two spaces. Finally, [ \t\n]* matches zero or more spaces, tabs, or newlines. Thus the sentence-ending characters can be followed by end-of-line or a combination of spaces (at least two), tabs, and newlines.
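The piece-by-piece reading can be checked against a few sample strings. The sketch below stores the regexp described above in a hypothetical variable, my-sentence-end-regexp (so that the real sentence-end is left untouched), with two spaces in the third alternative; the test sentences are invented.

```elisp
;; Hypothetical variable holding the regexp discussed in the text.
(defvar my-sentence-end-regexp
  "[.?!][]\"')}]*\\($\\|\t\\|  \\)[ \t\n]*"
  "Copy of the sentence-end regexp, for experimentation only.")

;; Punctuation, a closing quote, then two spaces: a sentence end.
(string-match my-sentence-end-regexp "He said \"Stop!\"  Then he left")
;; => 13, the position of the exclamation mark

;; Only one space after the period: not treated as a sentence end.
(string-match my-sentence-end-regexp "One. Two")   ; => nil

;; Punctuation at the end of the line (here, the end of the string).
(string-match my-sentence-end-regexp "Done.")      ; => 4
```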