There are other context operators besides ^ and $; two of them can be used to make regular expression search act like word search. The operators \\< and \\> match the beginning and end of a word, respectively. With these we can go part of the way toward solving Example 3. The regular expression \\<program\\> matches "program" but not "programmer" or "programming" (it also won't match "microprogram"). So far so good; however, it won't match "program's" or "programs." For this, we need a more complex regular expression:
\\<program\\('s\\|s\\)?\\>
This expression means, "a word beginning with program followed optionally by apostrophe s or just s." This does the trick as far as matching the right words goes.
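The behavior of the word operators can be checked from Lisp. The following is a minimal sketch, assuming the two expressions discussed above are \\<program\\> and \\<program\\('s\\|s\\)?\\>; the helper name my/whole-match is hypothetical and exists only for this illustration.

```elisp
;; Hypothetical helper, not part of Emacs: return the matched text or nil.
(defun my/whole-match (regexp string)
  "Return the text in STRING matched by REGEXP, or nil if there is no match."
  (when (string-match regexp string)
    (match-string 0 string)))

(let ((plain "\\<program\\>")
      (with-suffix "\\<program\\('s\\|s\\)?\\>"))
  (list (my/whole-match plain "run the program now")      ; whole word matches
        (my/whole-match plain "programmer")               ; no word end after "program"
        (my/whole-match plain "microprogram")             ; no word start before "program"
        (my/whole-match with-suffix "two programs")       ; optional s is included
        (my/whole-match with-suffix "the program's name") ; optional 's is included
        (my/whole-match with-suffix "programming")))      ; still rejected
```

Evaluating the let form in a default Emacs session should return a list whose second, third, and last elements are nil.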
11.3.2.4 Retrieving portions of matches
There is still one piece missing: the ability to replace "program" with "module" while leaving any s or 's untouched. This leads to the final regular expression feature we will cover here: the ability to retrieve portions of the matched string for later use. The preceding regular expression is indeed the correct one to give as the search string for replace-regexp. As for the replace string, the answer is module\\1; in other words, the required Lisp code is:
(replace-regexp "\\<program\\('s\\|s\\)?\\>" "module\\1")
The \\1 means, in effect, "substitute the portion of the matched string that matched the subexpression inside the \\( and \\)." It is the only regular-expression-related operator that can be used in replacements. In this case, it means to use 's in the replace string if the match was "program's," s if the match was "programs," or nothing if the match was just "program." The result is the correct substitution of "module" for "program," "modules" for "programs," and "module's" for "program's."
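The same substitution can be tried on a scratch string without touching a real buffer. This is a sketch only: the function name my/program-to-module is hypothetical, the search string is assumed to be \\<program\\('s\\|s\\)?\\>, and the loop uses re-search-forward with replace-match, which is the usual way to perform a regexp replacement from Lisp code.

```elisp
;; Hypothetical helper for illustration; not part of Emacs.
(defun my/program-to-module (text)
  "Return TEXT with program, programs, and program's renamed to module forms."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (while (re-search-forward "\\<program\\('s\\|s\\)?\\>" nil t)
      ;; \\1 re-inserts whatever the parenthesized group matched,
      ;; or nothing if the group did not take part in the match.
      ;; The second argument t keeps the replacement's case as written.
      (replace-match "module\\1" t))
    (buffer-string)))

(my/program-to-module
 "One program calls two programs; each program's name is unique.")
```

The call should yield "One module calls two modules; each module's name is unique.", while a word such as "programmer" is left alone.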
Another example of this feature solves Example 4. To match filenames of the form filename.c and replace them with filename.java, use the Lisp code:
(replace-regexp "\\([a-zA-Z0-9_]+\\)\\.c" "\\1.java")
Remember that \\. means a literal dot (.). Note also that the filename pattern (which matches a series of one or more alphanumerics or underscores) was surrounded by \\( and \\) in the search string for the sole purpose of retrieving it later with \\1.
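To see the effect of this search and replace pair on sample text, the sketch below applies it to a string with replace-regexp-in-string. The wrapper name my/c-to-java is hypothetical; the regular expression and replacement are the ones given above.

```elisp
;; Hypothetical wrapper for illustration; not part of Emacs.
(defun my/c-to-java (names)
  "Return NAMES with each `.c' filename rewritten to end in `.java'."
  ;; The final t (FIXEDCASE) keeps the replacement text's case as written.
  (replace-regexp-in-string "\\([a-zA-Z0-9_]+\\)\\.c" "\\1.java" names t))

(my/c-to-java "main.c util_2.c")   ; => "main.java util_2.java"
(my/c-to-java "parser.cpp")        ; => "parser.javapp"
```

The second call shows a limitation of the pattern as written: nothing anchors the end of the match, so it also rewrites the first part of a .cpp name. Appending \\> to the search string is one way to avoid that.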
Actually, the \\1 operator is only a special case of a more powerful facility (as you may have guessed). In general, if you surround a portion of a regular expression with \\( and \\), the string matching the parenthesized subexpression is saved. When you specify the replace string, you can retrieve the saved substrings with \\n, where n is the number of the parenthesized subexpression from left to right, starting with 1. Parenthesized expressions can be nested; their corresponding \\n numbers are assigned in order of their \\( delimiter from left to right.
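The numbering rule for nested groups is easiest to see by inspecting the saved substrings directly. The following sketch uses a made-up regular expression and the hypothetical function name my/group-demo; in Lisp code, match-string retrieves the same substrings that \\n refers to in a replace string.

```elisp
;; Hypothetical demo function; the regexp is invented for this illustration.
(defun my/group-demo (line)
  "Return the four saved substrings from matching a nested regexp against LINE."
  ;; Group 1 is the outer \\( ... \\) around the whole filename,
  ;; groups 2 and 3 are nested inside it, and group 4 follows it.
  (when (string-match
         "\\(\\([a-z]+\\)\\.\\([a-z]+\\)\\):\\([0-9]+\\)" line)
    (list (match-string 1 line)
          (match-string 2 line)
          (match-string 3 line)
          (match-string 4 line))))

(my/group-demo "foo.c:8: error message")   ; => ("foo.c" "foo" "c" "8")
```

The outer group is number 1 because its \\( comes first, even though it closes after groups 2 and 3.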
Lisp code that takes full advantage of this feature tends to contain complicated regular expressions. The best example of this in Emacs's own Lisp code is compilation-error-regexp-alist, the list of regular expressions the compile package (discussed in Chapter 9) uses to parse error messages from compilers. Here is an excerpt, adapted from the Emacs source code (it's become much too long to reproduce in its entirety; see below for some hints on how to find the actual file to study in its full glory):
(defvar compilation-error-regexp-alist
'(
;; NOTE! See also grep-regexp-alist, below.
;; 4.3BSD grep, cc, lint pass 1:
;; /usr/src/foo/foo.c(8): warning: w may be used before set
;; or GNU utilities:
;; foo.c:8: error message
;; or HP-UX 7.0 fc:
;; foo.f :16 some horrible error message
;; or GNU utilities with column (GNAT 1.82):
;; foo.adb:2:1: Unit name does not match file name
;; or with column and program name:
;; jade:dbcommon.dsl:133:17:E: missing argument for function call
;;
;; We'll insist that the number be followed by a colon or closing
;; paren, because otherwise this matches just about anything
;; containing a number with spaces around it.
;; We insist on a non-digit in the file name
;; so that we don't mistake the file name for a command name
;; and take the line number as the file name.
("\\([a-zA-Z][-a-zA-Z._0-9]+: ?\\)?\
\\([a-zA-Z]?:?[^:( \t\n]*[^:( \t\n0-9][^:( \t\n]*\\)[:(][ \t]*\\([0-9]+\\)\
\\([) \t]\\|:\\(\\([0-9]+:\\)\\|[0-9]*[^:0-9]\\)\\)" 2 3 6)
;; Microsoft C/C++:
;; keyboard.c(537) : warning C4005: 'min' : macro redefinition
;; d:\tmp\test.c(23) : error C2143: syntax error : missing ';' before 'if'
;; This used to be less selective and allow characters other than
;; parens around the line number, but that caused confusion for
;; GNU-style error messages.
;; This used to reject spaces and dashes in file names,
;; but they are valid now; so I made it more strict about the error
;; message that follows.
("\\(\\([a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\)) \
: \\(error\\|warning\\) C[0-9]+:" 1 3)
;; Caml compiler:
;; File "foobar.ml", lines 5-8, characters 20-155: blah blah
("^File \"\\([^,\" \n\t]+\\)\", lines? \\([0-9]+\\)[-0-9]*, characters? \
\\([0-9]+\\)" 1 2 3)
;; Cray C compiler error messages
("\\(cc\\| cft\\)-[0-9]+ c\\(c\\|f77\\): ERROR \\([^,\n]+, \\)* File = \
\\([^,\n]+\\), Line = \\([0-9]+\\)" 4 5)
;; Perl -w:
;; syntax error at automake line 922, near "':'"
;; Perl debugging traces
;; store::odrecall('File_A', 'x2') called at store.pm line 90
(".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]" 1 2)
;; See http://ant.apache.org/faq.html
;; Ant Java: works for jikes
("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):\\([0-9]+\\):[0-9]+:[0-9]\
+:" 1 2 3)
;; Ant Java: works for javac
("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):" 1 2)
))
This is a list of elements that have at least three parts each: a regular expression and two numbers. The regular expression matches error messages in the format used by a particular compiler or tool. The first number tells Emacs which of the matched subexpressions contains the filename in the error message; the second number designates which of the subexpressions contains the line number. (There can also be additional parts at the end: a third number giving the position of the column number of the error, if any, and any number of format strings used to generate the true filename from the piece found in the error message, if needed. For more details about these, look at the actual file, as described below.)
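How an element of this list might be applied to one line of compiler output can be sketched as follows. The function my/parse-error-line is a hypothetical, simplified stand-in (the real compile package does considerably more, including the column number and format strings), and the sample error line is invented; the element itself is the javac entry from the excerpt above.

```elisp
;; Hypothetical helper for illustration; not how compile.el is implemented.
(defun my/parse-error-line (entry line)
  "Apply ENTRY, a list (REGEXP FILE-INDEX LINE-INDEX ...), to LINE.
Return (FILE . LINE-NUMBER) on a match, or nil otherwise."
  (let ((regexp (nth 0 entry))
        (file-index (nth 1 entry))
        (line-index (nth 2 entry)))
    (when (string-match regexp line)
      ;; The two numbers select which saved subexpressions to read.
      (cons (match-string file-index line)
            (string-to-number (match-string line-index line))))))

(my/parse-error-line
 '("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):" 1 2)
 "    [javac] /home/user/Foo.java:42: cannot find symbol")
;; => ("/home/user/Foo.java" . 42)
```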
For example, the element in the list dealing with Perl contains the regular expression:
".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]"
followed by 1 and 2, meaning that the first parenthesized subexpression contains the filename and the second contains the line number. So if you have Perl's warnings turned on—you always do, of course—you might get an error message such as this: