11.3.2 Regular Expressions
Regular expressions (regexps) provide much more powerful ways of dealing with text. Although most beginning Emacs users tend to avoid commands that use regexps, like replace-regexp and re-search-forward, regular expressions are widely used within Lisp code. Such modes as Dired and the programming language modes would be unthinkable without them. Regular expressions take time and patience to become comfortable with, but the effort is well worth it for Lisp programmers: they are one of the most powerful features of Emacs, and many things are not practical to implement in any other way.
One trick that helps when you are experimenting with regular expressions and trying to get the hang of them is to type some text that corresponds to what you're trying to match into a scratch buffer, and then use isearch-forward-regexp (C-M-s) to build up the regular expression piece by piece. The immediate feedback of an incremental search shows you each piece of the regular expression in action, in a way few other tools offer.
We introduce the various features of regular expressions by way of a few examples of search-and-replace situations; such examples are easy to explain without introducing lots of extraneous details. Afterward, we describe Lisp functions that go beyond simple search-and-replace capabilities with regular expressions. The following are examples of searching and replacing tasks that the normal search/replace commands can't handle or handle poorly:
1. You are developing code in C, and you want to combine the functionality of the functions read and readfile into a new function called get. You want to replace all references to these functions with references to the new one.
2. You are writing a troff document using outline mode, as described in Chapter 7. In outline mode, section headers are lines that start with one or more asterisks. You want to write a function called remove-outline-marks to get rid of these asterisks so that you can run troff on your file.
3. You want to change all occurrences of program in a document, including programs and program's, to module, modules, and module's, without changing programming to moduleming or programmer to modulemer.
4. You are working on documentation for some C software that is being rewritten in Java. You want to change all the filenames in the documentation from .c to .java, since .java is the extension the javac compiler uses.
5. You just installed a new C++ compiler that prints error messages in German. You want to modify the Emacs compile package so that it can parse the error messages correctly (see the end of Chapter 9).
We will soon show how to use regular expressions to deal with these examples, which we refer to by number. Note that this discussion of regular expressions, although more comprehensive than that in Chapter 3, does not cover every feature; those that it doesn't cover are redundant with other features or relate to concepts that are beyond the scope of this book. It is also important to note that the regular expression syntax described here is for use with Lisp strings only; there is an important difference between the regexp syntax for Lisp strings and the regexp syntax for user commands (like replace-regexp), as we will see.
Regular expressions began as an idea in theoretical computer science, but they have found their way into many nooks and crannies of everyday, practical computing. The syntax used to represent them varies, but the concepts are much the same everywhere. You probably already know a subset of regular expression notation: the wildcard characters used by the Unix shell or the Windows command prompt to match filenames. The Emacs notation is a bit different; it is similar to the notation used by Perl, by editors such as ed and vi, and by Unix software tools such as lex and grep. So let's start with the Emacs regular expression operators that resemble Unix shell wildcard characters, which are listed in Table 11-5.
Table 11-5. Basic regular expression operators
Emacs operator | Shell wildcard equivalent | Function
. | ? | Matches any single character (except newline).
.* | * | Matches any string, including the empty string (not spanning a newline).
[abc] | [abc] | Matches a, b, or c.
[a-z] | [a-z] | Matches any lowercase letter.
For example, to match all filenames beginning with program in the Unix shell, you would specify program*. In Emacs, you would say program.* instead. To match all filenames beginning with a letter from a through e in the shell, you would use [a-e]* or [abcde]*; in Emacs, it's [a-e].* or [abcde].*. In other words, the dash within the brackets specifies a range of characters.[78] We will say more about ranges and bracketed character sets shortly.
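The shell-versus-Emacs comparison above can be checked from Lisp by evaluating a few forms in the *scratch* buffer. This is a sketch, assuming a reasonably recent GNU Emacs. It uses string-match-p, which returns the position of the first match or nil. Unlike a shell wildcard, a regexp match may start anywhere in the string, so the sketch anchors each pattern with \\` (start of string) and \\' (end of string) to mean "the whole filename". The helper my-filename-matches-p is a hypothetical name used only for illustration; wildcard-to-regexp is a standard Emacs function that converts a shell wildcard into an equivalent, already-anchored regexp.

```elisp
;; Hypothetical helper: does REGEXP match the whole of FILENAME?
;; \\` and \\' anchor the match at the start and end of the string,
;; which is what a shell wildcard does implicitly.
(defun my-filename-matches-p (regexp filename)
  (and (string-match-p (concat "\\`" regexp "\\'") filename) t))

;; Shell program*  ->  Emacs program.*
(my-filename-matches-p "program.*" "program.c")    ; t
(my-filename-matches-p "program.*" "myprogram.c")  ; nil

;; Shell [a-e]*  ->  Emacs [a-e].*  (or [abcde].*)
(my-filename-matches-p "[a-e].*" "dired.el")       ; t
(my-filename-matches-p "[abcde].*" "files.el")     ; nil

;; Emacs can also translate a shell wildcard for you;
;; the resulting regexp is already anchored.
(string-match-p (wildcard-to-regexp "program*") "programs")  ; 0
```

Anchoring is the design choice that matters here: without \\`, the pattern program.* would also match "myprogram.c", because string-match-p is free to start the match partway through the string.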
To match a character literally when it would otherwise act as a regular expression operator, precede it with a double backslash, as in \\* to match an asterisk. Why a double backslash? The reason has to do with the way Emacs Lisp reads and decodes strings. When Emacs reads a string in a Lisp program, it decodes the backslash-escaped characters and thus turns double backslashes into single backslashes. If the string is being used as a regular expression (that is, if it is being passed to a function that expects a regular expression argument), that function treats the single backslash as part of the regular expression syntax. For example, given the following line of Lisp:
(replace-regexp "fred\\*" "bob*")
the Lisp reader decodes the string "fred\\*" into fred\* and passes it to the replace-regexp command. The replace-regexp command understands fred\* to mean fred followed by a (literal) asterisk. Notice, however, that the second argument to replace-regexp is not a regular expression, so there is no need to backslash-escape the asterisk in bob* at all. Also notice that if you were to invoke this as a user command, you would not need to double the backslash; that is, you would type M-x replace-regexp Enter followed by fred\* and bob*. Emacs does not apply Lisp string decoding to text you type into the minibuffer, so a single backslash reaches the regexp unchanged.
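The string decoding described above can be observed directly, and the same replacement can be written the way Lisp programs usually write it. The docstring of replace-regexp says it is meant for interactive use and that Lisp code should instead loop over re-search-forward and replace-match; the sketch below follows that pattern. The function name my-fred-to-bob is hypothetical, chosen only for this example, and the sample text is made up.

```elisp
;; The Lisp reader turns the two characters \\ into one backslash,
;; so the string "fred\\*" holds 6 characters: f r e d \ *
(length "fred\\*")                          ; 6
(string-match-p "fred\\*" "fred*")          ; 0   (literal asterisk found)
(string-match-p "fred\\*" "fredd")          ; nil (no asterisk)

;; regexp-quote adds the needed escapes for you.
(regexp-quote "*")                          ; "\\*"

;; Hypothetical helper: replace every "fred*" with "bob*" in the
;; current buffer, using the search-and-replace loop recommended
;; for Lisp code instead of calling replace-regexp.
(defun my-fred-to-bob ()
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "fred\\*" nil t)
      ;; FIXEDCASE and LITERAL are both t: insert "bob*" exactly
      ;; as written, with no case conversion or \& / \N handling.
      (replace-match "bob*" t t))))

;; Try it on some throwaway text.
(with-temp-buffer
  (insert "fred* and freddy and fred*")
  (my-fred-to-bob)
  (buffer-string))                          ; "bob* and freddy and bob*"
```

Note that "freddy" is left alone: the regexp requires a literal asterisk after fred, and a d is not one.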
The * regular expression operator in Emacs (by itself) actually means something different from the * in the Unix shell: it means "zero or more occurrences of whatever is before the *." Thus, because . matches any character except newline, .* means "zero or more occurrences of any character," that is, any string within a line, including the empty string. The * applies only to the single item immediately before it, such as one character or one bracketed set: for example, read* matches "rea" followed by zero or more d's, and file[0-9]* matches "file" followed by zero or more digits.
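The behavior of * described above can be checked with string-match-p in the *scratch* buffer. In this sketch each pattern is anchored with \\` and \\' so that it must match the whole string; the anchors are an assumption of the example, added to make the results unambiguous, and are not part of the * operator itself.

```elisp
;; * repeats only the single item just before it.
(string-match-p "\\`read*\\'" "rea")           ; 0   (zero d's)
(string-match-p "\\`read*\\'" "readdd")        ; 0   (three d's)
(string-match-p "\\`read*\\'" "readread")      ; nil (* does not repeat "read")

;; Here the repeated item is the bracketed set [0-9].
(string-match-p "\\`file[0-9]*\\'" "file")     ; 0   (zero digits)
(string-match-p "\\`file[0-9]*\\'" "file42")   ; 0
(string-match-p "\\`file[0-9]*\\'" "file4a")   ; nil (a is not a digit)

;; .* can match the empty string, so it succeeds at
;; position 0 of any string, even an empty one.
(string-match-p ".*" "")                        ; 0
```

The last form is a common surprise: because * allows zero occurrences, a pattern built only from starred items can never fail to match.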