<html>
<head>
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Chapter 2</title>
<link rel="stylesheet" type="text/css" href="docsafari.css">
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body><table width="100%" border="1" bgcolor="#EBEBFF"><tr><td width="5%" align="left" valign="middle"><a href="_chapter 1.htm"><img src="Larrow.gif" width="17" height="19" border="0"></a></td><td align="center" valign="middle"><a class="docLink" href="Front matter.htm">CONTENTS</a></td><td width="5%" align="right" valign="middle"><a href="_chapter 3.htm"><img src="Rarrow.gif" width="17" height="19" border="0"></a></td></tr></table>
<h2 class="docChapterTitle">Chapter 2. The UNIX Toolbox</h2><ul><li> <a class="docLink" href="#ch02lev1sec1">2.1 Regular Expressions</a></li>
<li> <a class="docLink" href="#ch02lev1sec2">2.2 Combining Regular Expression Metacharacters</a></li>
</ul>
<p class="docText">
<img alt="graphics/ch02.gif" src="ch02.gif" border="0" width="497" height="350"></p>
<p class="docText">There are hundreds of UNIX utilities available, and many of
them are everyday commands such as <span class="docEmphasis">ls, pwd,</span>
<span class="docEmphasis">who</span>, and <span class="docEmphasis">vi</span>.
Just as there are essential tools that a carpenter uses, there are also
essential tools the shell programmer needs to write meaningful and efficient
scripts. The three major utilities that will be discussed in detail here are
<span class="docEmphasis">grep, sed</span>, and <span class="docEmphasis">awk</span>.
These programs are the most important UNIX tools available for manipulating
text, output from a pipe, or standard input. In fact, <span class="docEmphasis">
sed</span> and <span class="docEmphasis">awk</span> are often used as scripting
languages by themselves. Before you can fully appreciate the power of
<span class="docEmphasis">grep,</span> <span class="docEmphasis">sed</span>, and
<span class="docEmphasis">awk</span>, you need a good foundation in the use
of regular expressions and regular expression metacharacters. A complete list of
useful UNIX utilities is found in
<a class="docLink" href="Appendix A.htm">Appendix A</a> of
this book.</p>
<h3 class="docSection1Title" id="ch02lev1sec1">2.1 Regular Expressions</h3>
<h4 class="docSection2Title" id="ch02lev2sec1">2.1.1 Definition and Example</h4>
<p class="docText">For users already familiar with the concept of regular
expression metacharacters, this section may be bypassed. However, this
preliminary material is crucial to understanding the variety of ways in which
<span class="docEmphasis">grep, sed</span>, and <span class="docEmphasis">awk</span>
are used to display and manipulate data.</p>
<p class="docText">What is a regular expression? A regular expression<span id="ENB2-1"><a class="docLink" href="#EN2-1"><sup>[1]</sup></a></span>
is just a pattern of characters used to match the same characters in a search.
In most programs, a regular expression is enclosed in forward slashes; for
example, <span class="docEmphasis">/love/</span> is a regular expression
delimited by forward slashes, and the pattern <span class="docEmphasis">love</span>
will be matched any time the same pattern is found in the line being searched.
What makes regular expressions interesting is that they can be controlled by
special metacharacters. If you are new to the idea of regular expressions, let
us look at an example that will help you understand what this whole concept is
about. Suppose that you are working in the <span class="docEmphasis">vi</span>
editor on an e-mail message to your friend. It looks like this:</p>
<pre>% <span class="docEmphStrong">vi letter</span>
------------------------------------------------------------------
Hi tom,
I think I failed my anatomy test yesterday. I had a terrible
stomach ache. I ate too many fried green tomatoes.
Anyway, Tom, I need your help. I'd like to make the test up
tomorrow, but don't know where to begin studying. Do you
think you could help me? After work, about 7 PM, come to
my place and I'll treat you to pizza in return for your help. Thanks.
Your pal,
guy@phantom
~
~
~
~
------------------------------------------------------------------
</pre>
<p class="docText">Now, suppose you find out that Tom never took the test
either, but David did. You also notice that in the greeting, you spelled
<span class="docEmphasis">Tom</span> with a lowercase <span class="docEmphasis">
t.</span> So you decide to make a global substitution to replace all occurrences
of <span class="docEmphasis">tom</span> with <span class="docEmphasis">David</span>,
as follows:</p>
<pre>% <span class="docEmphStrong">vi letter</span>
------------------------------------------------------------------
Hi <span class="docEmphasis">David,</span>
I think I failed my ana<span class="docEmphasis">David</span>y test yesterday. I had a terrible
s<span class="docEmphasis">David</span>ach ache. I ate too many fried green <span class="docEmphasis">David</span>atoes.
Anyway, Tom, I need your help. I'd like to make the test up
<span class="docEmphasis">David</span>orrow, but don't know where to begin studying. Do you
think you could help me? After work, about 7 PM, come to
my place and I'll treat you to pizza in return for your help. Thanks.
Your pal,
guy@phan<span class="docEmphasis">David</span>
~
~
~
--> <span class="docEmphStrong">:1,$s/tom/David/g</span>
------------------------------------------------------------------
</pre>
<p class="docText">The regular expression in the search string is
<span class="docEmphasis">tom.</span> The replacement string is
<span class="docEmphasis">David.</span> The <span class="docEmphasis">vi</span>
command reads, "For lines 1 through the end of the file ($), substitute
<span class="docEmphasis">tom</span> everywhere it is found on each line and
replace it with <span class="docEmphasis">David</span>." Hardly what you want!
The pattern also matched <span class="docEmphasis">tom</span> inside longer words
such as <span class="docEmphasis">anatomy</span> and <span class="docEmphasis">tomorrow</span>,
and one of the occurrences of <span class="docEmphasis">Tom</span> was untouched
because you only asked for <span class="docEmphasis">tom</span>, not
<span class="docEmphasis">Tom</span>, to be replaced with
<span class="docEmphasis">David</span>. So what to do?</p>
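<p class="docText">Before turning to the fix, the mishap can be reproduced at the shell prompt. The following is only a minimal sketch, assuming a POSIX shell and a <span class="docEmphasis">sed</span> that accepts the same <span class="docEmphasis">s/old/new/g</span> syntax as the <span class="docEmphasis">vi</span> command; the three-line sample and the variable names <span class="docEmphasis">letter</span> and <span class="docEmphasis">naive</span> are made up for illustration:</p>
<pre>
```shell
# Three lines borrowed from the sample letter (an illustrative excerpt only).
letter='Hi tom,
I think I failed my anatomy test yesterday.
Anyway, Tom, I need your help.'

# The unanchored pattern tom matches anywhere on a line, even inside a word.
naive=$(printf '%s\n' "$letter" | sed 's/tom/David/g')
printf '%s\n' "$naive"
```
</pre>
<p class="docText">The second line comes back as <span class="docEmphasis">I think I failed my anaDavidy test yesterday.</span>, and the capitalized <span class="docEmphasis">Tom</span> on the third line is left alone.</p>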
<p class="docText">Regular expression metacharacters are special characters that
let you constrain a pattern so that you control exactly what is matched and,
therefore, what is substituted. Some metacharacters anchor a pattern to the
beginning or end of a line. Others match any single character, a repeated
character, or one character from a set (for example, either an upper- or
lowercase letter, or digits only), and so forth. For example, to change the name
<span class="docEmphasis">tom</span> or <span class="docEmphasis">Tom</span> to
<span class="docEmphasis">David</span>, the following <span class="docEmphasis">
vi</span> command would have done the job:</p>
<pre><span class="docEmphStrong">:1,$s/\&lt;[Tt]om\&gt;/David/g</span>
</pre>
<p class="docText">This command reads, "From the first line to the last line of
the file (<span class="docEmphasis">1,$</span>), substitute (<span class="docEmphasis">s</span>)
the word <span class="docEmphasis">Tom</span> or <span class="docEmphasis">tom</span>
with <span class="docEmphasis">David</span>," and the <span class="docEmphasis">
g</span> flag says to do this globally (i.e., replace every occurrence on a
line, not just the first). The regular expression metacharacters are \&lt;
and \&gt; for the beginning and end of a word, and the pair of brackets, [<span class="docEmphasis">Tt</span>],
which matches one of the characters enclosed within them (in this case, either
<span class="docEmphasis">T</span> or <span class="docEmphasis">t</span>). There
is a small set of basic metacharacters that all UNIX pattern-matching utilities
recognize, and then an extended set of metacharacters that varies from program to
program.</p>
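<p class="docText">The word-anchored substitution can also be tried outside of <span class="docEmphasis">vi</span>. This is a sketch under stated assumptions: it relies on GNU <span class="docEmphasis">sed</span>, because the \&lt; and \&gt; word anchors are not supported by every <span class="docEmphasis">sed</span>, and the sample text and the variable names <span class="docEmphasis">letter</span> and <span class="docEmphasis">fixed</span> are invented for illustration:</p>
<pre>
```shell
# Three lines borrowed from the sample letter (an illustrative excerpt only).
letter='Hi tom,
I think I failed my anatomy test yesterday.
Anyway, Tom, I need your help.'

# \< and \> anchor the match to the start and end of a word (GNU sed assumed);
# [Tt] accepts either an uppercase or a lowercase first letter.
fixed=$(printf '%s\n' "$letter" | sed 's/\<[Tt]om\>/David/g')
printf '%s\n' "$fixed"
```
</pre>
<p class="docText">Both greetings now read <span class="docEmphasis">David</span>, while <span class="docEmphasis">anatomy</span> is left intact because its <span class="docEmphasis">tom</span> is not a word by itself.</p>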
<h4 class="docSection2Title" id="ch02lev2sec2">2.1.2 Regular Expression Metacharacters</h4>
<p class="docText"><a class="docLink" href="#ch02table01">Table 2.1</a> presents
regular expression metacharacters that can be used in all versions of
<span class="docEmphasis">vi, ex, grep, egrep, sed</span>, and
<span class="docEmphasis">awk.</span> Additional metacharacters are described
for each of the utilities where applicable.</p>
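<p class="docText">Several of the basic metacharacters listed in Table 2.1 can be tried at the prompt before studying the table. The following is a rough sketch, assuming a POSIX shell and <span class="docEmphasis">grep</span>, whose <span class="docEmphasis">-c</span> option prints the number of matching lines; the sample lines, the variable names, and the <span class="docEmphasis">count</span> helper are hypothetical and exist only for this illustration:</p>
<pre>
```shell
# Hypothetical sample lines, one per line of input.
sample='love is patient
I love you
glove. compartment
Love
true love
live and learn'

# count PATTERN: print how many sample lines match the pattern.
count() { printf '%s\n' "$sample" | grep -c "$1"; }

begins=$(count '^love')     # lines beginning with love
ends=$(count 'love$')       # lines ending with love
dots=$(count 'l..e')        # l, any two characters, then e
either=$(count '[Ll]ove')   # love or Love
literal=$(count 'love\.')   # love followed by a literal period
printf '%s %s %s %s %s\n' "$begins" "$ends" "$dots" "$either" "$literal"
```
</pre>
<p class="docText">With this sample the five counts printed are <span class="docEmphasis">1 1 5 5 1</span>. Note that <span class="docEmphasis">l..e</span> matches <span class="docEmphasis">live</span> as well as <span class="docEmphasis">love</span>, but not the capitalized <span class="docEmphasis">Love</span>.</p>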
<table cellSpacing="0" cellPadding="1" width="100%" border="1">
<caption>
<h5 id="ch02table01" class="docTableTitle">Table 2.1. Regular Expression Metacharacters</h5>
</caption>
<colgroup span="4" align="left">
</colgroup>
<tr>
<th class="docTableHeader" vAlign="top"><span class="docEmphBoldItalic">
Metacharacter</span></th>
<th class="docTableHeader" vAlign="top"><span class="docEmphBoldItalic">
Function</span></th>
<th class="docTableHeader" vAlign="top"><span class="docEmphBoldItalic">
Example</span></th>
<th class="docTableHeader" vAlign="top"><span class="docEmphBoldItalic">What
It Matches</span></th>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">^</span></td>
<td class="docTableCell" vAlign="top">Beginning-of-line anchor</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/^love/</span></td>
<td class="docTableCell" vAlign="top">Matches all lines beginning with
<span class="docEmphasis">love.</span></td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">$</span></td>
<td class="docTableCell" vAlign="top">End-of-line anchor</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/love$/</span></td>
<td class="docTableCell" vAlign="top">Matches all lines ending with
<span class="docEmphasis">love.</span></td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">.</span></td>
<td class="docTableCell" vAlign="top">Matches any single character</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/l..e/</span></td>
<td class="docTableCell" vAlign="top">Matches lines containing an
<span class="docEmphasis">l</span>, followed by two characters, followed by
an <span class="docEmphasis">e</span>.</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">*</span></td>
<td class="docTableCell" vAlign="top">Matches zero or more occurrences of the
preceding character</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/ *love/</span></td>
<td class="docTableCell" vAlign="top">Matches lines with zero or more spaces,
followed by the pattern <span class="docEmphasis">love</span>.</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">[ ]</span></td>
<td class="docTableCell" vAlign="top">Matches one in the set</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/[Ll]ove/</span></td>
<td class="docTableCell" vAlign="top">Matches lines containing
<span class="docEmphasis">love</span> or <span class="docEmphasis">Love</span>.</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">[x-y]</span></td>
<td class="docTableCell" vAlign="top">Matches one character within a range
in the set</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/[A-Z]ove/</span></td>
<td class="docTableCell" vAlign="top">Matches letters from
<span class="docEmphasis">A</span> through <span class="docEmphasis">Z</span>
followed by <span class="docEmphasis">ove</span>.</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">[^ ]</span></td>
<td class="docTableCell" vAlign="top">Matches one character not in the set</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/[^A-Z]/</span></td>
<td class="docTableCell" vAlign="top">Matches any character not in the range
between <span class="docEmphasis">A</span> and <span class="docEmphasis">Z</span>.</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">\</span></td>
<td class="docTableCell" vAlign="top">Used to escape a metacharacter</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/love\./</span></td>
<td class="docTableCell" vAlign="top">Matches lines containing
<span class="docEmphasis">love</span>, followed by a literal period.
Normally the period matches one of any character.</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top" colSpan="4">
<span class="docEmphBoldItalic">Additional metacharacters are supported by
many UNIX programs that use RE metacharacters:</span></td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">\&lt;</span></td>
<td class="docTableCell" vAlign="top">Beginning-of-word anchor</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/\&lt;love/</span></td>
<td class="docTableCell" vAlign="top">Matches lines containing a word that
begins with <span class="docEmphasis">love</span> (supported by
<span class="docEmphasis">vi</span> and <span class="docEmphasis">grep</span>).</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">\&gt;</span></td>
<td class="docTableCell" vAlign="top">End-of-word anchor</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">/love\&gt;/</span></td>
<td class="docTableCell" vAlign="top">Matches lines containing a word that
ends with <span class="docEmphasis">love</span> (supported by
<span class="docEmphasis">vi</span> and <span class="docEmphasis">grep</span>).</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">\(..\)</span></td>
<td class="docTableCell" vAlign="top">Tags matched characters to be used later</td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">
/\(love\)able \1er/</span></td>
<td class="docTableCell" vAlign="top">May use up to nine tags, starting with
the first tag at the left-most part of the pattern. For example, the pattern
<span class="docEmphasis">love</span> is saved as tag 1, to be referenced
later as <span class="docEmphasis">\1</span>; in this example, the search
pattern consists of <span class="docEmphasis">loveable</span> followed by
<span class="docEmphasis">lover</span> (supported by
<span class="docEmphasis">sed, vi,</span> and <span class="docEmphasis">grep</span>).</td>
</tr>
<tr>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">x\{m\}</span> or
<span class="docEmphasis">x\{m,\}</span> or <span class="docEmphasis">x\{m,n\}</span></td>
<td class="docTableCell" vAlign="top">Repetition of character x: exactly m times, at
least m times, or at least m and not more than n times<sup class="docFootnote"><a class="docLink" href="#ch02tabfn01">[a]</a></sup></td>
<td class="docTableCell" vAlign="top"><span class="docEmphasis">o\{5,10\}</span></td>
<td class="docTableCell" vAlign="top">Matches if the line contains between 5 and
<span class="docEmphasis">10</span> consecutive occurrences of the letter
<span class="docEmphasis">o</span> (supported by <span class="docEmphasis">
vi</span> and <span class="docEmphasis">grep</span>).</td>
</tr>