A question about the word-anchor metacharacters

ipconfigme · posted 2006-2-19 10:59:59

wangyao@fisherman ~/test
$ cat fuhao
>hdfgio
>>dfjkhgeriu
...JIdjewai
>>reuhwqi
< hsdfui >
<>
<abc.>
<abc>
abc
dfh abc huihd hsdf
abd abcd cbabc
.abc .bcd sidfie ..sdfhf <abc .abc>
<abc. .abc>. .<abc.>
abc. .abc. nsdf <abc>.
.<abc>. hdsufih .dcbhdj.
wangyao@fisherman ~/test
$ grep '\<abc\>' fuhao
<abc.>
<abc>
abc
dfh abc huihd hsdf
.abc .bcd sidfie ..sdfhf <abc .abc>
<abc. .abc>. .<abc.>
abc. .abc. nsdf <abc>.
.<abc>. hdsufih .dcbhdj.


How can this be explained?

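\<\> is called the word-anchor metacharacter, but it is not really a true metacharacter. It is used only in grep; sed, awk and egrep do not include it, so it does not work the same way in sed, awk and egrep.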

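A word-anchor expression matches the word enclosed by the anchors exactly. It is written as \< and \>, and the word to be matched exactly goes between the two. A word-anchored regular expression may also contain other regex metacharacters between the anchors.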
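A small sketch of that last point, assuming GNU grep (where \< and \> are accepted in basic regular expressions). The helper name match_ab_word is hypothetical: the . between the anchors still means "any one character", while \< and \> only fix where the word starts and ends.

```shell
#!/bin/sh
# Sketch assuming GNU grep, where \< and \> work in basic regular expressions.
# match_ab_word is a hypothetical helper: it prints each argument that
# contains a three-character word beginning with "ab".
match_ab_word() {
  # The '.' between the anchors is an ordinary metacharacter (any single
  # character); \< and \> only pin the start and end of the word.
  printf '%s\n' "$@" | grep '\<ab.\>'
}

match_ab_word abc abd abcd   # prints abc and abd, but not abcd
```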

yongjian · posted 2006-2-20 13:01:30

GNU egrep, sed, awk and grep all return the same result when using \< \> as the word boundary. I just tested it. So what is your question?
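A sketch of that cross-check, assuming the GNU versions of grep and sed (both accept \< and \> as a GNU extension; they are not part of POSIX). The sample lines come from the test file above, and the function names are hypothetical. GNU awk (gawk) also understands \< and \>, but other awk implementations such as mawk may not, so awk is left out here. egrep is written as grep -E, which is what it means in GNU grep.

```shell
#!/bin/sh
# Sketch assuming GNU grep and GNU sed; \< and \> are GNU extensions.
# awk is omitted because non-GNU awks (e.g. mawk) may not support them.
sample='<abc.>
abd abcd cbabc
dfh abc huihd hsdf'

via_grep()  { printf '%s\n' "$sample" | grep '\<abc\>'; }
via_egrep() { printf '%s\n' "$sample" | grep -E '\<abc\>'; }
via_sed()   { printf '%s\n' "$sample" | sed -n '/\<abc\>/p'; }

# Each should print the same two lines: <abc.> and dfh abc huihd hsdf
via_grep
via_egrep
via_sed
```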

ipconfigme · posted 2006-2-20 16:33:36

\<\> marks the boundaries of a word, so it should match exactly one word.
\<abc\> should match the word abc, but as shown above, <abc> and abc. matched as well.
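One way to see what actually matched is grep's -o option, which prints only the matched part of each line. A minimal sketch, assuming GNU grep (the helper show_match is hypothetical): the output is just abc, because \< and \> are zero-width and match a position at the edge of a word, not the literal < and > characters in the file.

```shell
#!/bin/sh
# Sketch assuming GNU grep. -o prints only the matched part of each line.
# \< and \> match the empty position between a non-word character (or the
# line edge) and a word character, so < > . around abc are never matched.
show_match() {
  printf '%s\n' "$1" | grep -o '\<abc\>'
}

show_match '<abc>'   # prints: abc
show_match 'abc.'    # prints: abc
```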

The following passage is from Mastering Regular Expressions (O'Reilly):
There are three types of escaped items:

1. The pairing of \ and a metacharacter is a metasequence to match the literal character (for example, \* matches a literal asterisk).
2. The pairing of \ and selected non-metacharacters becomes a metasequence with an implementation-defined meaning (for example, \< often means "start of word").
3. The pairing of \ and any other character defaults to simply matching the character (that is, the backslash is ignored).

The key is "\< often means 'start of word'": "often" does not mean "always".
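The first two kinds of escape from the quoted list can be checked directly. A sketch assuming GNU grep's default basic regular expressions; count_matches is a hypothetical helper that prints how many of its argument lines match. In Perl-style regexes, by contrast, \< simply matches a literal <, which is the "implementation-defined" part.

```shell
#!/bin/sh
# Sketch assuming GNU grep (basic regular expressions).
# count_matches is a hypothetical helper: first argument is the pattern,
# the rest are input lines; it prints how many lines match.
count_matches() {
  pattern=$1
  shift
  printf '%s\n' "$@" | grep -c -- "$pattern"
}

# 1. \ + metacharacter: \* matches a literal asterisk.
count_matches 'a\*b' 'a*b' 'aab'             # prints 1
# 2. \ + selected ordinary character: in GNU grep, \< means "start of word".
count_matches '\<abc' 'abc' '<abc' 'xabc'    # prints 2
```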

yongjian · posted 2006-2-21 04:13:01

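It seems to come down to how a "word" is defined. Letters a-z make up a word, while <, ., { and so on are not part of it. I only found this by experiment, though, and haven't checked the official definition in the documentation. In Perl, only [a-z], [A-Z], [0-9] and _ count as "word" characters; nothing else does.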
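A sketch of where the word boundary falls, assuming GNU grep, whose manual defines word-constituent characters as letters, digits and the underscore. The helper whole_word_abc is hypothetical. abc_x and abc1 do not match because _ and 1 continue the word, while - and { end it.

```shell
#!/bin/sh
# Sketch assuming GNU grep: word constituents are letters, digits and '_';
# any other character (including -, <, ., {) ends a word.
# whole_word_abc is a hypothetical helper.
whole_word_abc() {
  printf '%s\n' "$@" | grep '\<abc\>'
}

whole_word_abc abc_x abc1 abc-x '{abc}'   # prints abc-x and {abc}
```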

ipconfigme · posted 2006-2-21 09:04:30


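It looks like that is the only way to read the definition of a word: a word is a string bounded on both sides by spaces, tabs, punctuation, or symbols such as <, . and {.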

But that raises a new question: how can I match exactly abc in the file above without also matching <abc>?

Don't blame me, one should be persistent about a problem, right? :-)
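One possible answer to the question above, offered as an assumption rather than anything settled in the thread: if a "word" is taken to mean a run of non-blank characters, abc can be required to sit between (start of line or whitespace) and (whitespace or end of line). This sketch uses only POSIX extended regular expressions, so it does not rely on \< and \>; the function name bare_abc is hypothetical.

```shell
#!/bin/sh
# One possible approach (an assumption): a "word" is a run of non-blank
# characters, so abc must be preceded by line start or whitespace and
# followed by whitespace or line end. POSIX ERE only; no \< \> needed.
# bare_abc is a hypothetical helper; with no file arguments it reads stdin.
bare_abc() {
  grep -E '(^|[[:space:]])abc([[:space:]]|$)' "$@"
}

printf '%s\n' 'abc' '<abc>' 'abc.' 'dfh abc huihd hsdf' | bare_abc
# prints abc and dfh abc huihd hsdf
```

Run against the thread's file (for example `bare_abc fuhao`), it should print only the lines `abc` and `dfh abc huihd hsdf`. Note that it also rejects `abc.`, so trailing punctuation would have to be added to the right-hand group if that should still count as a bare abc.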
