Replacing certain strings in text

pommy · posted on 2008-9-26 14:26:48

An htm file contains many strings like these:

href="../03df0-03df5/05.pdf"
href="../03e01-03e50/01.pdf"
href="../avede-sdfed/00.pdf"

How can I use a script to replace

../03df0-03df5/ with ../
../03e01-03e50/ with ../
../avede-sdfed/ with ../

Or, put another way, how can I simply delete the 12 characters after ..?

Thanks

yongjian · posted on 2008-9-29 17:25:57

I suggest first reading the Regular Expression chapter of the ABS Guide. It won't take long.
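
As a rough illustration of where regular expressions lead for this problem (sketched in Python rather than a shell tool, and assuming every link has the form href="../<one directory>/<file>"), a single pattern can drop the directory part. The names HREF_DIR and strip_first_dir are made up for this example.

```python
import re

# Assumption: each link looks like href="../<directory>/<file>".
# Group 1 keeps the href="../ prefix; the directory name and its trailing '/' are dropped.
HREF_DIR = re.compile(r'(href="\.\./)[^/"]+/')

def strip_first_dir(text):
    """Remove the first directory after '../' in every matching href."""
    return HREF_DIR.sub(r'\1', text)

sample = '''href="../03df0-03df5/05.pdf"
href="../03e01-03e50/01.pdf"
href="../avede-sdfed/00.pdf"'''
print(strip_first_dir(sample))
```

Links that already point straight at a file, such as href="../05.pdf", do not match the pattern and are left alone.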

seenxu · posted on 2008-9-29 21:45:51

life is short - you need Python!

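A crude but relatively simple example:

import re

a = 'href="../03df0-03df5/05.pdf"'
b = '/'
# The second '/'-separated field is the directory to drop;
# remove it together with its leading '/'.
print(re.sub(re.escape(b + re.split(b, a)[1]), '', a))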


input
href="../03df0-03df5/05.pdf"

output
href="../05.pdf"

To search and replace across all <a> tags in *.html files, BeautifulSoup can help:
build soup = BeautifulSoup(page), call soup.findAll('a'), and then apply the regex rule to each entry found, inside a loop.
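
If pulling in BeautifulSoup is not an option, a plain regex pass over each file is one possible alternative (a different technique from the tag-by-tag loop described above). The sketch below assumes the files are UTF-8, that the links all look like href="../<directory>/<file>", and that editing the files in place is acceptable. rewrite_file and rewrite_tree are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumption: each link looks like href="../<directory>/<file>".
HREF_DIR = re.compile(r'(href="\.\./)[^/"]+/')

def rewrite_file(path, encoding="utf-8"):
    """Rewrite one file in place and return how many links were changed."""
    path = Path(path)
    text = path.read_text(encoding=encoding)
    new_text, count = HREF_DIR.subn(r'\1', text)
    if count:
        # Only touch the file when something actually changed.
        path.write_text(new_text, encoding=encoding)
    return count

def rewrite_tree(folder, pattern="*.htm"):
    """Apply rewrite_file to every file in folder that matches pattern."""
    return {str(p): rewrite_file(p) for p in sorted(Path(folder).glob(pattern))}
```

Something like rewrite_tree("docs") would then process every .htm file directly inside a folder named docs. It is worth trying this on a copy of the files first, since the change is made in place.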

seenxu · posted on 2008-9-30 01:32:43

import re
from BeautifulSoup import BeautifulSoup
page = '''
<html>
<head>
<title>
</title>
</head>
<body>
<h1>
head level one
</h1>
<div id="atagholder">
<a href="../03df0-03df5/05.pdf">pdf one</a>
<a href="../03e01-03e50/01.pdf">pdf one</a>
<a href="../avede-sdfed/00.pdf">pdf one</a>
</div>
</body>
</html>
'''
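print(page)
soup = BeautifulSoup(page)
sep = '/'  # separator used to split the serialized tag
node = soup.findAll('a')
for x in node:
    # unicode() is Python 2, matching the BeautifulSoup 3 import above.
    # Field 2 starts with the file name; cut it at the closing quote.
    x['href'] = '../' + re.split('"', re.split(sep, unicode(x))[2])[0]
print(soup)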


INPUT
<html>
<head>
<title>
</title>
</head>
<body>
<h1>
head level one
</h1>
<div id="atagholder">
<a href="../03df0-03df5/05.pdf">pdf one</a>
<a href="../03e01-03e50/01.pdf">pdf one</a>
<a href="../avede-sdfed/00.pdf">pdf one</a>
</div>
</body>
</html>
OUTPUT
<html>
<head>
<title>
</title>
</head>
<body>
<h1>
head level one
</h1>
<div id="atagholder">
<a href="../05.pdf">pdf one</a>
<a href="../01.pdf">pdf one</a>
<a href="../00.pdf">pdf one</a>
</div>
</body>
</html>
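
A possibly simpler variant of the loop body works on the href value itself instead of splitting the serialized tag. This is only a sketch: flatten_href is a made-up name, and it assumes every link is a relative path whose last component is the file to keep.

```python
import posixpath

def flatten_href(href):
    """Return '../<file name>' for a relative link such as '../dir/file.pdf'."""
    # posixpath always splits on '/', which is what URLs use on every platform.
    return '../' + posixpath.basename(href)

for link in ['../03df0-03df5/05.pdf', '../03e01-03e50/01.pdf', '../avede-sdfed/00.pdf']:
    print(flatten_href(link))
```

Inside the loop over the <a> tags, the assignment would then read x['href'] = flatten_href(x['href']).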
