How do I extract the URLs from a bookmark file?
Posted 2003-11-12 22:01:15
The format is:
<DT><A HREF="http://www.6bytes.com/meaculpa/index.html" ADD_DATE="1068055591" LAST_VISIT="1068138006" LAST_CHARSET="GB2312">FreeBSD Basics</A>
    <DT><A HREF="http://www.insecure.org/" ADD_DATE="1068179199" LAST_MODIFIED="1068223695" LAST_CHARSET="ISO-8859-1">Insecure.Org </A>
    <DT><A HREF="http://www.hackerzhell.co.uk/portscanners.php" ADD_DATE="1068181322" LAST_VISIT="1068315472" ICON="http://www.hackerzhell.co.uk/favicon.ico" LAST_CHARSET="ISO-8859-1">HackerzHell - For your security needs!</A>
    <DT><A HREF="http://www.mostgraveconcern.com/freebsd/" ADD_DATE="1068226914" LAST_VISIT="1068616692" LAST_CHARSET="GB2312">FreeBSD Cheat Sheets</A>



</bookmark>
<bookmark icon="favicons/www.freebsdforum.org" href="http://www.freebsdforum.org/" >
  <title>bsdforums.org - FreeBSD OpenBSD NetBSD Darwin Mac OSX Linux Unix forums,  message boards, discussions
and news.</title>
</bookmark>
<bookmark icon="www" href="http://www.freebsdsearch.com/" >
  <title>Welcome to FreeBSDSearch.com</title>

Thanks in advance.
Posted 2003-11-12 22:39:52
[/home/javalee/myshell]cat tmp
<DT><A HREF="http://www.6bytes.com/meaculpa/index.html" ADD_DATE="1068055591"
LAST_VISIT="1068138006" LAST_CHARSET="GB2312">FreeBSD Basics</A>
<DT><A HREF="http://www.insecure.org/" ADD_DATE="1068179199" LAST_MODIFIED="1068223695" LAST_CHARSET="ISO-8859-1">Insecure.Org </A>
  <DT><A HREF="http://www.hackerzhell.co.uk/portscanners.php" ADD_DATE="1068181322" LAST_VISIT="1068315472" ICON="http://www.hackerzhell.co.uk/favicon.ico"
LAST_CHARSET="ISO-8859-1">HackerzHell - For your security needs!</A>
   <DT><A HREF="http://www.mostgraveconcern.com/freebsd/" ADD_DATE="1068226914" LAST_VISIT="1068616692" LAST_CHARSET="GB2312">FreeBSD Cheat Sheets</A>

    and

     </bookmark>
      <bookmark icon="favicons/www.freebsdforum.org" href="http://www.freebsdforum.org/" >
       <title>bsdforums.org - FreeBSD OpenBSD NetBSD Darwin Mac OSX Linux Unix forums, message boards, discussions
        and news.</title>
         </bookmark>
          <bookmark icon="www" href="http://www.freebsdsearch.com/" >
           <title>Welcome to FreeBSDSearch.com</title>


Here is a big mix of grep, awk, sed, and tr. It's a bit clumsy, but easy to follow ;)
[/home/javalee/myshell]cat tmp|tr '=' '\n'|grep 'www\.'|awk '{print $1}'|sed 's/"//g'
Result:
http://www.6bytes.com/meaculpa/index.html
http://www.insecure.org/
http://www.hackerzhell.co.uk/portscanners.php
http://www.hackerzhell.co.uk/favicon.ico
http://www.mostgraveconcern.com/freebsd/
favicons/www.freebsdforum.org
http://www.freebsdforum.org/
http://www.freebsdsearch.com/
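
For anyone reading along, the same pipeline can be written one stage per line with a note on what each stage does. This is only a sketch: the function name extract_www is made up for illustration, and it reads the bookmark file on standard input instead of using cat.

```shell
# Hypothetical wrapper around the one-liner above; reads bookmarks on stdin.
extract_www() {
    tr '=' '\n' |        # break lines at every '=' so each attribute value starts a new line
    grep 'www\.' |       # keep only the pieces that contain "www."
    awk '{print $1}' |   # keep the first whitespace-separated field (the quoted value)
    sed 's/"//g'         # strip the double quotes
}
```

Usage would be something like `extract_www < tmp`. Note that it keeps any piece containing "www.", which is why the icon paths show up in the result as well.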
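Original poster | Posted 2003-11-13 19:55:24
Thanks!! This script is really useful!!
Original poster | Posted 2003-11-15 15:59:01
I found that some lines don't come out right. It's a problem with the format, not with the moderator's script. The result looks like this: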

http://www.linuxsir.cn/bbs/
http://www.chinaunix.com/
http://www.linuxeden.com/
http://www.pegasus.rutgers.edu/~elflord/index.html
GB2312>http://www.pegasus.rutgers.edu/~elflord/index.html</A>
http://www.google.com/
http://www.baidu.com/

Is it possible to extract only the "www....." part?
Posted 2003-11-15 17:21:40

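My approach is to split the text apart step by step until only the keyword is left. I remember there is an easy way to do it in Perl, but sorry, I've forgotten how to write it ;)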
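
This is not the Perl version, but here is a rough shell sketch of one way to do it. The stray "GB2312>...</A>" line seems to come from a bookmark whose link text is itself a URL, so the text after the last '=' also contains "www.". Matching only the HREF/href attribute avoids that. Assumptions: the function name extract_hrefs is made up, the attribute value is always in double quotes, and your grep supports -o (GNU and BSD grep do, but POSIX does not require it).

```shell
# Hypothetical helper: print the value of every HREF/href attribute read from stdin.
extract_hrefs() {
    # -o prints only the matched text, -i ignores case (HREF vs href),
    # -E selects extended regular expressions.
    grep -oiE 'href="[^"]*"' |
        # Drop the leading href=" and the trailing double quote.
        sed -e 's/^[Hh][Rr][Ee][Ff]="//' -e 's/"$//'
}
```

Usage would be something like `extract_hrefs < tmp`. This also skips the ICON and icon values. If only the part starting at "www" is wanted, appending `| sed 's|^http://||'` should drop the scheme.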