Web Crawler (aka Spider)

A crawler is a program that picks up a page and follows all the links on that page. Crawlers are used in search engines to index all the pages on a website, starting only from the first page (as long as everything is linked from it).

There are several crawlers out there, but few good-quality open-source ones. The problem is that most crawlers fail when the parser they use cannot cope with messy real-world markup. Using HTMLParser, it is possible to crawl through dirty HTML, and to do so quickly.

There are two types of crawlers:

- Breadth First
- Depth First

Breadth First crawlers use the BFS (Breadth-First Search) algorithm. Here is a brief description (a minimal code sketch follows the steps):

1. Get all links from the starting page and add them to a queue.
2. Take the first link off the queue, get all links on that page, and add them to the queue.
3. Repeat step 2 until the queue is empty.
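The steps above are not illustrated by code in the original page, so here is a minimal, hypothetical sketch of a breadth-first crawler built on the same HTMLParser calls used by the Robot program further down (Parser, DefaultParserFeedback, registerScanners, NodeIterator, LinkTag). The class name BreadthFirstCrawler, the maxPages limit, and the import package paths (assumed from the HTMLParser 1.x layout) are illustrative, not part of the article.

import java.util.HashSet;
import java.util.LinkedList;
import java.util.Set;

import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.DefaultParserFeedback;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.ParserException;

// Hypothetical BFS crawler sketch; the class name and page limit are illustrative.
public class BreadthFirstCrawler {
  public static void crawl(String startUrl, int maxPages) {
    LinkedList queue = new LinkedList();   // URLs waiting to be visited (FIFO)
    Set visited = new HashSet();           // URLs already queued or visited
    queue.add(startUrl);
    visited.add(startUrl);
    int pages = 0;
    while (!queue.isEmpty() && pages < maxPages) {
      String url = (String) queue.removeFirst();
      pages++;
      System.out.println("Visiting " + url);
      try {
        Parser parser = new Parser(url, new DefaultParserFeedback());
        parser.registerScanners();
        // Collect every link on this page and enqueue the ones we have not seen yet.
        for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
          Node node = e.nextNode();
          if (node instanceof LinkTag) {
            LinkTag linkTag = (LinkTag) node;
            String link = linkTag.getLink();
            if (!linkTag.isMailLink() && !visited.contains(link)) {
              visited.add(link);
              queue.add(link);
            }
          }
        }
      }
      catch (ParserException e) {
        System.err.println("Skipping " + url + ": " + e.getMessage());
      }
    }
  }
}

Because newly found links go to the back of the queue, every page one link away from the start is visited before any page two links away, which is what makes the traversal breadth-first.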
Depth First crawlers use the DFS (Depth-First Search) algorithm. Here is a brief description:

1. From the starting page, get the first link that has not yet been visited.
2. Visit that link and get its first non-visited link.
3. Repeat step 2 until there are no further non-visited links.
4. Go to the next non-visited link in the previous level of recursion and repeat step 2, until no more non-visited links remain.

BFS crawlers are simple to write (the sketch above is one example). DFS can be slightly more involved, so a simple DFS crawler program is presented below. This is a basic program, and it is included in the org.htmlparser.parserapplications package as Robot.java. Feel free to modify it or add functionality to it.

// Imports added for the classes the listing uses but did not import;
// the package paths are those of the HTMLParser 1.x releases.
import org.htmlparser.Node;
import org.htmlparser.Parser;
import org.htmlparser.tags.LinkTag;
import org.htmlparser.util.DefaultParserFeedback;
import org.htmlparser.util.NodeIterator;
import org.htmlparser.util.ParserException;

public class Robot {
  private Parser parser;

  /**
   * Robot crawler - provide the starting url.
   */
  public Robot(String resourceLocation) {
    try {
      parser = new Parser(resourceLocation, new DefaultParserFeedback());
      parser.registerScanners();
    }
    catch (ParserException e) {
      System.err.println("Error, could not create parser object");
      e.printStackTrace();
    }
  }

  /**
   * Crawl using a given crawl depth.
   * @param crawlDepth Depth of crawling
   */
  public void crawl(int crawlDepth) throws ParserException {
    try {
      crawl(parser, crawlDepth);
    }
    catch (ParserException e) {
      throw new ParserException("ParserException at crawl(" + crawlDepth + ")", e);
    }
  }

  /**
   * Crawl using a given parser object, and a given crawl depth.
   * @param parser Parser object
   * @param crawlDepth Depth of crawling
   */
  public void crawl(Parser parser, int crawlDepth) throws ParserException {
    System.out.println(" crawlDepth = " + crawlDepth);
    for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
      Node node = e.nextNode();
      if (node instanceof LinkTag) {
        LinkTag linkTag = (LinkTag) node;
        if (!linkTag.isMailLink()) {
          // Only follow links whose URL contains ".com", ".htm" or ".org".
          if (linkTag.getLink().toUpperCase().indexOf("HTM") != -1 ||
              linkTag.getLink().toUpperCase().indexOf("COM") != -1 ||
              linkTag.getLink().toUpperCase().indexOf("ORG") != -1) {
            if (crawlDepth > 0) {
              // Recurse into the linked page with one less level of depth.
              Parser newParser = new Parser(linkTag.getLink(), new DefaultParserFeedback());
              newParser.registerScanners();
              System.out.print("Crawling to " + linkTag.getLink());
              crawl(newParser, crawlDepth - 1);
            }
            else System.out.println(linkTag.getLink());
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    System.out.println("Robot Crawler v" + Parser.VERSION_STRING);
    if (args.length < 2 || args[0].equals("-help")) {
      System.out.println();
      System.out.println("Syntax : java -classpath htmlparser.jar org.htmlparser.parserapplications.Robot <resourceLocn/website> <depth>");
      System.out.println();
      System.out.println("   <resourceLocn> the name of the file to be parsed (with complete path");
      System.out.println("                  if not in current directory)");
      System.out.println("   <depth> No of links to be followed from each link");
      System.out.println("   -help This screen");
      System.out.println();
      System.out.println("HTML Parser home page : http://htmlparser.sourceforge.net");
      System.out.println();
      System.out.println("Example : java -classpath htmlparser.jar org.htmlparser.parserapplications.Robot http://www.google.com 3");
      System.out.println();
      System.out.println("If you have any doubts, please join the HTMLParser mailing list (user/developer) from the HTML Parser home page instead of mailing any of the contributors directly. You will be surprised by the quality of open-source support.");
      System.exit(-1);
    }
    String resourceLocation = "";
    int crawlDepth = 1;
    if (args.length != 0) resourceLocation = args[0];
    if (args.length == 2) crawlDepth = Integer.valueOf(args[1]).intValue();
    Robot robot = new Robot(resourceLocation);
    System.out.println("Crawling Site " + resourceLocation);
    try {
      robot.crawl(crawlDepth);
    }
    catch (ParserException e) {
      e.printStackTrace();
    }
  }
}
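As a usage example, the program's own help text shows how it is run from the command line; with a depth of 3 it follows links up to three levels away from the starting page:

  java -classpath htmlparser.jar org.htmlparser.parserapplications.Robot http://www.google.com 3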
The method that does the crawling is the recursive method crawl(parser, crawlDepth). The crawler creates a new parser for each link it follows and moves through sites using the DFS approach.

Be careful with the depth you give the crawler; studying the time taken to map all the links is an interesting research project in itself. A word of caution: some sites don't like crawlers going through them. They publish a file called robots.txt in the root directory, which a crawler should fetch to learn the rules and honor them (a minimal check is sketched below); read up on the robots exclusion standard before crawling other people's sites. The above program is only a demonstration. Note that it only follows links whose URL contains ".com", ".htm" or ".org"; in real-life situations, you would also want to support dynamic links.
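The article recommends honoring robots.txt but does not show how, so here is a minimal, hypothetical sketch in plain Java (java.net and java.io only) that fetches /robots.txt and checks a URL path against the Disallow lines of the "User-agent: *" record. It ignores much of the robots exclusion standard (per-agent records, Allow lines, wildcards), so treat it as a starting point rather than a complete implementation; the class and method names are made up for this example.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a very small robots.txt check, not a full parser.
public class RobotsTxtCheck {

  // Returns the Disallow prefixes listed under "User-agent: *" in the site's robots.txt.
  public static List fetchDisallowed(String siteRoot) throws IOException {
    List disallowed = new ArrayList();
    URL robots = new URL(siteRoot + "/robots.txt");
    BufferedReader in = new BufferedReader(new InputStreamReader(robots.openStream()));
    try {
      boolean applies = false;   // true while reading the "User-agent: *" record
      String line;
      while ((line = in.readLine()) != null) {
        line = line.trim();
        if (line.toLowerCase().startsWith("user-agent:")) {
          applies = line.substring("user-agent:".length()).trim().equals("*");
        }
        else if (applies && line.toLowerCase().startsWith("disallow:")) {
          String path = line.substring("disallow:".length()).trim();
          if (path.length() > 0) disallowed.add(path);   // empty Disallow means "allow all"
        }
      }
    }
    finally {
      in.close();
    }
    return disallowed;
  }

  // A URL path is allowed unless it starts with one of the disallowed prefixes.
  public static boolean allowed(String path, List disallowed) {
    for (int i = 0; i < disallowed.size(); i++) {
      if (path.startsWith((String) disallowed.get(i))) return false;
    }
    return true;
  }
}

A crawler would call fetchDisallowed once per site and skip any link whose path fails the allowed check before creating a parser for it.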
Before you set out to design an open-source or commercial crawler, please study what others have already researched in this area.

Some Useful Links on Crawlers

- InfoSpiders: http://dollar.biz.uiowa.edu/%7Efil/IS/
- Collection of Crawler Links: http://www.searchtools.com/robots/robots-articles.html

--SomikRaha, Sunday, February 16, 2003 2:13:46 pm.
