Topic-Focused Web Crawler

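Uploader: 43000290 | Uploaded: 2019-12-21 20:57:10 | File size: 35KB | File type: zip
A web crawler is a program that automatically collects information from the Internet. Besides serving as the collector for a search engine, a crawler can also gather specific kinds of information from websites according to given requirements, such as job or rental listings. This paper designs and implements a topic-focused web crawler. The two key problems such a crawler has to solve are which search strategy to use and how to evaluate the topical relevance of the current page. The crawler designed here uses breadth-first search, parsing and de-duplicating URLs as it goes, and applies Java multithreading so that pages are fetched more efficiently. Page relevance is usually evaluated with a content-based search strategy; this paper implements three commonly used relevance algorithms: one based on page content, one based on page content and title, and one based on page content and link structure.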
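
The breadth-first traversal, URL de-duplication, and content-based relevance scoring described above can be sketched roughly as follows. This is a minimal Java illustration, not the code in the archive: the class `TopicCrawlSketch`, its methods, the in-memory `web` map that stands in for HTTP downloads, and the keyword-ratio score are all assumptions made for the example.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch; names and scoring rule are illustrative assumptions.
class TopicCrawlSketch {

    /** A downloaded page: its visible text and the links found in it. */
    public static final class Page {
        final String text;
        final List<String> links;
        public Page(String text, List<String> links) {
            this.text = text;
            this.links = links;
        }
    }

    /** Canonical form used for de-duplication: lower-case scheme and host,
     *  dot segments resolved, fragment dropped, empty path replaced by "/". */
    public static String normalize(String url) {
        URI u = URI.create(url.trim()).normalize();
        String scheme = u.getScheme() == null ? "" : u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
        String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
        String port = u.getPort() == -1 ? "" : ":" + u.getPort();
        return scheme + "://" + host + port + path + query;
    }

    /** Assumed content-based score: share of word tokens that are topic keywords.
     *  Keywords are expected in lower case. */
    public static double relevance(String text, Set<String> keywords) {
        int total = 0;
        int hits = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) continue;
            total++;
            if (keywords.contains(token)) hits++;
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    /** Breadth-first crawl from the seed; returns URLs whose score reaches the threshold. */
    public static List<String> crawl(String seed, Map<String, Page> web,
                                     Set<String> keywords, double threshold, int maxPages) {
        Queue<String> frontier = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        Set<String> seen = new HashSet<>();          // every URL ever enqueued
        List<String> relevant = new ArrayList<>();
        String start = normalize(seed);
        frontier.add(start);
        seen.add(start);
        int fetched = 0;
        while (!frontier.isEmpty() && fetched < maxPages) {
            String url = frontier.poll();
            Page page = web.get(url); // stand-in for an HTTP download (assumption)
            if (page == null) continue;
            fetched++;
            if (relevance(page.text, keywords) >= threshold) relevant.add(url);
            for (String link : page.links) {
                String canonical = normalize(link);
                if (seen.add(canonical)) frontier.add(canonical); // add() is false for duplicates
            }
        }
        return relevant;
    }
}
```

In this sketch every page's links are followed regardless of its score, which keeps the traversal purely breadth-first; a stricter topical crawler might expand only pages above the threshold. The tokenizer splits on non-letter characters, so Chinese text would need a word segmenter first. A multithreaded variant would also need thread-safe structures in place of `HashSet` and `ArrayDeque`, for example `ConcurrentHashMap.newKeySet()` and `LinkedBlockingQueue`.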

Resource details

Topic-Focused Web Crawler (23 files, 35KB)

theme/
    .project  (381B)
    src/
        theme/
            CrawlerFrame.java  (10.23KB)
            Crawler.java  (11.58KB)
            LinkFilter.java  (160B)
            PriorityURL.java  (505B)
            HttpConstants.java  (606B)
            Download.java  (6.95KB)
            HtmlParserTool.java  (1.74KB)
    .settings/
        org.eclipse.jdt.core.prefs  (598B)
    .classpath  (858B)
    bin/
        theme/
            CrawlerFrame.class  (10.52KB)
            Crawler$2.class  (676B)
            Crawler$1.class  (738B)
            HttpConstants.class  (645B)
            Crawler$Task.class  (619B)
            Crawler$3.class  (906B)
            PriorityURL.class  (873B)
            Crawler.class  (13.09KB)
            HtmlParserTool.class  (2.64KB)
            LinkFilter.class  (148B)
            Download.class  (8.11KB)
            HtmlParserTool$1.class  (817B)
    result  (0B)

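Comments

  • 咸鱼参上 :
    I'm working on my graduation project and found that this small program is the project behind this paper: https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CJFD&dbname=CJFD2014&filen
    2021-01-07
  • Damugeisme :
    I thought it came with documentation, but it turns out to be just a small program.
    2019-05-23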
