Hadoop Distributed Web Crawler

Uploader: akawawk | Upload time: 2019-12-21 19:23:50 | File size: 70KB | File type: zip
An implementation of a distributed web crawler on Hadoop, written in Java with MapReduce. It supports depth search.


Resource details

hadoop分布式网络爬虫 (59 files, 70KB)
  CrawlerDriver-master/
    a.jpg  2.41KB
    bin/
      org/
        crawler/
          crawlerdriver/
            CrawlerRecordReader$NewLineReader.class  3.35KB
            CrawlerDriver.class  2.54KB
            CrawlerRecordReader.class  5.53KB
            CrawlerDriver$CrawlerReducer.class  2.70KB
            CrawlerDriver$InverseMapper.class  2.04KB
            HostPartitioner.class  1.40KB
            CrawlerInputFormat.class  1.94KB
          htmltoxmldriver/
            HtmlToXMLDriver.class  2.11KB
            HtmlToXMLDriver$HtmlToXMLMapper.class  2.10KB
          parserdriver/
            ParserInputFormat.class  1.09KB
            ParserDriver$ParserMapper.class  3.96KB
            ParserPartitioner.class  1.41KB
            ParserDriver.class  2.13KB
            ParserRecordReader.class  4.89KB
          mergedriver/
            MergeDriver.class  2.00KB
            HashPartitioner.class  1.40KB
            MergeDriver$IdentityMapper.class  1.85KB
            MergeDriver$MergeDocReducer.class  2.32KB
          util/
            TextArrayWritable.class  606B
            OutLinksWritable.class  3.39KB
            DocumentWritable.class  3.16KB
            Parser.class  1.78KB
            HttpDownloader.class  3.53KB
            Downloader.class  2.29KB
            MetaParser.class  1.17KB
          optimizerdriver/
            OptimizerDriver$OptimizerReducer.class  2.92KB
            OptimizerRecordReader.class  5.32KB
            OptimizerInputFormat.class  1.19KB
            OptimizerDriver.class  2.45KB
            OptimizerDriver$OptimizerMapper.class  2.95KB
            OptimizerPartitioner.class  1.43KB
    README2.md  71B
    src/
      org/
        crawler/
          crawlerdriver/
            CrawlerRecordReader.java  8.24KB
            CrawlerDriver.java  2.83KB
            CrawlerInputFormat.java  1.08KB
            HostPartitioner.java  834B
          htmltoxmldriver/
            HtmlToXMLDriver.java  1.81KB
          parserdriver/
            ParserPartitioner.java  615B
            ParserInputFormat.java  603B
            ParserRecordReader.java  4.11KB
            ParserDriver.java  3.47KB
          mergedriver/
            MergeDriver.java  2.03KB
            HashPartitioner.java  576B
          util/
            MetaParser.java  684B
            TextArrayWritable.java  312B
            DocumentWritable.java  3.02KB
            OutLinksWritable.java  2.41KB
            HttpDownloader.java  1.98KB
            Parser.java  1.17KB
            Downloader.java  1.28KB
          optimizerdriver/
            OptimizerDriver.java  3.04KB
            OptimizerPartitioner.java  584B
            OptimizerInputFormat.java  688B
            OptimizerRecordReader.java  3.87KB
    .project  418B
    .classpath  15.73KB
    .gitignore  617B
    .gitattributes  378B
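
The archive includes a file named HostPartitioner.java, but its contents are not shown on this page. As a rough illustration of what host-based partitioning usually means in a MapReduce crawler, the sketch below maps every URL from the same host to the same partition index, so that one reducer handles all fetches for a given site. It is plain Java with no Hadoop dependency; the class name HostPartitionSketch and the methods hostOf and partitionFor are hypothetical and are not taken from the archive.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical sketch: NOT the code from HostPartitioner.java in the archive.
final class HostPartitionSketch {

    // Extracts the lower-cased host of a URL; returns "" if it cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? "" : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return "";
        }
    }

    // Maps a URL to a partition index in [0, numPartitions).
    // Masking with Integer.MAX_VALUE clears the sign bit so the
    // remainder is never negative.
    static int partitionFor(String url, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return (hostOf(url).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com/index.html",
            "http://example.com/about?lang=en",
            "http://example.org/",
        };
        // URLs that share a host print the same partition index.
        for (String url : urls) {
            System.out.println(partitionFor(url, 4) + "\t" + url);
        }
    }
}
```

In a real Hadoop job the same arithmetic would typically sit inside a subclass of org.apache.hadoop.mapreduce.Partitioner, with the URL taken from the map output key. Whether the archive does this is an assumption.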

