C crawler with link extraction, deduplication, and ternary-tree ID lookup

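Uploader: lncer7 | Uploaded: 2021-08-09 09:35:07 | File size: 67KB | File type: ZIP
A parallel web crawler in C built on epoll. It crawls the 160,000 valid pages on a server, runs a deterministic finite automaton (DFA) over each page's source to match links, and deduplicates them with a Bloom filter. Each link is assigned a number and written to url.txt. Because not every matched link actually exists on the server, link relations whose target returned a status code other than 200 are then removed with the help of an intermediate file and a ternary tree, and the remaining valid link relations are appended to url.txt.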
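
The Bloom-filter deduplication step described above can be sketched roughly as follows. This is a minimal illustration, not the code in bloomfilter.c: the filter size, the number of probes, the FNV-1a hash, and the names `bloom_t`, `bloom_new`, and `bloom_test_and_set` are all assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sizes: 2^21 bits (256 KiB) and 4 probes per URL. */
#define BLOOM_BITS   (1u << 21)
#define BLOOM_HASHES 4u

typedef struct {
    uint8_t bits[BLOOM_BITS / 8];
} bloom_t;

/* Allocate an empty (all-zero) filter; returns NULL on failure. */
bloom_t *bloom_new(void)
{
    return calloc(1, sizeof(bloom_t));
}

/* FNV-1a, 64-bit: a simple, widely used string hash. */
uint64_t fnv1a(const char *s)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* offset basis */
    for (; *s != '\0'; ++s) {
        h ^= (uint8_t)*s;
        h *= 0x100000001b3ULL;               /* FNV prime */
    }
    return h;
}

/* Returns 1 if url was possibly seen before, 0 if it is definitely new.
   Either way, url is recorded in the filter afterwards. */
int bloom_test_and_set(bloom_t *bf, const char *url)
{
    assert(bf != NULL && url != NULL);

    uint64_t h  = fnv1a(url);
    uint32_t h1 = (uint32_t)h;
    uint32_t h2 = (uint32_t)(h >> 32) | 1u;  /* odd step: probes are distinct */
    int seen = 1;

    /* Double hashing: probe i uses bit (h1 + i * h2) mod BLOOM_BITS. */
    for (uint32_t i = 0; i < BLOOM_HASHES; ++i) {
        uint32_t bit  = (h1 + i * h2) % BLOOM_BITS;
        uint8_t  mask = (uint8_t)(1u << (bit & 7u));
        if ((bf->bits[bit >> 3] & mask) == 0) {
            seen = 0;                        /* at least one bit was unset */
            bf->bits[bit >> 3] |= mask;
        }
    }
    return seen;
}
```

A crawler would call `bloom_test_and_set` on every link the DFA matches and assign a new number only when it returns 0. A Bloom filter never reports a seen URL as new, but it can occasionally report a new URL as seen, so a small fraction of links may be skipped; a larger bit array lowers that rate.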

Resource details

C crawler with link extraction, deduplication, and ternary-tree ID lookup (55 files, 67KB)

crawler_parallel/
    bloomfilter.c  3.88KB
    crawler.c  4.08KB
    bloomfilter.h  1.39KB
    crawler  22.91KB
    handleURLs.c  4.90KB
    .git/
        lfs/
            objects/
                logs
            tmp/
                objects
        index  1.03KB
        hooks/
            pre-push.sample  1.32KB
            prepare-commit-msg.sample  1.21KB
            applypatch-msg.sample  478B
            pre-commit.sample  1.60KB
            pre-receive.sample  544B
            pre-applypatch.sample  424B
            commit-msg.sample  896B
            pre-rebase.sample  4.78KB
            update.sample  3.53KB
            post-update.sample  189B
        config  309B
        description  314B
        refs/
            tags
            heads/
                master  41B
            remotes/
                origin/
                    master  41B
        logs/
            refs/
                heads/
                    master  436B
                remotes/
                    origin/
                        master  136B
            HEAD  436B
        objects/
            91/7f7a2d5c2b4a991e4fc7b839deb2e57ac0c8bc  1.23KB
            65/757167480ac7fafed2fb6e52ed3107699dea55  2.09KB
            eb/a1110b5794582b53554bb1e4224b860d4e173f  75B
            6c/e7d4a7631d7f046156d96b1808838364dcb227  1.63KB
            46/6472fd67bfb4685cbe0f3ad9520350654d5759  431B
            8f/93d234174161bda1bdeb29ad7dfa95252221ba  9.06KB
            5d/a6746c9ef45f68c5686e90dacd4744671eb51e  122B
            6d/24a255ed554f264262684526931e8e60a52cd5  35B
            21/03d67d73b113ff60337a12997f439702efe547  1.02KB
            info
            95/9851a9948daea97a85ada33f27c1bf7b872d3e  1.16KB
            a1/5c3ed22d6da5500bf91b917f1edbe0489abc05  432B
            pack
            df/e70ab9887e8cb1e1391e75acf6990e4960a7e6  555B
            c6/127b38c1aa25968a88db3940604d41529e4cf5  297B
            9b/3cc000aabbae0907200968fbedd5aa5c0284bb  95B
            2c/87a6208b281d337373049c0ae6c4537e96456a  122B
            45/b94f39ad5223952cbd13df0178bdffbe622204  177B
            01/eaf4bc7651a53e4f31a98a822f5ddf252c83d3  803B
            e3/d8ccc1eb2b6db4413ea58b4d711c06083549b7  106B
            30/f1933cf6601fce2d0ae2463c8a65cf1e90d04b  1.60KB
            b8/ffb13758e0339532fb770449a7bd37693654b2  144B
        info/
            exclude  240B
        COMMIT_EDITMSG  19B
        HEAD  23B
        FETCH_HEAD  104B
    common.h  1.94KB
    DFA.c  5.26KB
    queue.c  1023B
    .gitignore  430B
    Makefile  202B
    README.md  81B
    .gitattributes  65B
    ternaryTree.c  3.52KB
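
The listing includes ternaryTree.c, and the description says a ternary tree is used to look up link IDs and drop relations that point at pages without a 200 status. The contents of that file are not shown here, so the following is only a sketch of how a ternary search tree can map a URL string to an integer ID. The names `tst_node`, `tst_insert`, `tst_find`, and `tst_free`, and the use of -1 for "not present", are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tst_node {
    char ch;                        /* character stored at this node */
    int id;                         /* link ID, or -1 if no URL ends here */
    struct tst_node *lo, *eq, *hi;  /* smaller / next character / larger */
} tst_node;

/* Insert key with the given id and return the (possibly new) subtree root.
   key must be a non-empty string. */
tst_node *tst_insert(tst_node *n, const char *key, int id)
{
    assert(key != NULL && *key != '\0');
    if (n == NULL) {
        n = malloc(sizeof *n);
        if (n == NULL)
            abort();                /* out of memory: give up in this sketch */
        n->ch = *key;
        n->id = -1;
        n->lo = n->eq = n->hi = NULL;
    }
    if (*key < n->ch)
        n->lo = tst_insert(n->lo, key, id);
    else if (*key > n->ch)
        n->hi = tst_insert(n->hi, key, id);
    else if (key[1] != '\0')
        n->eq = tst_insert(n->eq, key + 1, id);
    else
        n->id = id;                 /* last character: record the ID here */
    return n;
}

/* Return the ID stored for key, or -1 if key was never inserted. */
int tst_find(const tst_node *n, const char *key)
{
    if (key == NULL || *key == '\0')
        return -1;
    while (n != NULL) {
        if (*key < n->ch) {
            n = n->lo;
        } else if (*key > n->ch) {
            n = n->hi;
        } else if (key[1] != '\0') {
            n = n->eq;              /* matched this character, go deeper */
            ++key;
        } else {
            return n->id;
        }
    }
    return -1;
}

/* Release every node in the tree. */
void tst_free(tst_node *n)
{
    if (n == NULL)
        return;
    tst_free(n->lo);
    tst_free(n->eq);
    tst_free(n->hi);
    free(n);
}
```

Under this scheme, only URLs that were fetched with status 200 would be inserted, each with its number from url.txt. When the link relations are replayed from the intermediate file, any target for which `tst_find` returns -1 can be discarded.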
