LeetCode question bank - crawler_1point3: a crawler for 1point3acres (一亩三分地)

Uploader: 38661939 | Upload time: 2022-05-01 10:39:39 | File size: 25KB | File type: ZIP
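LeetCode question bank

This project contains two independent sub-projects:

crawler_1point3: collects statistics on 1point3acres (一亩三分地) posts, so you can see which companies have been popular recently.
crawler_leetcode (WIP): collects statistics on LeetCode interview-experience posts.

Introduction

Currently supported:

1point3acres "": statistics on how often each company is discussed.
1point3acres "": statistics on how actively each company is hiring.

The LeetCode crawler only writes post data to the database; see crawler_web for displaying it on a web page. If you only want to view the statistics locally, uncomment the # self.create_forms_by_db() line and add the companies you want to see to company_list. The crawler then creates a local markdown file in which the statistics are shown as a markdown table.

Posts on 1point3acres are sorted by reply time, while LeetCode posts can be sorted by posting order, so the two crawlers differ in how they decide whether they have reached content that was already crawled last time.

Every 1point3acres post carries a company tag, so extracting the company is easy. LeetCode's format is less strict: the crawler can only pull fields from the title and tags and check whether each one is a company name. The list of company names is kept in a separate file so that it is easy to edit.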
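The difference between the two stop conditions can be sketched as follows. This is only an illustration under assumptions, not the project's actual code: the function names, the post dictionaries with id and last_reply fields, and the seen_ids set are all hypothetical. The idea is that a feed sorted by creation time can stop at the first post it has already seen, while a feed sorted by reply time has to compare against the time of the last crawl, because old posts resurface whenever someone replies.

```python
from datetime import datetime


def new_posts_creation_order(posts, last_seen_id):
    """Feed is newest-created first: stop at the first post already crawled."""
    fresh = []
    for post in posts:
        if post["id"] == last_seen_id:
            break  # everything after this was crawled last time
        fresh.append(post)
    return fresh


def new_posts_reply_order(posts, last_crawl_time, seen_ids):
    """Feed is most-recently-replied first: old posts can reappear at the top."""
    fresh = []
    for post in posts:
        if post["last_reply"] <= last_crawl_time:
            break  # no activity since the last crawl from here on
        if post["id"] not in seen_ids:
            fresh.append(post)  # skip resurfaced posts that are already stored
    return fresh


# Hypothetical sample data for the reply-ordered feed.
reply_feed = [
    {"id": 2, "last_reply": datetime(2022, 5, 1, 12)},  # old post, new reply
    {"id": 7, "last_reply": datetime(2022, 5, 1, 11)},  # genuinely new post
    {"id": 6, "last_reply": datetime(2022, 4, 30)},     # untouched since last crawl
]
print(new_posts_reply_order(reply_feed, datetime(2022, 5, 1, 10), {2, 6}))
```

In the reply-ordered case an ID check alone is not enough to stop, since the first already-seen post may simply be an old thread that received a new reply.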
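Matching titles and tags against a company list, then rendering the counts as a markdown table, might look roughly like the sketch below. It is an assumption-laden example rather than the repository's implementation: COMPANY_LIST and its entries, the function names, and the shape of the post dictionaries are made up for illustration. Matching is done on word boundaries so that a short name does not match inside a longer word.

```python
import re
from collections import Counter

# Hypothetical company list; the project keeps its own list in a separate file.
COMPANY_LIST = ["Google", "Amazon", "Meta", "Microsoft"]


def extract_companies(title, tags, company_list=COMPANY_LIST):
    """Return the known company names that appear in a post's title or tags."""
    text = " ".join([title, *tags]).lower()
    return [
        name
        for name in company_list
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", text)
    ]


def count_companies(posts, company_list=COMPANY_LIST):
    """Count how many posts mention each company (at most once per post)."""
    counts = Counter()
    for post in posts:
        counts.update(
            extract_companies(post["title"], post.get("tags", []), company_list)
        )
    return counts


def to_markdown_table(counts):
    """Render the counts as a markdown table, most-mentioned company first."""
    lines = ["| Company | Posts |", "| --- | --- |"]
    for name, n in counts.most_common():
        lines.append(f"| {name} | {n} |")
    return "\n".join(lines)


# Hypothetical posts standing in for crawled data.
sample_posts = [
    {"title": "Google onsite interview", "tags": ["google"]},
    {"title": "Phone screen", "tags": ["Amazon"]},
    {"title": "Google OA", "tags": []},
]
print(to_markdown_table(count_companies(sample_posts)))
```

Keeping the company list as plain data makes it easy to extend without touching the matching logic, which fits the note above about storing the list in its own file.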

Resource details

leetcode题库-crawler_1point3 (29 files, 25KB)
  crawler_1point3-master/
    Dockerfile  219B
    .gitignore  20B
    README.md  4.39KB
    run.sh  144B
    .vscode/
      settings.json  108B
    docker-compose.yml  559B
    crawler_1point3/
      crawler_1point3/
        spiders/
          __init__.py  161B
          1point3_spider.py  9.11KB
        items.py  604B
        pipelines.py  4.40KB
        middlewares.py  3.60KB
        test/
          __init__.py  0B
          pipeline_test.py  316B
        wait_for_mongo.py  982B
        settings.py  3.56KB
        __init__.py  0B
        drop_collections.py  405B
      scrapy.cfg  273B
    requirements.txt  29B
    crawler_leetcode/
      scrapy.cfg  275B
      crawler_leetcode/
        company_list.py  1.49KB
        spiders/
          leetcode_spider.py  7.74KB
          __init__.py  161B
        items.py  443B
        pipelines.py  1.21KB
        middlewares.py  3.58KB
        settings.py  3.24KB
        __init__.py  0B
    run.py  126B
