crawllagou: Lagou (拉勾网) job-site crawler, lagou spider, source code

Uploader: 42097557 | Uploaded: 2021-08-26 14:54:29 | File size: 1.09MB | File type: ZIP
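This project is no longer maintained.

Background
1. Lagou's search results generally show only 30 pages with 15 job postings per page, so a single search yields about 450 records (30 × 15).
2. Lagou updates its anti-crawler mechanism frequently. Parsing the Ajax request directly easily triggers it (the response contains "msg": "您操作太频繁,请稍后再访问", meaning "You are operating too frequently, please try again later"), after which the site cannot be opened in a browser either.
3. Sending a GET request to obtain the current session before parsing the Ajax request mitigates point 2 to some extent, but frequent requests still trigger the anti-crawler mechanism and get the IP banned.

Design
1. One option is to build a large pool of proxy IPs and crawl by parsing the Ajax request while continually switching proxies.
2. Another option is to use the selenium browser automation framework to drive Google Chrome and collect the data by simulating how a person browses the pages.
3. This program uses option 2: selenium simulating a human user.
4. Requesting search pages and detail pages too quickly brings up the login page, and so does requesting 10 detail pages in a row, so this program requires a login.
5. After the first login, the program saves …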
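The description breaks off at the point where it says what is saved after the first login. The file listing includes `configures/cookies_file`, so one plausible reading is that login cookies are persisted and reused on later runs. The sketch below illustrates that idea only; it is not taken from the project's code. `get_cookies()` and `add_cookie()` are real Selenium WebDriver methods, while `save_cookies`, `load_cookies`, `StubDriver`, and the JSON file format are assumptions introduced for illustration.

```python
import json
from pathlib import Path


def save_cookies(driver, path):
    """Write the driver's current cookies to a JSON file; return how many were saved."""
    cookies = driver.get_cookies()  # Selenium WebDriver API: returns a list of dicts
    Path(path).write_text(json.dumps(cookies, ensure_ascii=False), encoding="utf-8")
    return len(cookies)


def load_cookies(driver, path):
    """Re-add previously saved cookies; return False if no cookie file exists yet."""
    cookie_file = Path(path)
    if not cookie_file.exists():
        return False  # the caller would fall back to a manual login here
    for cookie in json.loads(cookie_file.read_text(encoding="utf-8")):
        # With a real browser, the driver must already be on the cookie's domain.
        driver.add_cookie(cookie)
    return True


class StubDriver:
    """Hypothetical stand-in for a Selenium driver so the sketch runs without a browser."""

    def __init__(self, cookies=None):
        self._cookies = list(cookies or [])

    def get_cookies(self):
        return list(self._cookies)

    def add_cookie(self, cookie):
        self._cookies.append(cookie)
```

With a real `selenium.webdriver.Chrome()` instance in place of `StubDriver`, the likely flow would be: open the site, call `load_cookies`, and only if it returns `False` log in by hand and then call `save_cookies`.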

Resource details

crawllagou: Lagou crawler, lagou spider, source code (48 files, 1.09MB)

crawllagou-master/
    report/
        images/
            bg.jpg (252.24KB)
            line(1).png (3.85KB)
            head_bg.png (7.70KB)
        js/
            report_charts.js (152.73KB)
            area_echarts.js (7.97KB)
            echarts.min.js (762.39KB)
            jquery.js (82.40KB)
            china.js (117.16KB)
            echarts-wordcloud.min.js (124.18KB)
            snow.html (9.10KB)
            index.js (787B)
        report.html (5.19KB)
        single/
            city_company_scale_chart.html (12.03KB)
            requirement_word_cloud.html (32.34KB)
            experience_pie_chart.html (4.57KB)
            character_pie_chart.html (3.75KB)
            company_size_bar_chart.html (4.34KB)
            salary_pie_chart.html (10.47KB)
            address_map_chart.html (4.54KB)
            city_experience_chart.html (10.54KB)
            address_pie_chart.html (4.77KB)
            eduction_pie_chart.html (3.66KB)
            company_name_chart.html (6.05KB)
            company_scale_bar_chart.html (4.30KB)
            city_company_size_chart.html (10.23KB)
            company_field_chart.html (9.37KB)
            city_eduction_chart.html (7.79KB)
            advantage_word_cloud.html (32.51KB)
        font/
            DS-DIGIT.TTF (24.88KB)
        template/
            template.html (2.69KB)
        picture/
            loading.gif (701B)
            jt.png (71.90KB)
            map.png (302.10KB)
            lbx.png (81.26KB)
            weather.png (2.27KB)
        css/
            comon0.css (7.24KB)
    utils/
        mysql_helpers.py (1.16KB)
        base_helpers.py (19.02KB)
        __init__.py (0B)
        mongodb_helpers.py (4.05KB)
    requirements.txt (456B)
    visualize_data.py (28.20KB)
    README.md (4.12KB)
    spiders.py (22.92KB)
    configures/
        cookies_file (5.93KB)
        stop_words.txt (298B)
        configure.yml (1.42KB)
    .gitignore (207B)
