A Python Scrapy crawler for collecting jokes from Budejie (百思不得姐), with full source code

Uploader: 27595745 | Uploaded: 2021-07-10 17:02:47 | File size: 13KB | File type: RAR
Spider source:

import scrapy

from budejie.items import BudejieItem


class BudejieSpider(scrapy.Spider):
    """Spider for the text jokes on budejie.com (百思不得姐)."""

    name = 'budejie'
    start_urls = ['http://www.budejie.com/text/']
    total_page = 50  # stop following pagination after this page

    def parse(self, response):
        # The highlighted pager link (a.z-crt) holds the current page number.
        current_page = int(response.css('a.z-crt::text').extract_first())
        print('current page: {}'.format(current_page))

        # Each <li> under div.j-r-list is one post.
        lies = response.css('div.j-r-list >ul >li')
        for li in lies:
            username = li.css('a.u-user-name::text').extract_first()
            user_url = li.css('div.u-txt a::attr(href)').extract_first()
            # A post body may span several text nodes; join them with newlines.
            content = '\n'.join(li.css('div.j-r-list-c-desc a::text').extract())
            content_url = li.css('div.j-r-list-c-desc a::attr(href)').extract_first()
            yield BudejieItem(
                username=username,
                content=content,
                user_url=user_url,
                content_url=content_url,
            )

        # Request the next listing page; Scrapy routes it back to parse().
        if current_page < self.total_page:
            next_page_url = self.start_urls[0] + '{}'.format(current_page + 1)
            yield scrapy.Request(next_page_url)
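The spider's pagination step appends the next page number to `start_urls[0]` and stops once `total_page` is reached. The following is a minimal sketch of that rule as a standalone function, so it can be checked without running a crawl. The function name `next_page_url` and its signature are assumptions made for illustration; they are not part of the archive.

```python
def next_page_url(start_url, current_page, total_page):
    """Return the URL of the page after current_page, or None on the last page.

    Hypothetical helper: mirrors the spider's rule of appending the page
    number to the start URL.
    """
    if current_page >= total_page:
        return None
    # Normalise the base so there is exactly one slash before the page number.
    return '{}/{}'.format(start_url.rstrip('/'), current_page + 1)


if __name__ == '__main__':
    base = 'http://www.budejie.com/text/'
    print(next_page_url(base, 1, 50))
    print(next_page_url(base, 50, 50))
```

Unlike plain string concatenation, this version also works if the start URL is written without a trailing slash.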


Resource details

(22 files, 13KB) A Python Scrapy crawler for collecting jokes from Budejie (百思不得姐), with full source code

budejie/
    budejie/
        __init__.pyc         137B
        middlewares.py       3.15KB
        settings.pyc         408B
        spiders/
            __init__.pyc         145B
            __init__.py          161B
            budejieSpider.pyc    1.65KB
            budejieSpider.py     1.15KB
        __init__.py          0B
        pipelines.py         2.14KB
        items.pyc            487B
        pipelines.pyc        4.14KB
        main.py              251B
        settings.py          245B
        items.py             311B
    .idea/
        misc.xml             1.44KB
        budejie.iml          284B
        workspace.xml        32.49KB
        vcs.xml              164B
        dictionaries/
            nesta.xml            86B
        .name                7B
        modules.xml          266B
    scrapy.cfg           257B
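The archive lists a `pipelines.py`, but its contents are not reproduced on this page. As an illustration of what an item pipeline for this spider could look like, here is a minimal sketch that writes each scraped item as one JSON line. The class name `JsonLinesPipeline` and the output file name `budejie.jl` are assumptions, and this is not the author's actual pipeline.

```python
import io
import json


class JsonLinesPipeline(object):
    """Hypothetical pipeline: write every item to a JSON Lines file."""

    def __init__(self, path='budejie.jl'):
        # Output path is an assumed default, not taken from the archive.
        self.path = path
        self.file = None

    def open_spider(self, spider):
        # Called by Scrapy once when the spider starts.
        self.file = io.open(self.path, 'w', encoding='utf-8')

    def close_spider(self, spider):
        # Called by Scrapy once when the spider finishes.
        if self.file is not None:
            self.file.close()
            self.file = None

    def process_item(self, item, spider):
        # ensure_ascii=False keeps the Chinese text readable in the file.
        line = json.dumps(dict(item), ensure_ascii=False)
        self.file.write(line + u'\n')
        # Return the item so any later pipelines still receive it.
        return item
```

To try it, the class would need to be registered in `settings.py` through Scrapy's `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {'budejie.pipelines.JsonLinesPipeline': 300}`, assuming it is placed in `budejie/pipelines.py`.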

