A Douban Movie Data-Collection Crawler Built with Python Scrapy (Including Database SQL and Full Source Code)

Uploader: 27595745 | Upload time: 2021-07-10 17:02:46 | File size: 14KB | File type: RAR
Core spider, movie/spiders/MovieSpider.py:

    # -*- coding: utf-8 -*-
    """
    @Author  : nesta
    @Email   : 572645517@qq.com
    @Software: PyCharm
    @project : movie
    @File    : MovieSpider.py
    @Time    : 2018/4/26 9:18
    """
    from scrapy.spiders import Spider
    from scrapy.http import Request
    from scrapy.selector import Selector

    from movie.items import MovieItem


    class MovieSpider(Spider):
        name = 'movie'
        start_urls = [u'https://movie.douban.com/top250']

        def parse(self, response):
            selector = Selector(response)
            # Each film on a Top 250 page sits in its own <div class="info">
            movies = selector.xpath('//div[@class="info"]')
            for movie in movies:
                # Create a fresh item for every film rather than reusing one object
                item = MovieItem()
                # The title is split over several <span> elements; join them in order
                title = movie.xpath('div[@class="hd"]/a/span/text()').extract()
                movieInfo = movie.xpath('div[@class="bd"]/p/text()').extract()
                star = movie.xpath(
                    'div[@class="bd"]/div[@class="star"]/span[@class="rating_num"]/text()'
                ).extract_first(default='')
                # The one-line quote is optional, so default to an empty string
                quote = movie.xpath('div[@class="bd"]/p/span/text()').extract_first(default='')

                item['title'] = ''.join(title)
                item['movieInfo'] = ';'.join(movieInfo).replace(' ', '').replace('\n', '')
                # star is already the full rating string (e.g. "9.7"); star[0] would keep only "9"
                item['star'] = star
                item['quote'] = quote
                yield item

            # The "next" link carries a relative query string; resolve it against the current page
            nextPage = selector.xpath('//span[@class="next"]/link/@href').extract_first()
            if nextPage:
                nextUrl = response.urljoin(nextPage)
                self.logger.debug(nextUrl)
                yield Request(nextUrl, callback=self.parse)
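The two non-trivial steps in `parse`, normalising the extracted text and building the next-page URL, can be checked without running Scrapy at all. The following is a minimal stdlib-only sketch. The helper names `clean_movie_fields`, `next_page_url` and `top250_page_urls` are hypothetical and are not part of the original project. The `?start=N&filter=` query format and the 25-films-per-page layout are assumptions based on how the Top 250 list paginates.

```python
from urllib.parse import urljoin

BASE_URL = 'https://movie.douban.com/top250'


def clean_movie_fields(title_parts, info_parts, star, quote_parts):
    """Build item fields the way MovieSpider.parse does (hypothetical helper)."""
    return {
        # Title fragments are concatenated in page order
        'title': ''.join(title_parts),
        # Info lines are joined with ';' and stripped of spaces and newlines
        'movieInfo': ';'.join(info_parts).replace(' ', '').replace('\n', ''),
        # Keep the whole rating string, e.g. "9.7", not just its first character
        'star': star,
        # The quote is optional, so fall back to an empty string
        'quote': quote_parts[0] if quote_parts else '',
    }


def next_page_url(current_url, href):
    """Resolve a relative "next" href against the current page URL (None if absent)."""
    return urljoin(current_url, href) if href else None


def top250_page_urls(base=BASE_URL, per_page=25, total=250):
    """Enumerate every list page up front (assumes 25 films per page)."""
    return [base] + ['%s?start=%d&filter=' % (base, start)
                     for start in range(per_page, total, per_page)]
```

A bare query string such as `?start=25&filter=` keeps the base path `/top250` when joined, which is why the original's string concatenation happened to work on the first page. Scrapy's `response.urljoin` performs the same resolution against the page's URL. Generating all ten page URLs up front (for example as `start_urls`) lets Scrapy fetch the pages concurrently. Following the "next" link, as the original spider does, visits them strictly one after another.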


Resource details

Archive contents (26 files, 14KB):

    movie/
        movie/
            settings.py            3.01KB
            __init__.pyc           133B
            pipelines.py           863B
            middlewares.py         3.51KB
            main.py                246B
            items.pyc              501B
            __init__.py            0B
            Movie.sql              564B
            items.py               378B
            spiders/
                MovieSpider.py     1.43KB
                __init__.pyc       141B
                MovieSpider.pyc    1.88KB
                __init__.py        161B
            settings.pyc           276B
        scrapy.cfg                 253B
        .idea/
            misc.xml               1.44KB
            movie.iml              284B
            dataSources.local.xml  423B
            workspace.xml          26.51KB
            dataSources.xml        853B
            dictionaries/
                nesta.xml          86B
            .name                  5B
            dataSources.ids        660B
            modules.xml            262B
            sqldialects.xml        198B
            vcs.xml                164B
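The archive ships `pipelines.py` and `Movie.sql`, but this page does not show their contents. The following is a hedged sketch of what such a storage pipeline might look like. It writes each item to SQLite through the standard `sqlite3` module, which is swapped in for illustration only. The original project may target a different database: the `.idea/dataSources*` and `sqldialects.xml` files suggest that a database connection was configured in PyCharm, but not which engine. The class name `MoviePipeline`, the table name `movie` and its columns are all hypothetical. Scrapy calls `open_spider`, `process_item` and `close_spider` on any class registered in the `ITEM_PIPELINES` setting, so the class itself needs no Scrapy import.

```python
import sqlite3

# Hypothetical schema; the real Movie.sql is not reproduced on this page
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS movie (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    title     TEXT NOT NULL,
    movieInfo TEXT,
    star      REAL,
    quote     TEXT
)
"""


class MoviePipeline(object):
    """Store each scraped item in SQLite (sketch; not the original pipeline)."""

    def __init__(self, db_path='movie.db'):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: connect and make sure the table exists
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(CREATE_TABLE_SQL)

    def process_item(self, item, spider):
        # Convert the rating text (e.g. "9.7") to a number; store NULL if it is missing
        star = float(item['star']) if item.get('star') else None
        # A parameterised query avoids quoting problems in titles and quotes
        self.conn.execute(
            'INSERT INTO movie (title, movieInfo, star, quote) VALUES (?, ?, ?, ?)',
            (item['title'], item.get('movieInfo', ''), star, item.get('quote', '')),
        )
        # Returning the item passes it on to any later pipeline
        return item

    def close_spider(self, spider):
        # Commit once at the end, then release the connection
        self.conn.commit()
        self.conn.close()
```

If the project placed the class in `movie/pipelines.py`, it would be enabled in `settings.py` with `ITEM_PIPELINES = {'movie.pipelines.MoviePipeline': 300}`. Committing only in `close_spider` keeps the 250 inserts fast, but rows are lost if the crawl crashes. Committing inside `process_item` trades speed for durability.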


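Disclaimer

Resources on 【只为小站】 are shared by users and are provided for study and research only. Please delete them within 24 hours of downloading and do not use them for any other purpose; otherwise you bear the consequences yourself. Because of the nature of the internet, 【只为小站】 cannot substantively review the ownership, legality, compliance, authenticity, scientific accuracy, integrity or validity of works, information or content transmitted by users. Whether or not the operator of 【只为小站】 has reviewed them, users bear sole legal responsibility for any infringement or ownership disputes that the works, information or content they transmit may cause or have caused.
No resource on this site represents the views or position of this site; all are shared by users. In accordance with Article 22 of China's Regulations on the Protection of the Right to Network Dissemination of Information, if a resource infringes rights or raises other issues, please contact customer service at zhiweidada#qq.com (replace # with @). The site will cooperate fully and respond and handle the matter promptly. For more on copyright and disclaimers, see the site's copyright and disclaimer page.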