NetEase Cloud Music (music163) data crawler built with Python Scrapy, with full source code

Uploader: 27595745 | Uploaded: 2021-07-10 21:02:57 | File size: 20KB | File type: RAR
A NetEase Cloud Music crawler built on the Scrapy framework. The crawl proceeds roughly as follows:

- Use the artist pages as the index and collect all artists;
- From the artist pages, collect all albums;
- From the albums, collect all songs;
- Finally, collect each song's hot comments.

Data is saved to a `MongoDB` database: each song's artist, title, and album, plus each hot comment's author, like count, and the author's avatar URL. The avatar URL is collected so that, if there is interest, the data can later be turned into a web front end.

### Run:

```
$ scrapy crawl music
```

The listing attached to the description defines a spider named `woaidu` for www.woaidu.org rather than for music163. It uses legacy Scrapy APIs (`BaseSpider`, `HtmlXPathSelector`) and is cut off in its last line:

```python
#!/usr/bin/python
#-*-coding:utf-8-*-

import time
from pprint import pprint
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request
from woaidu_crawler.items import WoaiduCrawlerItem
from woaidu_crawler.utils.select_result import list_first_item,strip_null,deduplication,clean_url

class WoaiduSpider(BaseSpider):
    name = "woaidu"
    start_urls = (
        'http://www.woaidu.org/sitemap_1.html',
    )

    def parse(self,response):
        response_selector = HtmlXPathSelector(response)
        # Follow the pagination link; "下一页" is the "next page" link text.
        next_link = list_first_item(response_selector.select(u'//div[@class="k2"]/div/a[text()="下一页"]/@href').extract())
        if next_link:
            next_link = clean_url(response.url,next_link,response.encoding)
            yield Request(url=next_link, callback=self.parse)

        # Queue every book detail page linked from the current listing page.
        for detail_link in response_selector.select(u'//div[contains(@class,"sousuolist")]/a/@href').extract():
            if detail_link:
                detail_link = clean_url(response.url,detail_link,response.encoding)
                yield Request(url=detail_link, callback=self.parse_detail)

    def parse_detail(self, response):
        # Extract the book fields from a detail page.
        woaidu_item = WoaiduCrawlerItem()
        response_selector = HtmlXPathSelector(response)
        woaidu_item['book_name'] = list_first_item(response_selector.select('//div[@class="zizida"][1]/text()').extract())
        woaidu_item['author'] = [list_first_item(response_selector.select('//div[@class="xiaoxiao"][1]/text()').extract())[5:].strip(),]
        woaidu_item['book_description'] = list_first_item(response_selector.select('//div[@class="lili"][1]/text()').extract()).strip()
        # The source listing is truncated partway through the next statement.
        woaidu_item['book_covor_image_url'] = list
```
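The storage step in the description (saving each song's artist, title and album together with a hot comment's author, like count and avatar URL to MongoDB) could be sketched as a Scrapy-style item pipeline. This is a minimal sketch, not the archive's actual `pipelines.py`: the `HotComment` field names, the connection URI, the database and collection names, and the choice to upsert on song plus comment author are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class HotComment:
    """One record per hot comment; field names are illustrative assumptions."""
    artist: str
    song: str
    album: str
    comment_author: str
    likes: int
    avatar_url: str


class MongoPipeline:
    """Scrapy-style item pipeline that upserts each record into MongoDB."""

    def __init__(self, uri="mongodb://localhost:27017", db="music163", coll="comments"):
        # URI, database and collection names are assumptions, not from the archive.
        self.uri, self.db, self.coll = uri, db, coll
        self.client = None
        self.collection = None

    def open_spider(self, spider):
        # Imported lazily so this module still loads where pymongo is not installed.
        import pymongo
        self.client = pymongo.MongoClient(self.uri)
        self.collection = self.client[self.db][self.coll]

    def close_spider(self, spider):
        if self.client is not None:
            self.client.close()

    def process_item(self, item, spider):
        doc = asdict(item) if isinstance(item, HotComment) else dict(item)
        # Upsert on (song, comment_author) so a re-crawl does not duplicate rows.
        key = {"song": doc["song"], "comment_author": doc["comment_author"]}
        self.collection.update_one(key, {"$set": doc}, upsert=True)
        return item
```

In a Scrapy project a class like this would be enabled through the `ITEM_PIPELINES` setting; the upsert keeps repeated runs of `scrapy crawl music` from inserting the same comment twice, at the cost of treating two comments by one author on one song as a single record.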

Resource details

Archive contents (23 files, 20KB):

- music163
  - music163
    - `items.pyc` (534B)
    - `main.py` (308B)
    - `middlewares.py` (1.66KB)
    - `pipelines.pyc` (1.95KB)
    - `pipelines.py` (902B)
    - spiders
      - `spider.py` (5.05KB)
      - `__init__.pyc` (147B)
      - `__init__.py` (161B)
      - `spider.pyc` (5.61KB)
    - `__init__.pyc` (139B)
    - `items.py` (387B)
    - `__init__.py` (0B)
    - `settings.py` (5.55KB)
    - `settings.pyc` (3.08KB)
  - `scrapy.cfg` (260B)
  - .idea
    - `misc.xml` (1.44KB)
    - `vcs.xml` (164B)
    - `.name` (8B)
    - `modules.xml` (268B)
    - dictionaries
      - `nesta.xml` (86B)
    - `workspace.xml` (24.23KB)
    - `music163.iml` (284B)
  - `README.md` (591B)

Comments

  • weixin_44015622 :
    I got this error: attempted relative import with no known parent package
    2021-11-02
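The `attempted relative import with no known parent package` message reported in the comments is what Python 3 raises when a file containing a relative import is run directly as a script. Whether that is what happened with this archive is not known. The sketch below only reproduces the message with a throwaway package (`demo_pkg`, `helper` and `runner` are hypothetical names) and shows that running the same module with `-m` from the parent directory succeeds.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_both_ways():
    """Build a tiny package, then run one module as a script and via `-m`."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("VALUE = 'ok'\n")
        # runner.py uses a relative import, which needs a parent package.
        (pkg / "runner.py").write_text("from .helper import VALUE\nprint(VALUE)\n")

        # 1) Run the file directly: Python has no parent package for it.
        as_script = subprocess.run(
            [sys.executable, str(pkg / "runner.py")],
            capture_output=True, text=True,
        )
        # 2) Run it as a module from the directory that contains the package.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.runner"],
            capture_output=True, text=True, cwd=root,
        )
        return as_script, as_module


if __name__ == "__main__":
    script, module = run_both_ways()
    print(script.returncode, script.stderr.strip().splitlines()[-1])
    print(module.returncode, module.stdout.strip())
```

For a Scrapy project, the launch path given in the README is `scrapy crawl music`; running it from the directory that holds `scrapy.cfg` lets Scrapy import the project as a package, which is presumably a safer starting point than executing an individual file inside it.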
