Fengniao Data Collection Crawler Built on Python Scrapy, with IP Proxies (Anti-Scraping Countermeasures), Log Handling, and Full Source Code

Uploader: 27595745 | Uploaded: 2021-07-10 17:02:46 | File size: 14KB | File type: RAR

A data collection crawler for Fengniao built on Python Scrapy, including proxy support, log handling, and the full source code. Excerpt from the spider source (cut off on the page):

    import scrapy
    from fengniao.items import FengniaoItem
    from scrapy.spidermiddlewares.httperror import HttpError
    from twisted.internet.error import TimeoutError, TCPTimedOutError, DNSLookupError, ConnectionRefusedError


    class FengniaoclawerSpider(scrapy.Spider):
        name = 'fengniaoClawer'
        allowed_domains = ['fengniao.com']

        # Per-spider settings; these override the values in settings.py
        custom_settings = {
            'LOG_LEVEL': 'DEBUG',  # log level
            'DOWNLOAD_DELAY': 0,  # download delay
            'COOKIES_ENABLED': False,  # enabled by default
            'DEFAULT_REQUEST_HEADERS': {
                # 'Host': 'www.fengniao.com',
                'Referer': 'https://www.fengniao.com',
            },

            # Item pipelines; items pass through them in ascending order of priority value
            'ITEM_PIPELINES': {
                'fengniao.pipelines.ImagePipeline': 100,
                'fengniao.pipelines.FengniaoPipeline': 300,
            },

            # Image download settings
            'IMAGES_STORE': 'fengniaoPhoto',  # created if it does not exist
            'IMAGES_EXPIRES': 90,  # image expiry in days; images already downloaded within this period are not fetched again
            'IMAGES_MIN_HEIGHT': 100,  # minimum image height; shorter images are not downloaded
            'IMAGES_MIN_WIDTH': 100,  # minimum image width; narrower images are not downloaded

            # Downloader middlewares; requests pass through them in ascending order of priority value
            'DOWNLOADER_MIDDLEWARES': {
                'fengniao.middlewares.ProxiesMiddleware': 400,
                'fengniao.middlewares.HeadersMiddleware': 543,
                'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
            },

            'DEPTH_PRIORITY': 1,  # BFS relative to start_urls; only locally breadth-first, affected by CONCURRENT_REQUESTS
            'SCHEDULER_DISK_QUEUE': 'scrapy.squeues.PickleFifoDiskQueue',
            'SCHEDULER_MEMORY_QUEUE': 'scrapy.squeues.FifoMemoryQueue',
            'REDIRECT_PRIORITY_ADJUST': 2,  # Default: +2
            'RETRY_PRIORITY_ADJUST': -1,  # Default: -1
            'RETRY_TIMES': 8,  # retry count. Default: 2, can also be specified per-request using max_retry_times attribute of Request.meta
            'DOWNLOAD_TIMEOUT': 30,  # This timeout can be set per spider using download_timeout spider attribute and per-request using download_timeout Request.meta key

            # 'DUPEFILTER_CLASS': "scrapy_redis.dupefilter.RFPDupeFilter",
            # 'SCHEDULER': "scrapy_redis.scheduler.Scheduler",
            # 'SCHEDULER_PERSIST': False,  # Don't cleanup red
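The spider settings above register `fengniao.middlewares.ProxiesMiddleware` and `fengniao.middlewares.HeadersMiddleware` as downloader middlewares and disable Scrapy's built-in `UserAgentMiddleware`, but this page does not show their source. The following is only a minimal sketch of what such middlewares commonly look like, not the archive's actual code: the class names are taken from the settings, while `PROXIES`, `USER_AGENTS`, and their contents are hypothetical placeholders. It relies on two Scrapy conventions: a downloader middleware's `process_request(request, spider)` may return `None` to let processing continue, and the proxy for a request is read from `request.meta['proxy']`.

```python
import random

# Hypothetical placeholder values; the real proxy pool and User-Agent
# strings used by the archived project are not shown on this page.
PROXIES = [
    "http://127.0.0.1:8001",
    "http://127.0.0.1:8002",
]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
]


class ProxiesMiddleware:
    """Send each outgoing request through a randomly chosen proxy."""

    def __init__(self, proxies=PROXIES):
        self.proxies = list(proxies)

    def process_request(self, request, spider):
        # Scrapy reads the proxy for a request from request.meta['proxy'].
        request.meta["proxy"] = random.choice(self.proxies)
        return None  # continue with the remaining middlewares


class HeadersMiddleware:
    """Set a randomly chosen User-Agent header on each outgoing request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # continue with the remaining middlewares
```

With the priorities from the settings (400 and 543), the proxy is assigned before the header is set, since `process_request` is called in ascending order of priority value. Picking a fresh proxy and User-Agent per request is one common way to spread traffic; a production version would typically also drop proxies that fail repeatedly.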


Resource details

(20 files, 14KB) Fengniao Data Collection Crawler Built on Python Scrapy, with IP Proxies (Anti-Scraping Countermeasures), Log Handling, and Full Source Code

    fengniao/
        fengniao/
            __init__.pyc         139B
            middlewares.py       4.72KB
            settings.pyc         287B
            spiders/
                __init__.pyc         147B
                __init__.py          161B
                fengniaoClawer.py    6.56KB
            __init__.py          0B
            pipelines.py         3.22KB
            main.py              412B
            dictionary.py        1.91KB
            settings.py          3.29KB
            items.py             488B
        .idea/
            misc.xml             1.44KB
            workspace.xml        15.31KB
            vcs.xml              164B
            dictionaries/
                nesta.xml            86B
            .name                8B
            modules.xml          268B
            fengniao.iml         284B
        scrapy.cfg               259B
