Zibo hotel data analysis and visualization platform built with Python + Flask + PyEcharts + Bootstrap. The project, titled "Zibo Hotel Statistics and Analysis", uses a web crawler to collect hotel data from the Meituan website, runs statistical analysis with ECharts, and filters out the hotels that match the user's expectations to support the hotel-selection decision. It is not limited to Zibo: by changing only the crawler code it can be adapted to any city, or to any industry in any city. The project has a clear front-end/back-end split and covers data crawling, data cleaning and loading into the database, page rendering with Flask, pagination, and more. It was adapted from a template (plus some other resources) and is practical and feature-complete.
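The platform's source is not included in this listing, but the pipeline it describes (MySQL-backed hotel table, Flask routes, ECharts rendering, pagination) could be sketched roughly as below. The table and column names (`hotel`, `name`, `price`, `score`), the connection settings, and the page size of 10 are assumptions, not taken from the project.

```python
# Minimal sketch of the described Flask + PyEcharts stack; schema and settings are assumed.
import pymysql
from flask import Flask, request
from pyecharts import options as opts
from pyecharts.charts import Bar

app = Flask(__name__)
PAGE_SIZE = 10  # assumed page size

def query(sql, args=()):
    conn = pymysql.connect(host="localhost", user="root", password="root",
                           database="hotels", charset="utf8mb4")
    try:
        with conn.cursor() as cur:
            cur.execute(sql, args)
            return cur.fetchall()
    finally:
        conn.close()

@app.route("/hotels")
def hotel_list():
    # Simple LIMIT/OFFSET pagination, as described in the listing.
    page = request.args.get("page", 1, type=int)
    rows = query("SELECT name, price, score FROM hotel LIMIT %s OFFSET %s",
                 (PAGE_SIZE, (page - 1) * PAGE_SIZE))
    return {"page": page,
            "hotels": [dict(zip(("name", "price", "score"), r)) for r in rows]}

@app.route("/chart")
def price_chart():
    # Render an ECharts bar chart of average price per rating bucket.
    rows = query("SELECT ROUND(score), AVG(price) FROM hotel GROUP BY ROUND(score)")
    bar = (Bar()
           .add_xaxis([str(r[0]) for r in rows])
           .add_yaxis("平均价格", [float(r[1]) for r in rows])
           .set_global_opts(title_opts=opts.TitleOpts(title="淄博酒店价格分布")))
    return bar.render_embed()

if __name__ == "__main__":
    app.run(debug=True)
```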
2021-07-13 12:18:02 20.05MB big data, visualization analysis, flask, mysql
An original Bluetooth serial remote-control server built on the pyserial and pygame libraries. A game controller is required (this can be changed if needed).
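The archive's code is not shown in the listing; a minimal sketch of the described setup, reading a joystick with pygame and forwarding its axes over a Bluetooth serial port with pyserial, might look like the following. The port name, baud rate, update rate, and frame format are assumptions.

```python
# Minimal sketch: read joystick axes with pygame, send them over a Bluetooth serial
# port with pyserial. Port name, baud rate and frame format are assumed.
import time
import pygame
import serial

PORT = "/dev/rfcomm0"   # assumed Bluetooth serial device; e.g. "COM5" on Windows
BAUD = 9600

def main():
    pygame.init()
    pygame.joystick.init()
    stick = pygame.joystick.Joystick(0)   # first connected game controller
    stick.init()
    with serial.Serial(PORT, BAUD, timeout=1) as link:
        while True:
            pygame.event.pump()           # refresh joystick state
            x = stick.get_axis(0)         # left stick, horizontal
            y = stick.get_axis(1)         # left stick, vertical
            # Send a simple text frame, e.g. "0.12,-0.87\n"
            link.write(f"{x:.2f},{y:.2f}\n".encode("ascii"))
            time.sleep(0.05)              # ~20 Hz update rate

if __name__ == "__main__":
    main()
```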
2021-07-13 11:45:20 2KB remote control, serial port, Bluetooth, pyserial
A simple program meant to be opened in Jupyter Notebook: enter each joint's rotation-axis coordinates, its rotation angle, and the initial end-effector pose to obtain the resulting end-effector pose. Examples are included.
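The notebook itself is not shown in the listing; the product-of-exponentials (screw theory) computation it describes can be sketched with NumPy roughly as follows, assuming purely revolute joints given by a unit axis direction and a point the axis passes through, and a 4×4 homogeneous matrix for the initial end-effector pose.

```python
# Sketch of product-of-exponentials (POE) forward kinematics with NumPy.
# Joint axes are a unit direction w and a point q on the axis; M is the assumed
# 4x4 initial end-effector pose.
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix [w] such that skew(w) @ v == np.cross(w, v)."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]], dtype=float)

def exp_twist(w, q, theta):
    """Matrix exponential of a revolute twist (unit axis w through point q)."""
    w, q = np.asarray(w, float), np.asarray(q, float)
    W = skew(w)
    # Rodrigues' formula for the rotation part.
    R = np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * W @ W
    v = -np.cross(w, q)                      # linear part of the twist
    p = (np.eye(3) * theta + (1 - np.cos(theta)) * W
         + (theta - np.sin(theta)) * W @ W) @ v
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = p
    return T

def poe_forward(axes, points, thetas, M):
    """End-effector pose: exp([xi_1]t1) ... exp([xi_n]tn) @ M."""
    T = np.eye(4)
    for w, q, th in zip(axes, points, thetas):
        T = T @ exp_twist(w, q, th)
    return T @ M

# Example: one joint rotating 90 degrees about the z-axis through the origin.
M = np.eye(4); M[:3, 3] = [1.0, 0.0, 0.0]    # assumed initial end-effector pose
print(poe_forward([[0, 0, 1]], [[0, 0, 0]], [np.pi / 2], M))
```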
2021-07-11 20:03:49 105KB robot, python, screw theory, product of exponentials
NetEase Cloud Music (music163) data crawler built with Python Scrapy, full source code included. The crawl roughly proceeds as follows:
- use the artist pages as index pages and collect all artists;
- from every artist page collect all of their albums;
- from all albums collect all songs;
- finally collect each song's hot comments.

The data is saved to a `MongoDB` database: each song's artist, title and album, plus each hot comment's author, like count and author avatar URL. The commenters' avatar URLs are collected so that, if people like the project, it can be turned into a web front end.

### Run:
```
$ scrapy crawl music
```

```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
import time
from pprint import pprint

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request
from woaidu_crawler.items import WoaiduCrawlerItem
from woaidu_crawler.utils.select_result import list_first_item, strip_null, deduplication, clean_url


class WoaiduSpider(BaseSpider):
    name = "woaidu"
    start_urls = (
        'http://www.woaidu.org/sitemap_1.html',
    )

    def parse(self, response):
        response_selector = HtmlXPathSelector(response)
        # Follow the "下一页" (next page) link of the listing.
        next_link = list_first_item(response_selector.select(
            u'//div[@class="k2"]/div/a[text()="下一页"]/@href').extract())
        if next_link:
            next_link = clean_url(response.url, next_link, response.encoding)
            yield Request(url=next_link, callback=self.parse)

        # Queue every detail page found on the listing page.
        for detail_link in response_selector.select(
                u'//div[contains(@class,"sousuolist")]/a/@href').extract():
            if detail_link:
                detail_link = clean_url(response.url, detail_link, response.encoding)
                yield Request(url=detail_link, callback=self.parse_detail)

    def parse_detail(self, response):
        woaidu_item = WoaiduCrawlerItem()
        response_selector = HtmlXPathSelector(response)
        woaidu_item['book_name'] = list_first_item(
            response_selector.select('//div[@class="zizida"][1]/text()').extract())
        woaidu_item['author'] = [list_first_item(
            response_selector.select('//div[@class="xiaoxiao"][1]/text()').extract())[5:].strip(), ]
        woaidu_item['book_description'] = list_first_item(
            response_selector.select('//div[@class="lili"][1]/text()').extract()).strip()
        woaidu_item['book_covor_image_url'] = list
```
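The listing does not show the MongoDB persistence layer it mentions; a typical Scrapy item pipeline for that purpose might look like the sketch below. The database and collection names and the item field names (`song_name`, `artist`) are assumptions, not taken from the archive.

```python
# Hypothetical MongoDB pipeline for the described music163 crawler; database name,
# collection name and item fields are assumptions.
import pymongo


class MongoPipeline(object):
    def open_spider(self, spider):
        self.client = pymongo.MongoClient("mongodb://localhost:27017")
        self.collection = self.client["music163"]["songs"]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        # Upsert on song name + artist so re-crawls do not duplicate documents.
        doc = dict(item)
        self.collection.update_one(
            {"song_name": doc.get("song_name"), "artist": doc.get("artist")},
            {"$set": doc},
            upsert=True,
        )
        return item
```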
2021-07-10 21:02:57 20KB python, scrapy, data crawler, NetEase Cloud Music
Lab manual for *Web Data Mining with Python* (《基于Python语言的网络数据挖掘》实验指导书). Contents:
- I. Teaching objectives and requirements
- II. Overview of the Python development environment
- III. Lab projects, names and objectives
  - Lab 1: Basic Python syntax and simple applications
  - Lab 2: Reading and writing Excel data with Python
  - Lab 3: Implementing a web crawler algorithm in Python
  - Lab 4: Collecting social network data with Python
  - Lab 5: Statistical analysis of social network data with Python
  - Lab 6: Bulk downloading of web image data with Python
  - Lab 7: Adjusting image size and rotation with Python
  - Lab 8: Adjusting image brightness, contrast and saturation with Python
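As an illustration of the kind of code the later labs cover (not taken from the manual), the brightness, contrast and saturation adjustments of Lab 8 can be done with Pillow's `ImageEnhance` module; the file names below are placeholders.

```python
# Illustrative only (not from the manual): Lab 8-style brightness/contrast/saturation
# adjustments using Pillow. Input and output file names are placeholders.
from PIL import Image, ImageEnhance

img = Image.open("photo.jpg")                       # placeholder file name
img = ImageEnhance.Brightness(img).enhance(1.2)     # +20% brightness
img = ImageEnhance.Contrast(img).enhance(0.9)       # -10% contrast
img = ImageEnhance.Color(img).enhance(1.5)          # +50% saturation
img.save("photo_adjusted.jpg")
```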
2021-07-10 21:02:49 695KB Python, data mining, lab, tutorial
Site-wide job-posting data collection crawler for Lagou.com (拉勾网), built with Python Scrapy, including database handling.

```python
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import re

import scrapy
from scrapy.loader import ItemLoader
from scrapy.loader.processors import MapCompose, TakeFirst
from w3lib.html import remove_tags


def extract_num(text):
    """Extract the first number from a string."""
    match_re = re.match(r".*?(\d+).*", text)
    if match_re:
        nums = int(match_re.group(1))
    else:
        nums = 0
    return nums


def replace_splash(value):
    """Strip '/' characters."""
    return value.replace("/", "")


def handle_strip(value):
    """Strip surrounding whitespace."""
    return value.strip()


def handle_jobaddr(value):
    """Drop the '查看地图' (view map) link text from the address."""
    addr_list = value.split("\n")
    addr_list = [item.strip() for item in addr_list if item.strip() != "查看地图"]
    return "".join(addr_list)


class LagouJobItemLoader(ItemLoader):
    """Custom ItemLoader that keeps only the first extracted value by default."""
    default_output_processor = TakeFirst()


class LagouJobItem(scrapy.Item):
    """A job posting on Lagou.com."""
    title = scrapy.Field()
    url = scrapy.Field()
    salary = scrapy.Field()
    job_city = scrapy.Field(
        input_processor=MapCompose(replace_splash),
    )
    work_years = scrapy.Field(
        input_processor=MapCompose(replace_splash),
    )
    degree_need = scrapy.Field(
        input_processor=MapCompose(replace_splash),
    )
    job_type = scrapy.Field()
    publish_time = scrapy.Field()
    job_advantage = scrapy.Field()
    job_desc = scrapy.Field(
        input_processor=MapCompose(handle_strip),
    )
    job_addr = scrapy.Field(
        input_processor=MapCompose(remove_tags, handle_jobaddr),
    )
    company_name = scrapy.Field(
        input_processor=MapCompose(handle_strip),
    )
    company_url = scrapy.Field()
    crawl_time = scrapy.Field()
    crawl_update_time = scrapy.Field()

    def get_insert_sql(self):
        insert_sql = """
            insert into lagou_job(title, url, salary, job_city, work_years, degree_need, job_type, publish_time
```
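The item defines a `get_insert_sql()` hook (cut off above); in projects of this shape it is typically consumed by an asynchronous MySQL pipeline built on Twisted's `adbapi`. The sketch below is a hedged reconstruction: it assumes `get_insert_sql()` returns an `(sql, params)` tuple, and the connection settings are placeholders.

```python
# Hedged sketch of how get_insert_sql() is usually consumed: an asynchronous MySQL
# pipeline using Twisted adbapi. Connection settings are placeholders.
import pymysql
from twisted.enterprise import adbapi


class MysqlTwistedPipeline(object):
    def __init__(self, dbpool):
        self.dbpool = dbpool

    @classmethod
    def from_settings(cls, settings):
        dbpool = adbapi.ConnectionPool(
            "pymysql",
            host=settings.get("MYSQL_HOST", "localhost"),
            db=settings.get("MYSQL_DBNAME", "lagou"),
            user=settings.get("MYSQL_USER", "root"),
            passwd=settings.get("MYSQL_PASSWORD", ""),
            charset="utf8mb4",
            cursorclass=pymysql.cursors.DictCursor,
        )
        return cls(dbpool)

    def process_item(self, item, spider):
        # Run the insert on a thread from the connection pool.
        query = self.dbpool.runInteraction(self.do_insert, item)
        query.addErrback(lambda failure: spider.logger.error(failure))
        return item

    def do_insert(self, cursor, item):
        insert_sql, params = item.get_insert_sql()   # assumed return shape
        cursor.execute(insert_sql, params)
```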
2021-07-10 17:02:48 7KB python, scrapy, crawler, Lagou
Site-wide job-posting data collection crawler for Lagou.com (拉勾网), built with Python Scrapy, including database handling and full source code.

```python
# -*- coding: utf-8 -*-
import scrapy

from meiju100.items import Meiju100Item


class MeijuSpider(scrapy.Spider):
    name = "meiju"
    allowed_domains = ["meijutt.com"]
    start_urls = ['http://www.meijutt.com/new100.html']

    def parse(self, response):
        items = []
        subSelector = response.xpath('//ul[@class="top-list fn-clear"]/li')
        for sub in subSelector:
            item = Meiju100Item()
            item['storyName'] = sub.xpath('./h5/a/text()').extract()
            item['storyState'] = sub.xpath('./span[1]/font/text()').extract()
            if not item['storyState']:
                item['storyState'] = sub.xpath('./span[1]/text()').extract()
            item['tvStation'] = sub.xpath('./span[2]/text()').extract()
            if not item['tvStation']:
                item['tvStation'] = [u'未知']
            item['updateTime'] = sub.xpath('./div[2]/text()').extract()
            if not item['updateTime']:
                item['updateTime'] = sub.xpath('./div[2]/font/text()').extract()
            items.append(item)
        return items
```
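The spider above references `Meiju100Item`; judging from the fields it populates, the accompanying items.py presumably looks roughly like this reconstruction (the archive's actual file may differ).

```python
# Reconstructed sketch of items.py based on the fields the spider populates;
# not taken from the archive itself.
import scrapy


class Meiju100Item(scrapy.Item):
    storyName = scrapy.Field()    # series title
    storyState = scrapy.Field()   # airing status / latest episode
    tvStation = scrapy.Field()    # broadcasting network
    updateTime = scrapy.Field()   # last update time
```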
2021-07-10 17:02:48 14KB Python, scrapy, crawler, data collection
Data collection crawler for jokes from Budejie (百思不得姐), built with Python Scrapy, full source code included.

```python
import scrapy

from budejie.items import BudejieItem


class BudejieSpider(scrapy.Spider):
    """Spider for Budejie (百思不得姐) text jokes."""
    name = 'budejie'
    start_urls = ['http://www.budejie.com/text/']
    total_page = 50

    def parse(self, response):
        current_page = int(response.css(u'a.z-crt::text').extract_first())
        print(u'current page: {}'.format(current_page))
        lies = response.css(u'div.j-r-list >ul >li')
        for li in lies:
            username = li.css(u'a.u-user-name::text').extract_first()
            user_url = li.css(u'div.u-txt a::attr(href)').extract_first()
            content = u'\n'.join(li.css(u'div.j-r-list-c-desc a::text').extract())
            content_url = li.css(u'div.j-r-list-c-desc a::attr(href)').extract_first()
            yield BudejieItem(
                username=username,
                content=content,
                user_url=user_url,
                content_url=content_url,
            )
        if current_page < self.total_page:
            next_page_url = self.start_urls[0] + '{}'.format(current_page + 1)
            yield scrapy.Request(next_page_url)
```
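No persistence code is shown for this spider in the listing; one simple way to store the collected jokes would be a pipeline like the following sketch, which appends each item as one JSON line. The output file name is a placeholder and the pipeline is not part of the listed code.

```python
# Hypothetical persistence for the collected jokes (not part of the listed code):
# a minimal pipeline that appends each item as one JSON line.
import json


class JsonLinesPipeline(object):
    def open_spider(self, spider):
        self.fh = open('budejie.jl', 'a', encoding='utf-8')   # placeholder path

    def close_spider(self, spider):
        self.fh.close()

    def process_item(self, item, spider):
        self.fh.write(json.dumps(dict(item), ensure_ascii=False) + '\n')
        return item
```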
2021-07-10 17:02:47 13KB python, scrapy, data collection, jokes
Crawler for the books on Douban Reading's 9-point list (豆瓣读书9分榜单), built with Python Scrapy; includes the dataset and full source code.

```python
# -*- coding: utf-8 -*-
import re

import scrapy

from doubanbook.items import DoubanbookItem


class DbbookSpider(scrapy.Spider):
    name = "dbbook"
    # allowed_domains = ["https://www.douban.com/doulist/1264675/"]
    start_urls = (
        'https://www.douban.com/doulist/1264675/',
    )
    URL = 'https://www.douban.com/doulist/1264675/?start=PAGE&sort=seq&sub_type='

    def parse(self, response):
        # print response.body
        item = DoubanbookItem()
        selector = scrapy.Selector(response)
        books = selector.xpath('//div[@class="bd doulist-subject"]')
        for each in books:
            title = each.xpath('div[@class="title"]/a/text()').extract()[0]
            rate = each.xpath('div[@class="rating"]/span[@class="rating_nums"]/text()').extract()[0]
            author = re.search('(.*?)
```
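The parse method is cut off in the listing, but the `URL` template with its `PAGE` placeholder suggests the spider pages through the list by substituting an item offset. A hedged illustration of that substitution is shown below; the 25-books-per-page step and the 250-item limit are assumptions, not taken from the truncated source.

```python
# Hedged illustration of how the PAGE placeholder in URL is typically filled in.
# The per-page step (25) and the upper bound (250) are assumptions.
URL = 'https://www.douban.com/doulist/1264675/?start=PAGE&sort=seq&sub_type='

page_urls = [URL.replace('PAGE', str(offset)) for offset in range(0, 250, 25)]
for u in page_urls[:3]:
    print(u)   # e.g. ...?start=0..., ...?start=25..., ...?start=50...
```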
2021-07-10 17:02:47 19KB python, scrapy, crawler, data collection
Data collection crawler for Fengniao (fengniao.com), built with Python Scrapy; includes proxy handling, logging, and full source code.

```python
import scrapy

from fengniao.items import FengniaoItem
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import TimeoutError, TCPTimedOutError, DNSLookupError, ConnectionRefusedError


class FengniaoclawerSpider(scrapy.Spider):
    name = 'fengniaoClawer'
    allowed_domains = ['fengniao.com']

    # Spider-specific settings; these override settings.py.
    custom_settings = {
        'LOG_LEVEL': 'DEBUG',                 # log level
        'DOWNLOAD_DELAY': 0,                  # download delay
        'COOKIES_ENABLED': False,             # enabled by default
        'DEFAULT_REQUEST_HEADERS': {
            # 'Host': 'www.fengniao.com',
            'Referer': 'https://www.fengniao.com',
        },

        # Item pipelines; items pass through in ascending priority order.
        'ITEM_PIPELINES': {
            'fengniao.pipelines.ImagePipeline': 100,
            'fengniao.pipelines.FengniaoPipeline': 300,
        },

        # Image download settings.
        'IMAGES_STORE': 'fengniaoPhoto',      # created if it does not exist
        'IMAGES_EXPIRES': 90,                 # images already on disk within this period are not re-downloaded
        'IMAGES_MIN_HEIGHT': 100,             # minimum image height; smaller images are skipped
        'IMAGES_MIN_WIDTH': 100,              # minimum image width; narrower images are skipped

        # Downloader middlewares; requests pass through in ascending priority order.
        'DOWNLOADER_MIDDLEWARES': {
            'fengniao.middlewares.ProxiesMiddleware': 400,
            'fengniao.middlewares.HeadersMiddleware': 543,
            'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
        },

        'DEPTH_PRIORITY': 1,                  # breadth-first ordering, local BFS from start_urls, affected by CONCURRENT_REQUESTS
        'SCHEDULER_DISK_QUEUE': 'scrapy.squeues.PickleFifoDiskQueue',
        'SCHEDULER_MEMORY_QUEUE': 'scrapy.squeues.FifoMemoryQueue',

        'REDIRECT_PRIORITY_ADJUST': 2,        # default: +2
        'RETRY_PRIORITY_ADJUST': -1,          # default: -1
        'RETRY_TIMES': 8,                     # retry count; default 2, can also be set per request via max_retry_times in Request.meta
        'DOWNLOAD_TIMEOUT': 30,               # can also be set per spider (download_timeout attribute) or per request (download_timeout in Request.meta)

        # 'DUPEFILTER_CLASS': "scrapy_redis.dupefilter.RFPDupeFilter",
        # 'SCHEDULER': "scrapy_redis.scheduler.Scheduler",
        # 'SCHEDULER_PERSIST': False,         # Don't cleanup red
```
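The settings above reference `fengniao.middlewares.ProxiesMiddleware`, whose source is not shown in the listing; a proxy downloader middleware of that kind is usually little more than the sketch below. The proxy addresses are placeholders, and the real project may pull them from a pool service instead.

```python
# Hedged sketch of a proxy downloader middleware like the referenced
# fengniao.middlewares.ProxiesMiddleware; proxy addresses are placeholders.
import random


class ProxiesMiddleware(object):
    # In a real project these would come from a proxy pool or a database.
    PROXIES = [
        'http://127.0.0.1:8888',
        'http://127.0.0.1:8889',
    ]

    def process_request(self, request, spider):
        # Attach a randomly chosen proxy to every outgoing request.
        request.meta['proxy'] = random.choice(self.PROXIES)
```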
2021-07-10 17:02:46 14KB python, scrapy, crawler, data collection