SinaWeiboCrawler: a focused crawler for Sina Weibo

Uploader: 42174176 | Upload time: 2024-04-22 22:49:14 | File size: 185KB | File type: ZIP
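Project description
Crawls Sina Weibo user data to provide structured data for user profiling, sentiment analysis, relationship modeling, and similar tasks.

Third-party libraries the project depends on
  HTTPClient
  Jsoup: HTML parsing
  fastjson

Core program logic
In useVersion2014/WeiboCrawler3.main(), the WeiboCrawler3 instance crawler calls crawl() to fetch the raw data and store it in files; the remaining code then parses those files on disk, extracting and transforming their contents into the final data.

crawl() is the function that performs the actual crawling:
  String html = crawler.getHTML(url)   // fetch the page HTML for the given url
  crawler.isVerification(html)         // check whether a verification code must be entered
  If the connection times out, reconnect.

Sina Weibo simulated login logic
  Sina.main()
  Sina.login(username, password)
  preLogin(encodeAcco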
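
The crawl() steps described above (fetch the page, check for a verification prompt, reconnect on timeout) can be sketched roughly as follows. This is a minimal illustration, not the project's code: the PageFetcher interface, the CAPTCHA_MARKER string, the maxAttempts parameter, and the choice to throw when a verification page is detected are all assumptions introduced here, and the fetcher is simulated so the example runs without network access.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

public class CrawlSketch {

    /** Hypothetical stand-in for the project's HTTP layer (its getHTML(url) call). */
    public interface PageFetcher {
        String getHTML(String url) throws IOException;
    }

    /** Assumed marker text; the description does not say how a verification page is detected. */
    static final String CAPTCHA_MARKER = "captcha";

    /** Returns true if the page appears to ask for a verification code. */
    public static boolean isVerification(String html) {
        return html != null && html.contains(CAPTCHA_MARKER);
    }

    /**
     * Fetches a page, retrying when the connection times out.
     * Gives up after maxAttempts timeouts and rethrows the last one.
     */
    public static String crawl(PageFetcher fetcher, String url, int maxAttempts)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketTimeoutException lastTimeout = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                String html = fetcher.getHTML(url);
                if (isVerification(html)) {
                    // Assumption: surface the problem to the caller instead of retrying.
                    throw new IllegalStateException("verification code required for " + url);
                }
                return html;
            } catch (SocketTimeoutException e) {
                lastTimeout = e; // connection timed out: loop around and reconnect
            }
        }
        throw lastTimeout;
    }

    public static void main(String[] args) throws IOException {
        int[] calls = {0};
        // Simulated fetcher: times out twice, then succeeds.
        PageFetcher flaky = url -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "<html>ok</html>";
        };
        System.out.println(crawl(flaky, "http://example.invalid/", 5));
        System.out.println("fetch calls: " + calls[0]);
    }
}
```

Running main prints the page body followed by "fetch calls: 3", since the simulated fetcher fails twice before succeeding. In a real crawler the retry loop would probably also pause between attempts; that is left out here to keep the sketch short.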

Resource details

(51 files, 185KB) SinaWeiboCrawler: a focused crawler for Sina Weibo

SinaWeiboCrawler-master
  .project  374B
  src
    cn
      edu
        xmu
          cld
            WeiboCrawler2
              useVersion2014
                Sina.java  20.64KB
                ReParser.java  3.25KB
                WeiboCrawler3.java  23.19KB
                WeiBoUser.java  799B
                SinaSSOEncoder.java  16.90KB
              ProxyIP.java  13.72KB
              Crawler.java  7.66KB
              JWindowsFrame.java  18.36KB
              FileOperation.java  2.29KB
              HTML.java  5.39KB
              HTMLParser.java  9.71KB
  git_push_all_projects_in_pwd_LOG.txt  10.12KB
  [重要][模板]爬虫项目.vsdx  68.82KB
  .idea
    misc.xml  258B
    workspace.xml  23.30KB
    modules.xml  270B
    vcs.xml  167B
  .settings
    org.eclipse.jdt.core.prefs  587B
    org.eclipse.core.resources.prefs  55B
  README.md  2.89KB
  WeiboCrawler2.2.iml  1.39KB
  .classpath  883B
  bin
    cn
      edu
        xmu
          cld
            WeiboCrawler2
              useVersion2014
                WeiBoUser.class  1.19KB
                WeiboCrawler3$1.class  1.12KB
                ReParser.class  2.52KB
                SinaSSOEncoder.class  17.88KB
                WeiboCrawler3.class  17.05KB
                Sina.class  18.24KB
                WeiboCrawler3$1$1.class  1.11KB
              HTMLParser.class  9.03KB
              JWindowsFrame$1.class  1.87KB
              JWindowsFrame$10.class  1.32KB
              HTML$1.class  1011B
              JWindowsFrame$9$1.class  1.37KB
              JWindowsFrame$3.class  1.23KB
              JWindowsFrame$4.class  1.23KB
              HTML$1$1.class  1006B
              JWindowsFrame$2.class  2.03KB
              JWindowsFrame.class  8.65KB
              JWindowsFrame$6.class  1.23KB
              Crawler.class  7.81KB
              JWindowsFrame$9.class  3.09KB
              JWindowsFrame$5.class  1.23KB
              JWindowsFrame$7.class  1.23KB
              FileOperation.class  2.99KB
              JWindowsFrame$8$1.class  1.36KB
              HTML.class  5.65KB
              ProxyIP.class  10.86KB
              JWindowsFrame$8.class  2.07KB
              OpenFile.class  1.29KB
