Word source code (Java) - FBDP_hw5_wordCount: FBDP_hw5_wordCount

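Uploader: 38721811 | Uploaded: 2022-05-23 08:11:13 | File size: 4.48MB | File type: ZIP

Word source code (Java), hw5

1. Assignment requirements

Load the Shakespeare corpus data file (Shakespeare.txt) on HDFS and write a MapReduce program that counts word frequencies, sorts the words by number of occurrences in descending order, and outputs the 100 most frequent words. The program must ignore case, ignore punctuation (punctuation.txt), ignore stop words (stop-word-list), ignore digits, and keep only words of length >= 3. The output format is "<rank>:<word>,<count>", for example:

1: word1,count
2: word2,count
...
100: word100,count

[Note] Submission: the git repository URL, or a zip archive of the relevant files.

Suggested layout for the git repository:

project name (e.g. wordcount)
|
+-- src
|
+-- target
|
+-- output
|   |
|   +-- result (the output result file)
|
+-- pom.xml
|
+-- .gitignore (keep only the jar file under target and ignore other unrelated files)
|
+-- readme.md (explain the design approach, the experimental results, and so on, and include a screenshot of the WEB page showing that the submitted job ran successfully. It may go further into shortcomings in performance, scalability, and other aspects, and …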
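The filtering and ranking rules above can be sketched in plain Java, without Hadoop, to make the expected behavior concrete. This is not the repository's WordCount.java: the class name `TopWordsSketch`, the method `topWords`, and the sample text are hypothetical, and the sketch assumes that splitting on every non-letter character is an acceptable way to drop punctuation and digits (the assignment itself reads the punctuation from punctuation.txt). Ties are broken alphabetically, which the assignment does not specify. The line format follows the assignment's example ("1: word1,count").

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class TopWordsSketch {

    /** Returns lines of the form "rank: word,count" for the most frequent words. */
    static List<String> topWords(String text, Set<String> stopWords, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Lower-case first (ignore case), then split on anything that is not a-z,
        // which discards punctuation and digits in one step (an assumption of this sketch).
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            // Keep only words of length >= 3 that are not stop words.
            if (token.length() >= 3 && !stopWords.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }

        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Sort by count, descending; break ties alphabetically so the output is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        List<String> lines = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            Map.Entry<String, Integer> e = entries.get(i);
            lines.add((i + 1) + ": " + e.getKey() + "," + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample input; the assignment uses Shakespeare.txt and a limit of 100.
        String sample = "The cat and the Cat saw 42 cats; a cat sat. Cats saw dogs!";
        for (String line : topWords(sample, Set.of("the", "and"), 3)) {
            System.out.println(line);
        }
    }
}
```

In a MapReduce setting the same rules would typically be split across stages (tokenizing and filtering in a mapper, summing in a reducer, then a separate sort by count), but how the repository divides that work is not stated in this description.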


Resource details

(30 files, 4.48MB) word源码java-FBDP_hw5_wordCount:FBDP_hw5_wordCount

FBDP_hw5_wordCount-master
+-- output
|   +-- .part-r-00000.crc (20B)
|   +-- _SUCCESS (0B)
|   +-- ._SUCCESS.crc (8B)
|   +-- part-r-00000 (1.32KB)
+-- src
|   +-- main
|       +-- java
|           +-- WordCount.java (10.81KB)
+-- pom.xml (3.74KB)
+-- README.md (5.04KB)
+-- target
|   +-- WordCount-2.1.jar (12.14KB)
|   +-- WordCount-2.7.4.jar (12.14KB)
|   +-- WordCount-2.0.jar (12.14KB)
+-- stop-word-list.txt (1.87KB)
+-- input
|   +-- Shakespeare.txt (9.32MB)
+-- punctuation.txt (105B)
+-- img
    +-- mapper使用InverseMapper.jpg (36.68KB)
    +-- 忽略停词.jpg (43.45KB)
    +-- 自定义降序比较类.jpg (42.48KB)
    +-- 按指定格式输出.jpg (81.30KB)
    +-- 伪分布式输出结束后的web页面.jpg (120.23KB)
    +-- bdkit的hdfs_web界面.jpg (189.91KB)
    +-- job1的setClass们.jpg (57.23KB)
    +-- 命令行参数-skip.jpg (64.54KB)
    +-- 取排名前一百的单词.jpg (44.88KB)
    +-- 忽略大小写.jpg (24.68KB)
    +-- job2的setClass们.jpg (91.95KB)
    +-- 中转文件夹.jpg (49.01KB)
    +-- 伪分布式输出结果.jpg (50.46KB)
    +-- 忽略标点和数字.jpg (13.00KB)
    +-- 单词长度大于等于3.jpg (18.32KB)
    +-- bdkit的RM_web界面.jpg (237.54KB)
    +-- 启动伪分布式成功的web界面.jpg (135.04KB)

