spark-2.4.6-bin-hadoop2.6.tgz. If you cannot download it from the official site, you can get it here; downloads from CSDN are reliable.
2022-03-14 22:07:13 220.43MB spark
1
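Hadoop MapReduce. MapReduce is a programming framework for distributed computation and the core framework for developing "Hadoop-based data analysis applications". Its core function is to combine the business logic code written by the user with the framework's built-in default components into a complete distributed program that runs concurrently on a Hadoop cluster. Why MapReduce? 1. A single machine cannot process massive data sets because of hardware resource limits. 2. Extending a single-machine program to run distributed across a cluster greatly increases its complexity and development difficulty. 3. With the MapReduce framework, developers can focus most of their effort on business logic and leave the complexity of distributed computing to the framework. MapReduce programming conventions: 1. A user program has three parts: Mapper, Reducer, and Driver (the client that submits and runs the MR program). 2. The Mapper's input is in the form of key-value (KV) pairs (the KV types can be customized). 3. The Mapper's output is in the form of KV pairs (K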
2022-03-10 13:56:25 2.1MB Python
1
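The Mapper / Reducer / Driver split described in the MapReduce entry above can be illustrated with a small word count. This is only a sketch in plain Python that simulates the model in a single process; it does not use Hadoop, and the names `mapper`, `reducer`, and `driver` are hypothetical, chosen here for illustration. The in-memory grouping step stands in for the shuffle that the framework would perform on a real cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map step: take one KV pair (line offset, line text), emit (word, 1) pairs."""
    for word in line.split():
        yield word.lower(), 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: take one key with all of its values, emit (word, total)."""
    return word, sum(counts)


def driver(lines: List[str]) -> Dict[str, int]:
    """Driver: feeds input to the mapper, groups by key, then calls the reducer.

    On a real cluster the framework does the grouping (shuffle);
    here it is simulated with an in-memory dictionary.
    """
    grouped: Dict[str, List[int]] = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in mapper(offset, line):
            grouped[key].append(value)
    # Process keys in sorted order, as a reducer would receive them.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(driver(["hadoop spark hadoop", "spark yarn"]))
```

Only `mapper` and `reducer` hold business logic here; everything in `driver` is the kind of plumbing the entry says the framework takes over.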
This second edition covers Hadoop 2, which at the time of writing is the current production-ready version of Hadoop. The first edition of the book covered Hadoop 0.22 (Hadoop 1 wasn’t yet out), and Hadoop 2 has turned the world upside-down and opened up the Hadoop platform to processing paradigms beyond MapReduce. YARN, the new scheduler and application manager in Hadoop 2, is complex and new to the community, which prompted me to dedicate a new chapter 2 to covering YARN basics and to discussing how MapReduce now functions as a YARN application. Parquet has also recently emerged as a new way to store data in HDFS—its columnar format can yield both space and time efficiencies in your data pipelines, and it’s quickly becoming the ubiquitous way to store data. Chapter 4 includes extensive coverage of Parquet, which includes how Parquet supports sophisticated object models such as Avro and how various Hadoop tools can use Parquet. How data is being ingested into Hadoop has also evolved since the first edition, and Kafka has emerged as the new data pipeline, which serves as the transport tier between your data producers and data consumers, where a consumer would be a system such as Camus that can pull data from Kafka into HDFS. Chapter 5, which covers moving data into and out of Hadoop, now includes coverage of Kafka and Camus.
2022-03-09 19:25:19 13.68MB hadoop hadoop2 Practice second edition
1
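This archive is the bin folder of Hadoop 2.7.3 compiled on Windows. After downloading it, replacing your existing bin folder with it should be enough to get it working.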
2022-03-01 23:29:22 848KB hadoop windows
1
On mixing versions of winutils.exe and hadoop.dll between hadoop-2.2.0 and hadoop2.6.0 (error-prone)
2022-03-01 23:18:29 73KB plugin
1
spark-2.2.0-bin-hadoop2.7
2022-02-24 13:52:33 67B spark
1
The spark-1.3.1-bin-hadoop2.6.tgz installation package, used for setting up a Spark environment on Linux or Windows.
2022-02-24 09:42:31 247.37MB spark
1
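Hadoop 2.7.4 compiled on Windows. Usage is simple: just extract it locally. It solves the problem of being unable to connect to the Hadoop client when running MapReduce programs locally.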
2022-02-13 23:41:05 189.99MB windows hadoop2.7.4
1
hadoop2.8.5plugins-for-idea2019.1.3. The archive contains the Hadoop 2.8.5 plugin and IDEA 2019.1.3.
2022-02-07 10:15:24 606.49MB hadoop2.8.5 idea plugins
1
Spark installation package for use on Linux; download it if you need it: spark-3.1.2-bin-hadoop2.7.tgz
2022-01-31 18:08:18 214.05MB linux spark operations server
1