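Based on a combined analysis of the Hadoop distributions currently popular on the market, weighing ease of deployment, features, performance, and cost, we recommend CDH and HDP. The choice between the two then depends on the specific use case. If broad feature coverage and plenty of reference deployments matter most, CDH is the recommended choice, because it is currently the most feature-complete distribution on the market and has the most deployment cases. If fast deployment and ease of use matter most, HDP is the recommended choice, because its vendor, Hortonworks, is to date the only provider of 100% pure open-source Apache Hadoop and was the first provider to use the metadata service features of Apache HCatalog. In addition, its Stinger initiative pioneered major optimizations to the Hive project. Hortonworks also offers a very good, easy-to-use sandbox for getting started.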
2021-08-22 20:32:50 1.62MB Hbase Hive spark flink
12.1 Spark Overview 12.2 The Spark Ecosystem 12.3 Spark Runtime Architecture 12.4 Spark SQL 12.5 Spark Deployment and Application Modes
2021-08-22 09:10:38 2.79MB Introduction to Big Data, Big Data, Spark, big
spark介绍-蘑菇街.pdf
2021-08-21 14:12:23 7.84MB spark
Spark部署中的关键问题解决之道--许鹏.pdf
2021-08-21 14:12:22 180KB spark
A Web Application for Interactive Data Analysis with Spark
2021-08-21 14:12:22 6.89MB spark
A Web Application for Interactive Data Analysis with Spark
2021-08-21 14:12:21 6.89MB spark
Interface Design for Spark Community by Reynold Xin.pdf
2021-08-21 13:01:56 1.23MB spark
基于Spark的支持隐私保护的聚类算法.pdf
2021-08-20 01:23:04 309KB clustering, algorithms, data structures, references
sparkdemo_202108.7z
2021-08-20 01:03:19 141KB spark
We designed this book mainly for data scientists and data engineers looking to use Apache Spark. The two roles have slightly different needs, but in reality, most application development covers a bit of both, so we think the material will be useful in both cases. Specifically, in our minds, the data scientist workload focuses more on interactively querying data to answer questions and build statistical models, while the data engineer job focuses on writing maintainable, repeatable production applications, either to use the data scientist’s models in practice or just to prepare data for further analysis (e.g., building a data ingest pipeline). However, we often see with Spark that these roles blur. For instance, data scientists are able to package production applications without too much hassle, and data engineers use interactive analysis to understand and inspect their data to build and maintain pipelines.
2021-08-19 11:14:36 7.88MB Spark