Big Data Analytics: A Hands-On Approach

Uploader: 43960172 | Uploaded: 2022-01-20 16:30:49 | File size: 108.43 MB | File type: -
Big Data Science & Analytics: A Hands-On Approach
Authors: Arshdeep Bahga, Vijay Madisetti
ISBN-10: 0996025537
ISBN-13: 9780996025539
Edition: 1
Publication date: 2016-04-15
Pages: 542

The book is organized into three main parts, comprising a total of twelve chapters.

Part I provides an introduction to big data, applications of big data, and big data science and analytics patterns and architectures. A novel data science and analytics application system design methodology is proposed, and its realization through the use of open-source big data frameworks is described. This methodology describes big data analytics applications as realizations of the proposed Alpha, Beta, Gamma and Delta models, which comprise tools and frameworks for collecting and ingesting data from various sources into the big data analytics infrastructure, distributed filesystems and non-relational (NoSQL) databases for data storage, processing frameworks for batch and real-time analytics, serving databases, and web and visualization frameworks. This new methodology forms the pedagogical foundation of this book.

Part II introduces the reader to various tools and frameworks for big data analytics, and to the architectural and programming aspects of these frameworks as used in the proposed design methodology. We chose Python as the primary programming language for this book. Languages other than Python can also be used easily within the big data stack described in this book. We describe tools and frameworks for data acquisition, including publish-subscribe messaging frameworks such as Apache Kafka and Amazon Kinesis, source-sink connectors such as Apache Flume, database connectors such as Apache Sqoop, messaging queues such as RabbitMQ, ZeroMQ, RestMQ and Amazon SQS, and custom REST-based and WebSocket-based connectors. The reader is introduced to the Hadoop Distributed File System (HDFS) and the HBase non-relational database. The batch analysis chapter provides an in-depth study of frameworks such as Hadoop MapReduce, Pig, Oozie, Spark and Solr. The real-time analysis chapter focuses on the Apache Storm and Spark Streaming frameworks. In the chapter on interactive querying, we describe, with the help of examples, the use of frameworks and services such as Spark SQL, Hive, Amazon Redshift and Google BigQuery. The chapter on serving databases and web frameworks provides an introduction to popular relational and non-relational databases (such as MySQL, Amazon DynamoDB, Cassandra and MongoDB) and the Django Python web framework.

Part III focuses on advanced topics in big data, including analytics algorithms and data visualization tools. The chapter on analytics algorithms introduces the reader to machine learning algorithms for clustering, classification, regression and recommendation systems, with examples using the Spark MLlib and H2O frameworks. The chapter on data visualization describes examples of creating various types of visualizations using frameworks such as Lightning, pygal and Seaborn.
