Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?, In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications., Peer under the hood of the systems you already use, and learn how to use and operate them more effectively, Make informed decisions by identifying the strengths and weaknesses of different tools, Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity, Understand the distributed systems research upon which modern databases are built, Peek behind the scenes of major online services, and learn from their architectures
2023-01-14 00:48:58 23.82MB database programming
1
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
2023-01-14 00:44:44 15.76MB 大数据 分布式 架构设计 系统设计
1
请复制该连接:https://pan.baidu.com/s/1slqjTDF ,下载.txt文档后,密码在文档中,直接粘贴即可下载,感谢支持
2022-11-13 14:39:47 55B Designing Data Intensive Application
1
完整文字版(英文),带书签目录,介绍分布式原理,非常非常好的一本书。作者:马丁·科勒普曼 ,目录如下: Part I. Foundations of Data Systems 1. Reliable, Scalable, and Maintainable Applications 3 Thinking About Data Systems 4 Reliability 6 Hardware Faults 7 Software Errors 8 Human Errors 9 How Important Is Reliability? 10 Scalability 10 Describing Load 11 Describing Performance 13 Approaches for Coping with Load 17 Maintainability 18 Operability: Making Life Easy for Operations 19 Simplicity: Managing Complexity 20 Evolvability: Making Change Easy 21 Summary 22 2. Data Models and Query Languages 27 Relational Model Versus Document Model 28 The Birth of NoSQL 29 The Object-Relational Mismatch 29 Many-to-One and Many-to-Many Relationships 33 Are Document Databases Repeating History? 36 Relational Versus Document Databases Today 38 Query Languages for Data 42 Declarative Queries on the Web 44 MapReduce Querying 46 Graph-Like Data Models 49 Property Graphs 50 The Cypher Query Language 52 Graph Queries in SQL 53 Triple-Stores and SPARQL 55 The Foundation: Datalog 60 Summary 63 3. Storage and Retrieval 69 Data Structures That Power Your Database 70 Hash Indexes 72 SSTables and LSM-Trees 76 B-Trees 79 Comparing B-Trees and LSM-Trees 83 Other Indexing Structures 85 Transaction Processing or Analytics? 90 Data Warehousing 91 Stars and Snowflakes: Schemas for Analytics 93 Column-Oriented Storage 95 Column Compression 97 Sort Order in Column Storage 99 Writing to Column-Oriented Storage 101 Aggregation: Data Cubes and Materialized Views 101 Summary 103 4. Encoding and Evolution 111 Formats for Encoding Data 112 Language-Specific Formats 113 JSON, XML, and Binary Variants 114 Thrift and Protocol Buffers 117 Avro 122 The Merits of Schemas 127 Modes of Dataflow 128 Dataflow Through Databases 129 Dataflow Through Services: REST and RPC 131 Message-Passing Dataflow 136 Summary 139 Part II. Distributed Data 5. Replication 151 Leaders and Followers 152 Synchronous Versus Asynchronous Replication 153 Setting Up New Followers 155 Handling Node Outages 156 Implementation of Replication Logs 158 Problems with Replication Lag 161 Reading Your Own Writes 162 Monotonic Reads 164 Consistent Prefix Reads 165 Solutions for Replication Lag 167 Multi-Leader Replication 168 Use Cases for Multi-Leader Replication 168 Handling Write Conflicts 171 Multi-Leader Replication Topologies 175 Leaderless Replication 177 Writing to the Database When a Node Is Down 177 Limitations of Quorum Consistency 181 Sloppy Quorums and Hinted Handoff 183 Detecting Concurrent Writes 184 Summary 192 6. Partitioning 199 Partitioning and Replication 200 Partitioning of Key-Value Data 201 Partitioning by Key Range 202 Partitioning by Hash of Key 203 Skewed Workloads and Relieving Hot Spots 205 Partitioning and Secondary Indexes 206 Partitioning Secondary Indexes by Document 206 Partitioning Secondary Indexes by Term 208 Rebalancing Partitions 209 Strategies for Rebalancing 210 Operations: Automatic or Manual Rebalancing 213 Request Routing 214 Parallel Query Execution 216 Summary 216 7. Transactions 221 The Slippery Concept of a Transaction 222 The Meaning of ACID 223 Single-Object and Multi-Object Operations 228 Weak Isolation Levels 233 Read Committed 234 Snapshot Isolation and Repeatable Read 237 Preventing Lost Updates 242 Write Skew and Phantoms 246 Serializability 251 Actual Serial Execution 252 Two-Phase Locking (2PL) 257 Serializable Snapshot Isolation (SSI) 261 Summary 266 8. The Trouble with Distributed Systems 273 Faults and Partial Failures 274 Cloud Computing and Supercomputing 275 Unreliable Networks 277 Network Faults in Practice 279 Detecting Faults 280 Timeouts and Unbounded Delays 281 Synchronous Versus Asynchronous Networks 284 Unreliable Clocks 287 Monotonic Versus Time-of-Day Clocks 288 Clock Synchronization and Accuracy 289 Relying on Synchronized Clocks 291 Process Pauses 295 Knowledge, Truth, and Lies 300 The Truth Is Defined by the Majority 300 Byzantine Faults 304 System Model and Reality 306 Summary 310 9. Consistency and Consensus 321 Consistency Guarantees 322 Linearizability 324 What Makes a System Linearizable? 325 Relying on Linearizability 330 Implementing Linearizable Systems 332 The Cost of Linearizability 335 Ordering Guarantees 339 Ordering and Causality 339 Sequence Number Ordering 343 Total Order Broadcast 348 Distributed Transactions and Consensus 352 Atomic Commit and Two-Phase Commit (2PC) 354 Distributed Transactions in Practice 360 Fault-Tolerant Consensus 364 Membership and Coordination Services 370 Summary 373 Part III. Derived Data 10. Batch Processing 389 Batch Processing with Unix Tools 391 Simple Log Analysis 391 The Unix Philosophy 394 MapReduce and Distributed Filesystems 397 MapReduce Job Execution 399 Reduce-Side Joins and Grouping 403 Map-Side Joins 408 The Output of Batch Workflows 411 Comparing Hadoop to Distributed Databases 414 Beyond MapReduce 419 Materialization of Intermediate State 419 Graphs and Iterative Processing 424 High-Level APIs and Languages 426 Summary 429 11. Stream Processing 439 Transmitting Event Streams 440 Messaging Systems 441 Partitioned Logs 446 Databases and Streams 451 Keeping Systems in Sync 452 Change Data Capture 454 Event Sourcing 457 State, Streams, and Immutability 459 Processing Streams 464 Uses of Stream Processing 465 Reasoning About Time 468 Stream Joins 472 Fault Tolerance 476 Summary 479 12. The Future of Data Systems 489 Data Integration 490 Combining Specialized Tools by Deriving Data 490 Batch and Stream Processing 494 Unbundling Databases 499 Composing Data Storage Technologies 499 Designing Applications Around Dataflow 504 Observing Derived State 509 Aiming for Correctness 515 The End-to-End Argument for Databases 516 Enforcing Constraints 521 Timeliness and Integrity 524 Trust, but Verify 528 Doing the Right Thing 533 Predictive Analytics 533 Privacy and Tracking 536 Summary 543 Glossary 553 Index 559
2022-06-09 10:02:14 21.55MB 分布式 大数据 技术理念
1
《》Designing Data Intensive Applications》epub版本
2022-02-28 06:31:36 6.2MB Designing Data Intensive Applications
1
这本免费书籍重点介绍MapReduce算法设计,重点介绍自然语言处理,信息检索和机器学习中常见的文本处理算法。
2021-10-05 10:44:21 1.73MB 专题
1
Designing Data-Intensive Applications。英文原版,完整高清,带书签。本书概括和总结了数据系统,分布式或非分布式环境下面临的本质问题,和解决方案的分类。
2021-08-28 23:26:58 23.26MB Data Intensive
1
KG Refinement by Knowledge Intensive Crowdsourcing-林欣.pdf
2021-08-20 09:19:55 1.55MB 知识图谱
Designing Data-Intensive Applications 英文原版 pdf
2021-07-08 18:43:15 20.67MB Designing Da
1
包括优化并行数值计算的很多方法,比如负载均衡、内存访问等。还有些例子。
2021-06-26 14:13:32 7.92MB 并行 算法优化
1