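Big Data Tech Talk: Data Distribution Across Multiple IDCs, MySQL Multi-Datacenter Deployment (19 pages).pdf
2022-06-09 22:06:36 298KB big-data mysql
Large-Scale Application System Architecture: Message Middleware in Large Distributed Systems (20 pages).pptx
2022-06-09 19:04:35 465KB system-architecture
[Contents] Background; ZooKeeper (zk)-based distributed election; data consistency guarantees during failover; zk monitoring; results page; summary
2022-06-09 17:07:09 5.25MB database zookeeper distributed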
Complete text edition (in English) with a bookmarked table of contents. It introduces the principles of distributed systems and is a very good book. Author: Martin Kleppmann. Table of contents:
Part I. Foundations of Data Systems
1. Reliable, Scalable, and Maintainable Applications: Thinking About Data Systems; Reliability; Hardware Faults; Software Errors; Human Errors; How Important Is Reliability?; Scalability; Describing Load; Describing Performance; Approaches for Coping with Load; Maintainability; Operability: Making Life Easy for Operations; Simplicity: Managing Complexity; Evolvability: Making Change Easy; Summary
2. Data Models and Query Languages: Relational Model Versus Document Model; The Birth of NoSQL; The Object-Relational Mismatch; Many-to-One and Many-to-Many Relationships; Are Document Databases Repeating History?; Relational Versus Document Databases Today; Query Languages for Data; Declarative Queries on the Web; MapReduce Querying; Graph-Like Data Models; Property Graphs; The Cypher Query Language; Graph Queries in SQL; Triple-Stores and SPARQL; The Foundation: Datalog; Summary
3. Storage and Retrieval: Data Structures That Power Your Database; Hash Indexes; SSTables and LSM-Trees; B-Trees; Comparing B-Trees and LSM-Trees; Other Indexing Structures; Transaction Processing or Analytics?; Data Warehousing; Stars and Snowflakes: Schemas for Analytics; Column-Oriented Storage; Column Compression; Sort Order in Column Storage; Writing to Column-Oriented Storage; Aggregation: Data Cubes and Materialized Views; Summary
4. Encoding and Evolution: Formats for Encoding Data; Language-Specific Formats; JSON, XML, and Binary Variants; Thrift and Protocol Buffers; Avro; The Merits of Schemas; Modes of Dataflow; Dataflow Through Databases; Dataflow Through Services: REST and RPC; Message-Passing Dataflow; Summary
Part II. Distributed Data
5. Replication: Leaders and Followers; Synchronous Versus Asynchronous Replication; Setting Up New Followers; Handling Node Outages; Implementation of Replication Logs; Problems with Replication Lag; Reading Your Own Writes; Monotonic Reads; Consistent Prefix Reads; Solutions for Replication Lag; Multi-Leader Replication; Use Cases for Multi-Leader Replication; Handling Write Conflicts; Multi-Leader Replication Topologies; Leaderless Replication; Writing to the Database When a Node Is Down; Limitations of Quorum Consistency; Sloppy Quorums and Hinted Handoff; Detecting Concurrent Writes; Summary
6. Partitioning: Partitioning and Replication; Partitioning of Key-Value Data; Partitioning by Key Range; Partitioning by Hash of Key; Skewed Workloads and Relieving Hot Spots; Partitioning and Secondary Indexes; Partitioning Secondary Indexes by Document; Partitioning Secondary Indexes by Term; Rebalancing Partitions; Strategies for Rebalancing; Operations: Automatic or Manual Rebalancing; Request Routing; Parallel Query Execution; Summary
7. Transactions: The Slippery Concept of a Transaction; The Meaning of ACID; Single-Object and Multi-Object Operations; Weak Isolation Levels; Read Committed; Snapshot Isolation and Repeatable Read; Preventing Lost Updates; Write Skew and Phantoms; Serializability; Actual Serial Execution; Two-Phase Locking (2PL); Serializable Snapshot Isolation (SSI); Summary
8. The Trouble with Distributed Systems: Faults and Partial Failures; Cloud Computing and Supercomputing; Unreliable Networks; Network Faults in Practice; Detecting Faults; Timeouts and Unbounded Delays; Synchronous Versus Asynchronous Networks; Unreliable Clocks; Monotonic Versus Time-of-Day Clocks; Clock Synchronization and Accuracy; Relying on Synchronized Clocks; Process Pauses; Knowledge, Truth, and Lies; The Truth Is Defined by the Majority; Byzantine Faults; System Model and Reality; Summary
9. Consistency and Consensus: Consistency Guarantees; Linearizability; What Makes a System Linearizable?; Relying on Linearizability; Implementing Linearizable Systems; The Cost of Linearizability; Ordering Guarantees; Ordering and Causality; Sequence Number Ordering; Total Order Broadcast; Distributed Transactions and Consensus; Atomic Commit and Two-Phase Commit (2PC); Distributed Transactions in Practice; Fault-Tolerant Consensus; Membership and Coordination Services; Summary
Part III. Derived Data
10. Batch Processing: Batch Processing with Unix Tools; Simple Log Analysis; The Unix Philosophy; MapReduce and Distributed Filesystems; MapReduce Job Execution; Reduce-Side Joins and Grouping; Map-Side Joins; The Output of Batch Workflows; Comparing Hadoop to Distributed Databases; Beyond MapReduce; Materialization of Intermediate State; Graphs and Iterative Processing; High-Level APIs and Languages; Summary
11. Stream Processing: Transmitting Event Streams; Messaging Systems; Partitioned Logs; Databases and Streams; Keeping Systems in Sync; Change Data Capture; Event Sourcing; State, Streams, and Immutability; Processing Streams; Uses of Stream Processing; Reasoning About Time; Stream Joins; Fault Tolerance; Summary
12. The Future of Data Systems: Data Integration; Combining Specialized Tools by Deriving Data; Batch and Stream Processing; Unbundling Databases; Composing Data Storage Technologies; Designing Applications Around Dataflow; Observing Derived State; Aiming for Correctness; The End-to-End Argument for Databases; Enforcing Constraints; Timeliness and Integrity; Trust, but Verify; Doing the Right Thing; Predictive Analytics; Privacy and Tracking; Summary
Glossary
Index
2022-06-09 10:02:14 21.55MB distributed big-data technical-concepts
Twitter's Snowflake algorithm for distributed auto-incrementing IDs (Java version)
2022-06-09 07:46:07 3KB snowflake
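For readers unfamiliar with the algorithm, here is a minimal sketch of a Snowflake-style generator. It is not the contents of the listed download. It assumes the commonly described 64-bit layout (41-bit millisecond timestamp, 5-bit datacenter ID, 5-bit worker ID, 12-bit sequence). The class name, the helper `workerIdOf`, and the epoch constant are illustrative choices; any fixed past instant can serve as the epoch.

```java
/**
 * Sketch of a Snowflake-style 64-bit ID generator (illustrative, not the listed file).
 * Bit layout, high to low: 1 unused sign bit | 41-bit timestamp (ms since EPOCH)
 * | 5-bit datacenter id | 5-bit worker id | 12-bit per-millisecond sequence.
 */
class SnowflakeIdGenerator {
    // Assumed custom epoch in ms; any fixed instant in the past works.
    static final long EPOCH = 1288834974657L;

    static final long WORKER_BITS = 5L;
    static final long DATACENTER_BITS = 5L;
    static final long SEQUENCE_BITS = 12L;

    static final long MAX_WORKER = ~(-1L << WORKER_BITS);         // 31
    static final long MAX_DATACENTER = ~(-1L << DATACENTER_BITS); // 31
    static final long SEQUENCE_MASK = ~(-1L << SEQUENCE_BITS);    // 4095

    static final long WORKER_SHIFT = SEQUENCE_BITS;
    static final long DATACENTER_SHIFT = SEQUENCE_BITS + WORKER_BITS;
    static final long TIMESTAMP_SHIFT = SEQUENCE_BITS + WORKER_BITS + DATACENTER_BITS;

    private final long datacenterId;
    private final long workerId;
    private long sequence = 0L;
    private long lastTimestamp = -1L;

    SnowflakeIdGenerator(long datacenterId, long workerId) {
        if (datacenterId < 0 || datacenterId > MAX_DATACENTER) {
            throw new IllegalArgumentException("datacenterId out of range");
        }
        if (workerId < 0 || workerId > MAX_WORKER) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.datacenterId = datacenterId;
        this.workerId = workerId;
    }

    /** Returns the next ID; synchronized so one generator can be shared by threads. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Refuse to generate IDs that could collide with earlier ones.
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << TIMESTAMP_SHIFT)
                | (datacenterId << DATACENTER_SHIFT)
                | (workerId << WORKER_SHIFT)
                | sequence;
    }

    /** Extracts the worker id field from an ID produced by this layout. */
    static long workerIdOf(long id) {
        return (id >> WORKER_SHIFT) & MAX_WORKER;
    }

    public static void main(String[] args) {
        SnowflakeIdGenerator gen = new SnowflakeIdGenerator(1, 3);
        System.out.println(gen.nextId());
        System.out.println(gen.nextId());
    }
}
```

IDs from a single generator increase over time because the timestamp occupies the high bits. IDs from different workers never collide as long as each (datacenter, worker) pair is assigned to only one process.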
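ZooKeeper, which started as an official Hadoop subproject, is a reliable coordination system for large distributed systems. Its features include configuration maintenance, naming, distributed synchronization, and group services. ZooKeeper's goal is to encapsulate critical services that are complex and error-prone, and to offer users a simple interface on top of an efficient, stable system. What can ZooKeeper do for us? A simple example: suppose we have 20 search-engine servers (each handling searches over part of the overall index), one master server (which sends search requests to those 20 servers and merges the result sets), one standby master (which replaces the master when it goes down), and a web CGI (which sends search requests to the master). At the moment, 15 of the search servers are serving queries and 5 are building indexes. Servers in this pool regularly switch roles: a serving node stops serving and starts building an index, or an indexing node finishes its index and becomes available for search. With ZooKeeper, the master can automatically detect which search servers are currently serving and send requests only to them, the standby master is activated automatically when the master goes down, and the web CGI automatically learns of changes to the master's network address.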
2022-06-08 22:05:57 4.41MB zookeeper hadoop
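The search-cluster scenario in this description relies on nodes that vanish when their owner's session ends. The sketch below is a toy in-memory model of that idea only. `EphemeralRegistry` and its methods are hypothetical names and are not the ZooKeeper client API. A real deployment would use ZooKeeper ephemeral nodes and watches instead of the polling shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for ephemeral nodes: each path is owned by one session (hypothetical API). */
class EphemeralRegistry {
    // path -> owning session id; TreeMap keeps paths in sorted order
    private final Map<String, String> owners = new TreeMap<>();

    /** Creates the node if it does not exist; returns false if another session holds it. */
    synchronized boolean create(String path, String sessionId) {
        return owners.putIfAbsent(path, sessionId) == null;
    }

    /** Simulates a crash or session expiry: every node owned by the session disappears. */
    synchronized void expire(String sessionId) {
        owners.values().removeIf(sessionId::equals);
    }

    /** Lists the names of nodes directly under the given parent prefix. */
    synchronized List<String> children(String parent) {
        String prefix = parent + "/";
        List<String> names = new ArrayList<>();
        for (String path : owners.keySet()) {
            if (path.startsWith(prefix)) {
                names.add(path.substring(prefix.length()));
            }
        }
        return names;
    }

    /** Returns the session that owns the path, or null if the node does not exist. */
    synchronized String ownerOf(String path) {
        return owners.get(path);
    }
}

class SearchClusterDemo {
    public static void main(String[] args) {
        EphemeralRegistry registry = new EphemeralRegistry();

        // Search servers that are ready to serve register themselves.
        registry.create("/serving/search-01", "session-01");
        registry.create("/serving/search-02", "session-02");

        // Both masters try to claim the same node; only the first one succeeds.
        System.out.println("master claims: " + registry.create("/master", "master-session"));
        System.out.println("standby claims: " + registry.create("/master", "standby-session"));

        // search-01 leaves to rebuild its index; the master's next listing omits it.
        registry.expire("session-01");
        System.out.println("serving now: " + registry.children("/serving"));

        // The master crashes; its node disappears and the standby can take over.
        registry.expire("master-session");
        System.out.println("standby retries: " + registry.create("/master", "standby-session"));
        System.out.println("current master: " + registry.ownerOf("/master"));
    }
}
```

The point of the model is that membership and leadership are derived from which nodes currently exist, so a failed process drops out without anyone having to remove it explicitly.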
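MATLAB's toolboxes also include library functions for probability and statistics. On the probability side, these mainly cover the cumulative distribution functions, probability density functions, and probability mass functions of common distributions, along with functions that generate random numbers following each distribution. On the statistics side, they cover common parameter estimation (point estimation and interval estimation) and hypothesis testing for simple random samples. There are also many functions for experimental design, linear regression, nonlinear regression, and related topics. What follows is a brief introduction to MATLAB's probability and statistics features.
2022-06-08 21:40:40 152KB probability-distribution probability-density matlab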
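1. Project stack: IDEA + Maven + Spring Boot + Spring Cloud + Spring Cloud Alibaba, depending on Nacos 2.0.1, MySQL, and Seata Server 1.4.2; 2. The sample includes three services: an order service, a product service, and an account service; 3. Creating an order must also deduct product inventory and the account balance, and the three operations must either all succeed or all fail; 4. The Seata Server, Nacos, and MySQL environment settings need to be adjusted to match your actual environment.
2022-06-08 21:33:22 102KB mysql distributed spring cloud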
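PDF slides from the guest talks across the three days of the 2021 DTCC conference, covering topics such as cloud-native database development practice, distributed database applications, graph database technology and application innovation, and time-series databases.
2022-06-08 19:10:38 473.88MB cloud-native database distributed time-series-database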
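Reposted: a sample management system built with remoting-based distributed development. It definitely runs!
2022-06-08 18:04:33 544KB c# remoting distributed-development library-management-system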