A list of papers, conferences, books, MOOCs, Q&A, and other material on distributed systems.
Issues suggesting more material are welcome.
Theory and the main topics of distributed systems: mainly basic concepts, fault tolerance and replication, consistency and consensus algorithms, formal methods, etc.
-
basic concepts/introductions
- introductions
- time and clock
- distributed consensus problem
-
fault-tolerance and replication/consistency and consensus
- fault-tolerance and replication
- Impossibility of Distributed Consensus With One Faulty Process
- Implementing fault-tolerant services using the state machine approach: a tutorial
- Remus: High Availability via Asynchronous Virtual Machine Replication
- Perspectives on the CAP Theorem
- Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services
- CAP Twelve Years Later
- consistency and consensus
- algorithms and protocols
- The part-time parliament
- Paxos Made Simple
- gbcast
- Viewstamped replication: A new primary copy method to support highly-available distributed systems
- Zab: High-performance broadcast for primary-backup systems
- In Search of an Understandable Consensus Algorithm
- The Byzantine Generals Problem
- ZooKeeper’s atomic broadcast protocol: Theory and practice
- Revisiting the PAXOS algorithm
- The Paxos Family of Consensus Protocols
- Multi-Paxos: An Implementation and Evaluation
- Consensus in the presence of partial synchrony
- Consensus on transaction commit
- Consistency in Distributed Storage Systems: An Overview of Models, Metrics and Measurement Approaches
- BASE: An Acid Alternative
- Eventually Consistent
- Flexible Paxos: Quorum intersection revisited
- engineering and systems
- Replication and Fault-Tolerance in the ISIS System
- The Chubby lock service for loosely-coupled distributed systems
- ZooKeeper: Wait-free Coordination for Internet-scale Systems
- Paxos Made Live: An Engineering Perspective
- Paxos for System Builders
- PAXOS Made Transparent
- Consensus in the Cloud: Paxos Systems Demystified
- algorithms and protocols
- fault-tolerance and replication
-
other topics
- leader election algorithms
- p2p
- formal methods
- Design and implementation of the Sun network filesystem
- The Google file system
- The Hadoop distributed file system
- Ceph: A Scalable, High-Performance Distributed File System
- Finding a needle in Haystack: Facebook's photo storage
- Bigtable
- Hadoop-HBase for large-scale data
- Dynamo
- Spanner: Google’s Globally-Distributed Database
- Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks
- MapReduce: Simplified Data Processing on Large Clusters
- Pregel: a system for large-scale graph processing
- Dremel: Interactive Analysis of Web-Scale Datasets
- Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing
- Storm@twitter
- GraphX: Graph Processing in a Distributed Dataflow Framework
- Introducing Apache Giraph for Large Scale Graph Processing
- Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation
- Large-scale cluster management at Google with Borg
- Omega: flexible, scalable schedulers for large compute clusters
- Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
- Yarn
- Models for Parallel Computing: Review and Perspectives
- Actors: A Model of Concurrent Computation in Distributed Systems
- Communicating sequential processes
- Parallel Algorithms Lecture Notes
- DTHREADS: Efficient and Deterministic Multithreading
- Kendo: efficient deterministic multithreading in software
- Replication: Theory and Practice
- Distributed Systems: Concepts and Design
- Distributed Systems: Principles and Paradigms
- Distributed Systems: An Algorithmic Approach
- Distributed Algorithms: An Intuitive Approach
- Distributed Computing: Principles, Algorithms, and Systems
- Designing Data-Intensive Applications
- Introduction to Reliable and Secure Distributed Programming
- Distributed Systems (3rd edition)
- MIT 6.824: Distributed Systems
- CMU 15-440: Distributed Systems Syllabus
- MIT 6.852/18.437 Distributed Algorithms
- MIT 6.S897: Large-Scale Systems
- CS 525 Spring 2015 Advanced Distributed Systems
- CS–745/845: Formal Specification and Verification of Systems
- edX: KTHx: ID2203.2x Reliable Distributed Algorithms
- UNDERSTANDING PAXOS
- The Log: What every software engineer should know about real-time data's unifying abstraction
- Consensus Protocols: Two-Phase Commit
- Consensus Protocols: Three-phase Commit
- Three-Phase Commit Protocol
- Consensus Protocols: A Paxos Implementation
- Consensus Protocols: Paxos
- FLP and CAP are not the same
- Consistency and availability in Amazon's Dynamo
- Distributed systems theory for the distributed systems engineer
- PAXOS/MULTI-PAXOS ALGORITHM
- EVENTUAL CONSISTENCY
- The Essential Leslie Lamport
- The Essential Nancy Lynch
- The Essential Barbara Liskov
- Viewstamped Replication Revisited