File System

Local File System

Kernel File System

Linux File System, https://www.kernel.org/doc/html/latest/filesystems

[1974 CACM] Old UNIX File System: The UNIX Time-Sharing System. [PDF]

[1984 TOCS] FFS: A Fast File System for UNIX. [PDF]

[1986 USENIX Summer] Vnodes: An Architecture for Multiple File System Types in Sun UNIX. [PDF]

[1991 USENIX Winter] Extent-like Performance from a UNIX File System. [PDF]

[1991 SOSP] LFS: The Design and Implementation of a Log-Structured File System. [PDF]

[1993 USENIX Winter] An Implementation of a Log-Structured File System for UNIX. [PDF]

[1995] Design and Implementation of the Second Extended Filesystem. [PDF]

[1996 ATC] Scalability in the XFS File System. [PDF]

[1998 LinuxExpo] ext3: Journaling the Linux ext2fs Filesystem. [PDF]

[2001 Ottawa Linux Symposium] JFFS: The Journalling Flash File System. [PDF]

[2003 FAST] ZFS: The Zettabyte File System. [PDF]

[2006 SIGOPS Operating Systems Review] NILFS: The Linux implementation of a log-structured file system. [PDF]

[2007] The new ext4 filesystem: Current status and future plans. [PDF]

[2013 TOS] BTRFS: The Linux B-Tree Filesystem. [PDF]

[2015 FAST] F2FS: A New File System for Flash Storage. [PDF] [Slides]

[2019 ATC] EROFS: A Compression-friendly Readonly File System for Resource-scarce Devices. [PDF] [Slides]

[2020 VAULT] zonefs: Mapping POSIX File System Interface to Raw Zoned Block Device Accesses. [Slides]

User-Space File System

FUSE: Filesystem in USErspace, https://github.com/libfuse/libfuse

[2004 ATC] Wayback: A User-level Versioning File System for Linux. [PDF]

[2010 SAC] Performance and Extension of User Space File Systems. [PDF]

[2010 MSST] LTFS: The Linear Tape File System. [PDF] [Code]

[2011 EuroSys] Refuse to crash with Re-FUSE. [PDF]

[2013 ATC] TABLEFS: Enhancing Metadata Efficiency in the Local File System. [PDF]

[2015 HotStorage] Terra Incognita: On the Practicality of User-Space File Systems. [PDF] [Slides]

[2016 FAST] The Composite-file File System: Decoupling the One-to-one Mapping of Files and Metadata for Better Performance. [PDF] [Slides]

[2017 FAST] To FUSE or Not to FUSE: Performance of User-Space File Systems. [PDF] [Slides] [Code]

[2018 ROSS] Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support. [PDF] [Code]

[2019 TOS] Performance and Resource Utilization of FUSE User-Space File Systems. [PDF]

[2019 ATC] ExtFUSE: Extension Framework for File Systems in User space. [PDF] [Slides] [Code]

[2021 ATC] XFUSE: An Infrastructure for Running Filesystem Services in User Space. [PDF] [Slides]

[2022 TOS] DEFUSE: An Interface for Fast and Correct User Space File System Access. [PDF]

[2024 FAST] RFUSE: Modernizing Userspace Filesystem Framework through Scalable Kernel-Userspace Communication. [PDF] [Slides] [Code]

Crash Consistency

File System Checker

e2fsck, https://github.com/tytso/e2fsprogs

[1983] Fsck – The UNIX File System Check Program. [PDF]

[2008 OSDI] SQCK: A Declarative File System Checker. [PDF]

[2013 FAST] ffsck: The Fast File System Checker. [PDF]

[2018 TOS] Towards Robust File System Checkers. [PDF]

[2021 FAST] pFSCK: Accelerating File System Checking and Repair for Modern Storage. [PDF] [Code]

Journaling

[2024 ATC] FastCommit: resource-efficient, performant and cost-effective file system journaling. [PDF] [Slides]

Others

[2012 FAST] Consistency Without Ordering. [PDF] [Slides] [Code]

[2013 SOSP] Optimistic Crash Consistency. [PDF] [Slides] [Code]

[2017 TOS] Application Crash Consistency and Performance with CCFS. [PDF]

Fragmentation

[2016 HotStorage] An Empirical Study of File-System Fragmentation in Mobile Storage Systems. [PDF] [Slides]

[2017 FAST] File Systems Fated for Senescence? Nonsense, Says Science! [PDF] [Slides]

[2017 ATC] Improving File System Performance of Mobile Storage Systems Using a Decoupled Defragmenter. [PDF] [Slides]

[2024 FAST] We Ain't Afraid of No File Fragmentation: Causes and Prevention of Its Performance Impact on Modern Flash SSDs. [PDF] [Slides]

Multicore/Manycore Scalability

[2016 ATC] Understanding Manycore Scalability of File Systems. [PDF] [Slides] [Code]

[2017 SOSP] ScaleFS: Scaling a File System to Many Cores Using an Operation Log. [PDF]

[2022 FAST] ScaleXFS: Getting scalability of XFS back on the ring. [PDF]

Distributed File System

General Purpose File System

[1985] Design and Implementation of the Sun Network Filesystem. [PDF]

[1987 SOSP & 1988 TOCS] AFS: Scale and performance in a distributed file system. [PDF]

[1993 SOSP & 1995 TOCS] The Zebra Striped Network File System. [PDF] [Ph.D. Thesis]

[2003 MSST] zFS - A Scalable Distributed File System Using Object Disks. [PDF] [Slides]

[2006 OSDI] Ceph: A Scalable, High-Performance Distributed File System. [PDF] [Code]

[2007 SC] RADOS: A Scalable, Reliable Storage Service for Petabyte-scale Storage Clusters. [PDF]

[2007 Ph.D. Thesis@UCSC] Ceph: Reliable, Scalable, and High-Performance Distributed Storage. [PDF]

[2011 ATC] TidyFS: A Simple and Small Distributed File System. [PDF] [Slides]

[2019 SOSP] File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution. [PDF] [Slides]

Big Data

[2003 SOSP] GFS: The Google File System. [PDF]

[2010 MSST] HDFS: The Hadoop Distributed File System. [PDF] [Slides] [Code]

[2013 VLDB] QFS: The Quantcast File System. [PDF] [Code]

[2021 FAST] Facebook’s Tectonic Filesystem: Efficiency from Exascale. [PDF]

[2023 FAST] More Than Capacity: Performance-Oriented Evolution of Pangu in Alibaba. [PDF] [Video]

High Performance Computing (HPC)

Parallel File System

[2000] PVFS: A Parallel File System for Linux Clusters. [PDF]

[2002 FAST] GPFS: A Shared-Disk File System for Large Computing Clusters. [PDF]

[2003 Ottawa Linux Symposium] Lustre: Building a File System for 1,000-node Clusters. [PDF] [Code]

[2008 FAST] Scalable Performance of the Panasas Parallel File System. [PDF]

Burst Buffer File System

[2009 SC] PLFS: A Checkpoint Filesystem for Parallel Applications. [PDF] [Code]

[2016 SC] BurstFS: An Ephemeral Burst-Buffer File System for Scientific Applications. [PDF] [Code]

[2018 CLUSTER & 2020 JCST] GekkoFS – A temporary distributed file system for HPC applications. [CLUSTER PDF] [JCST PDF] [Code]

[2020 JCST] Gfarm/BB — Gfarm File System for Node-Local Burst Buffer. [PDF] [Code]

[2022 HPCAsia] CHFS: Parallel Consistent Hashing File System for Node-local Persistent Memory. [PDF] [Code]

[2023 FAST] HadaFS: A File System Bridging the Local and Shared Burst Buffer for Exascale Supercomputers. [PDF] [Video]

[2023 IPDPS] UnifyFS: A User-level Shared File System for Unified Access to Distributed Local Storage. [PDF] [Code] [Relevant Slides]

[2024 CLUSTER] FINCHFS: Design of Ad-Hoc File System for I/O Heavy HPC Workloads. [PDF] [Code]

Cloud Computing

[2018 VLDB] PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database. [PDF] [Code]

[2019 SIGMOD] CFS: A Distributed File System for Large Scale Container Platforms. [PDF] [Code]

Artificial Intelligence (AI)

AI for File System

[2021 FAST] Learning Cache Replacement with CACHEUS. [PDF] [Slides]

[2023 FAST] GL-Cache: Group-level learning for efficient and high-performance caching. [PDF] [Slides] [Code]

[2024 FAST] Baleen: ML Admission & Prefetching for Flash Caches. [PDF] [Slides] [Code] [Dataset]

File System for AI

Full List of Papers on Storage for AI, https://github.com/hegongshan/Storage-for-AI-Paper

[2019 CLUSTER] Efficient User-Level Storage Disaggregation for Deep Learning. [PDF]

[2020 ICPP] DIESEL: A Dataset-Based Distributed Storage and Caching System for Large-Scale Deep Learning Training. [PDF] [Slides]

[2023 ATC] Tectonic-Shift: A Composite Storage Fabric for Large-Scale ML Training. [PDF] [Slides]

Data Management

[2024 FAST] Combining Buffered I/O and Direct I/O in Distributed File Systems. [PDF] [Slides] [Code]

  • Data Distribution

[1997 STOC] Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. [PDF]

[2006 SC] CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data. [PDF]

[2020 FAST] MAPX: Controlled Data Migration in the Expansion of Decentralized Object-Based Storage Systems. [PDF]

Metadata Management

[2003 MSST] Efficient Metadata Management in Large Distributed Storage Systems. [PDF] [Slides]

[2004 SC] Dynamic Metadata Management for Petabyte-scale File Systems. [PDF]

[2011 FAST] Scale and Concurrency of GIGA+: File System Directories with Millions of Files. [PDF]

[2014 SC] IndexFS: Scaling File System Metadata Performance with Stateless Caching and Bulk Insertion. [PDF] [Slides]

[2015 FAST] CalvinFS: Consistent WAN Replication and Scalable Metadata Management for Distributed File Systems. [PDF]

[2015 SoCC] ShardFS vs. IndexFS: Replication vs. Caching Strategies for Distributed Metadata Management in Cloud Storage Systems. [PDF]

[2015 PDSW] DeltaFS: Exascale File Systems Scale Better Without Dedicated Servers. [PDF]

[2017 FAST] HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases. [PDF] [Slides]

[2017 SC] LocoFS: A Loosely-Coupled Metadata Service for Distributed File Systems. [PDF]

[2018 TPDS] A Flattened Metadata Service for Distributed File Systems. [PDF]

[2022 FAST] InfiniFS: An Efficient Metadata Service for Large-Scale Distributed Filesystems. [PDF] [Slides]

[2022 SC] MetaWBC: POSIX-Compliant Metadata Write-Back Caching for Distributed File Systems. [PDF]

[2023 ATC] SingularFS: A Billion-Scale Distributed File System Using a Single Metadata Server. [PDF] [Slides]

[2023 SC] Xfast: Extreme File Attribute Stat Acceleration for Lustre. [PDF]

  • Load Balance

[2015 SC] Mantle: A Programmable Metadata Load Balancer for the Ceph File System. [PDF]

[2021 SC] Lunule: An Agile and Judicious Metadata Load Balancer for CephFS. [PDF]

Fault Tolerance

Replication

  • Primary-backup Replication

[1976 ICSE] A principle for resilient sharing of distributed resources. [PDF]

  • Chain Replication

[2004 OSDI] Chain Replication for Supporting High Throughput and Availability. [PDF]

  • Consensus-based Replication

[1998 TOCS] Paxos: The Part-Time Parliament. [PDF]

[2001 ACM SIGACT News] Paxos Made Simple. [PDF]

[2014 ATC] Raft: In Search of an Understandable Consensus Algorithm. [PDF]

Erasure Coding

[2012 ATC] Erasure Coding in Windows Azure Storage. [PDF] [Slides]

[2015 FAST] A Tale of Two Erasure Codes in HDFS. [PDF] [Slides]

[2018 FAST] Clay Codes: Moulding MDS Codes to Yield an MSR Code. [PDF] [Slides]

Hardware Optimization

[2017 ATC] Octopus: an RDMA-enabled Distributed Persistent Memory File System. [PDF] [Slides]

[2020 OSDI] Assise: Performance and Availability via Client-local NVM in a Distributed File System. [PDF] [Slides]

[2021 SOSP] LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism. [PDF]

[2022 TPDS] Hydra: A Decentralized File System for Persistent Memory and RDMA Networks. [PDF]

Other Topics

Data Deduplication

[2001 SOSP] LBFS: A Low-bandwidth Network File System. [PDF]

[2008 FAST] Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. [PDF]

[2009 FAST] Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality. [PDF] [Slides]

[2009 ATC] Decentralized Deduplication in SAN Cluster File Systems. [PDF] [Slides]

[2010 FAST] I/O Deduplication: Utilizing Content Similarity to Improve I/O Performance. [PDF] [Slides]

[2011 ATC] Building a High-performance Deduplication System. [PDF]

Security

[1993 CCS] CFS: A Cryptographic File System for Unix. [PDF]

[1998 Technical Report] Cryptfs: A Stackable Vnode Level Encryption File System. [PDF]

[1999 SOSP] SFS: Separating key management from file system security. [PDF]

[2000 OSDI] Fast and secure distributed read-only file system. [PDF]

[2003 ICDE] StegFS: A Steganographic File System. [PDF]

[2003 ATC] NCryptfs: A Secure and Convenient Cryptographic File System. [PDF] [Slides]

[2005 Ottawa Linux Symposium] eCryptfs: An Enterprise-class Encrypted Filesystem for Linux. [PDF]

[2008 StorageSS] Tahoe – The Least-Authority Filesystem. [PDF] [Code]

Surveys

[1989] A Survey of Distributed File Systems. [PDF]

[1990] Distributed File Systems: Concepts and Examples. [PDF]

[2005 ATC] Analysis and Evolution of Journaling File Systems. [PDF]

[2008 NCM] A Taxonomy and Survey on Distributed File Systems. [PDF]

[2015] Analysis of Six Distributed File Systems. [PDF]

[2016 ICCCA] Evolution and Analysis of Distributed File Systems in Cloud Storage: Analytical Survey. [PDF]

[2016 PIEEE] A Comprehensive Study of the Past, Present, and Future of Data Deduplication. [PDF]

[2018 CSUR] Scalable Metadata Management Techniques for Ultra-Large Distributed Storage Systems – A Systematic Review. [PDF]

[2020 JCST] Ad Hoc File Systems for High-Performance Computing. [PDF]

[2022 TOS] Survey of Distributed File System Design Choices. [PDF]

[2022 CCF-THPC] A survey on AI for storage. [PDF]

[2022 TPDS] The State of the Art of Metadata Managements in Large-Scale Distributed File Systems — Scalability, Performance and Availability. [PDF]

[2022 TPDS] A Survey of Storage Systems in the RDMA Era. [PDF]

[2024 JCRD] From BERT to ChatGPT: Challenges and Technical Development of Storage Systems for Large Model Training. [PDF]

[2025 TOS] A Survey of the Past, Present, and Future of Erasure Coding for Storage Systems. [PDF]

Analysis

[2000 ATC] A Comparison of File System Workloads. [PDF]

[2007 FAST] A Five-Year Study of File-System Metadata. [PDF]

[2008 TOS] A Nine Year Study of File System and Storage Benchmarking. [PDF]

[2011 HotOS] Benchmarking File System Benchmarking: It IS Rocket Science. [PDF]

[2011 FAST] A Study of Practical Deduplication. [PDF] [Slides]

[2012 SC] A Study on Data Deduplication in HPC Storage Systems. [PDF]

[2013 FAST] A Study of Linux File System Evolution. [PDF] [Slides]

Object Storage

[2003 MSST] Towards an Object Store. [PDF] [Slides]

[2004 MSST] OBFS: A File System for Object-based Storage Devices. [PDF] [Slides]

[2010 OSDI] Finding a needle in Haystack: Facebook’s photo storage. [PDF] [Slides]

[2020 SCFA] DAOS: A Scale-Out High Performance Storage Stack for Storage Class Memory. [PDF] [Code]

New Hardware

[2017 FAST] LightNVM: The Linux Open-Channel SSD Subsystem. [PDF] [Slides]

[2021 ATC] ZNS: Avoiding the Block Interface Tax for Flash-based SSDs. [PDF] [Slides]

[2021 OSDI] ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction. [PDF] [Slides]