Reading

Some of the items in this list are taken from https://github.com/analytics/analytics/blob/master/notes/papers.md; others I have added along the way.

Reasoning about consistency

All of these relate to consistency reasoning within the context of a given store, but hopefully the concepts map upwards.

Query processing

Machine Learning - Storage and computational foundations

Heterogeneous Data Store Integration

Data cleaning, data management

Time, clocks and ordering

Functional Pearls / Other programming

UB-Trees

Space-Filling

Order Preserving Codes

Other Order-Preserving Structures:

Fat binary search and monotone minimal perfect hashing

Formal Concept Analysis

Incremental Computation

Weighted Datalog and Generalized Annotated Programs

Stable Models

Top-Down Evaluation

Tabling

*-Semirings and C-Semirings/ω-continuous semirings

[21:27] nwf:	 Paper dump:
[21:27] nwf:	 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.71.7650 is a decent overview, though a little dated
[21:28] nwf:	 http://dl.acm.org/citation.cfm?id=973230 ties a lot of this in to NLP parsing, if you're curious
[21:28] nwf:	 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.704 I have just skimmed and has some nice examples
[21:30] nwf:	 http://dl.acm.org/citation.cfm?doid=322261.322272 has some more graph-based examples

Provenance semirings and x-fast trees

Consistency and distribution

Latch-free architecture

Persistence and Versioning

Cache-Obliviousness

Packed Memory Arrays

Encoding functions with static support, Monotone Minimal Perfect Hashing and Bloomier Filters

Cache-Oblivious Databases

String Dictionaries

Logging and Metrics

  • The log-structured merge tree by O'Neil, Cheng, Gawlick and O'Neil suits write-mostly workloads: it provides great throughput for inserts in exchange for higher read latency.
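To make the insert/read trade-off concrete, here is a minimal sketch of the idea, not the structure from the paper: writes land in an in-memory buffer that is periodically flushed as an immutable sorted run, so a read may have to probe several runs. The class name `TinyLSM`, the `memtable_limit` threshold, and the single-step `compact` are all assumptions made for illustration.

```python
import bisect


class TinyLSM:
    """Toy LSM-style store: an in-memory buffer plus immutable sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # most recent writes
        self.runs = []                        # newest run first; each is a sorted list of (key, value)

    def put(self, key, value):
        # Inserts only touch the in-memory buffer, which is why they are cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the buffer out as one sorted, immutable run.
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key, default=None):
        # Reads pay the price: check the buffer, then every run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return default

    def compact(self):
        # Merge all runs into one; newer values overwrite older ones.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

Compaction is what keeps read latency bounded in practice; real implementations merge level by level rather than all at once as this sketch does.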

Compressed Search

We could store full-text indices on some columns as a BWT-transformed version of the leaf chunk, then apply compressed-search techniques on top to permit efficient operations on them without paying for full decompression.

At ~4 MB per leaf this fits reasonably well with bzip2, which compresses in independent blocks of at most 900 kB, so each leaf would span a handful of blocks.
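As a rough illustration of what searching on the transformed text can look like, the sketch below builds a BWT naively and counts pattern occurrences with FM-index style backward search, never reconstructing the original string. The function names, the `"\0"` sentinel, and the linear-scan `rank` are assumptions made for brevity; a practical version would use a suffix array for construction and a wavelet tree or sampled counts for rank.

```python
def bwt(text, sentinel="\0"):
    """Naive Burrows-Wheeler transform via sorted rotations (fine for small inputs)."""
    s = text + sentinel  # the sentinel must sort before every other character
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def count_occurrences(last, pattern):
    """Count occurrences of a non-empty pattern using only the BWT column `last`."""
    # first_pos[c] = number of characters in `last` strictly smaller than c,
    # i.e. where the block of rows starting with c begins in the sorted order.
    counts = {}
    for ch in last:
        counts[ch] = counts.get(ch, 0) + 1
    first_pos, total = {}, 0
    for ch in sorted(counts):
        first_pos[ch] = total
        total += counts[ch]

    def rank(ch, i):
        # Occurrences of ch in last[:i]; a linear scan stands in for a succinct rank structure.
        return last.count(ch, 0, i)

    lo, hi = 0, len(last)  # half-open range of rows whose prefix matches the suffix seen so far
    for ch in reversed(pattern):
        if ch not in first_pos:
            return 0
        lo = first_pos[ch] + rank(ch, lo)
        hi = first_pos[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences(bwt("banana"), "ana")` counts the two overlapping matches while touching only the transformed column.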

Wavelet Trees and Succinct Data Structures

Compressed Computation

Batched buffer management

Succinct Tree Representations

Scheduling and Work-Stealing

Data Sets (for future demo / benchmarking)

Bloom Filters