Merge pull request #14 from esl/design_notes

Add design notes to the documentation
esl · Jun 21, 2023 · c5d7804
2 parents 74cdf8e + 891696c
commit c5d7804

Showing 3 changed files with 31 additions and 3 deletions.
diff --git a/README.md b/README.md
@@ -29,7 +29,7 @@ Here instead, the cache uses a –bunch of– ets table with `read_concurrency`,
 All operations hove been carefully written following the latest OTP [efficiency guide](https://erlang.org/doc/efficiency_guide/users_guide.html), including maps operations that improve sharing, avoiding unnecessary copying from ETS tables, inline list functions, use `atomics` and `persistent_term` as in [this guide](https://blog.erlang.org/persistent_term/).
 
 ### Instrumentation
-These days, all modern services must be instrumented. This cache library helps follow the RED method –Rate, Errors, Duration–: that is, lookup operations raise telemetry events with name `[segmented_cache, request]` with information whether there was a hit or not (`hit := boolean()`), and the time the lookup took, in microseconds (`time := integer()`). With this, we can aggregate the total Rate, and extract the proportion of cache misses, or Errors, while knowing the Duration of the lookups.
+These days, all modern services must be instrumented. This cache library helps follow the RED method –Rate, Errors, Duration–: lookup operations raise telemetry events named `[segmented_cache, Name, request]`, where `Name` is the cache name, reporting whether the lookup was a hit (`hit := boolean()`) and how long it took in microseconds (`time := integer()`). From these events we can aggregate the total Rate, derive the proportion of cache misses (the Errors), and track the Duration of lookups. See the documentation for details.
 
 ## Configuration
 

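A minimal sketch of the RED aggregation described in the README change, assuming the lookup events have already been collected (for example by a telemetry handler) as maps carrying the `hit` and `time` keys named there. The module `red_sketch` and its `summarise/2` function are hypothetical and not part of segmented_cache.

```erlang
%% Hypothetical helper, not part of segmented_cache.
-module(red_sketch).
-export([summarise/2]).

%% Events: maps such as #{hit => true, time => 12}, where time is the
%% lookup duration in microseconds. WindowSeconds: observation window.
summarise(Events, WindowSeconds) when WindowSeconds > 0 ->
    Total = length(Events),
    %% Rate: lookups per second over the window.
    %% Errors: cache misses as a fraction of all lookups.
    %% Duration: mean lookup time in microseconds.
    Misses = length([miss || #{hit := false} <- Events]),
    TotalTime = lists:sum([T || #{time := T} <- Events]),
    #{rate => Total / WindowSeconds,
      miss_ratio => ratio(Misses, Total),
      mean_time_us => ratio(TotalTime, Total)}.

%% Avoid dividing by zero when no lookups were observed.
ratio(_, 0) -> 0.0;
ratio(Num, Den) -> Num / Den.
```

A mean hides tail latency, so in practice the `time` values would more usually feed a histogram.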
diff --git a/design_notes.md b/design_notes.md
@@ -0,0 +1,24 @@
+Here are the key points of the design:
+
+## Cache state
+- Cache records are stored in multiple ETS tables called _segments_, which are kept in a tuple so that any segment can be fetched by index in constant time.
+- One segment is the _current_ table; its index is stored in an [`atomic`](https://erlang.org/doc/man/atomics.html).
+- The cache state, which contains the tuple of segments and the atomic reference, is stored with the `persistent_term` module. It is initialised just once and never changes afterwards, which suits `persistent_term`: reads are cheap, while updates are expensive.
+- Only the atomic value changes, and it is updated in a lock-free manner.
+- Writes always go to the _current_ table.
+- Reads iterate through all the segments in order, starting from the _current_ one.
+
+## TTL implementation
+- Tables are rotated periodically, and the data in the oldest table is dropped.
+- `ttl` is not exact: a record inserted just before a rotation lives up to `ttl/N` less than expected. Treat `ttl` as a guarantee that a record lives at most `ttl` and at least `ttl - ttl/N`, where `N` is the number of segments (ETS tables).
+- There's a `gen_server` process responsible for the table rotation (see [Cache state](#cache-state) for more details).
+
+## LRU implementation
+- On `segmented_cache:is_member/2` and `segmented_cache:get_entry/2` calls, the record found is reinserted into the _current_ ETS table, so recently accessed records are not dropped together with the oldest segment.
+
+## Distribution
+- In a distributed environment, the cache is populated independently on every node.
+- However, on deletion a cache record must be removed on all the nodes, and from all the ETS tables on each node, since LRU reinsertion can leave copies of a record in several segments (see [LRU implementation](#lru-implementation)).
+- There's a `gen_server` process responsible for the rotation of the ETS tables (see [TTL implementation](#ttl-implementation)).
+- The same process is reused to implement asynchronous cache record deletion on other nodes in the cluster.
+- To simplify discovering these `gen_server` processes on other nodes, they all join a dedicated `pg` group.
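
A minimal, single-process sketch of the segment ring described in the design notes, assuming a hypothetical module `seg_cache_sketch`. It is not the `segmented_cache` API and has no `gen_server`, telemetry or distribution. Segments live in a tuple of ETS tables, the current index lives in an `atomics` array, and both are published once through `persistent_term`. `rotate/1` stands in for the rotation process, and `find/2` copies hits from older segments into the current one, as in the LRU notes.

```erlang
%% Hypothetical sketch of the segment ring from the design notes.
%% Not the segmented_cache API: single process, no gen_server, no telemetry.
-module(seg_cache_sketch).
-export([new/2, insert/3, find/2, rotate/1]).

%% Create N ETS segments and an atomic holding the current index (1..N),
%% then publish {Segments, Index} once via persistent_term.
new(Name, N) when is_integer(N), N >= 2 ->
    Segments = list_to_tuple(
                 [ets:new(segment, [set, public, {read_concurrency, true}])
                  || _ <- lists:seq(1, N)]),
    Index = atomics:new(1, [{signed, false}]),
    ok = atomics:put(Index, 1, 1),
    ok = persistent_term:put({?MODULE, Name}, {Segments, Index}).

%% Writes always go to the current segment.
insert(Name, Key, Value) ->
    {Segments, Index} = persistent_term:get({?MODULE, Name}),
    true = ets:insert(element(atomics:get(Index, 1), Segments), {Key, Value}),
    ok.

%% Reads walk the segments from the current (newest) one to the oldest.
find(Name, Key) ->
    {Segments, Index} = persistent_term:get({?MODULE, Name}),
    find(Segments, atomics:get(Index, 1), Key, 0).

find(Segments, _Current, _Key, Step) when Step >= tuple_size(Segments) ->
    not_found;
find(Segments, Current, Key, Step) ->
    N = tuple_size(Segments),
    %% Step 0 is the current segment, Step 1 the previous one, and so on.
    Pos = ((Current - 1 - Step + N) rem N) + 1,
    case ets:lookup(element(Pos, Segments), Key) of
        [{Key, Value}] when Step > 0 ->
            %% LRU: copy a hit from an older segment into the current one.
            true = ets:insert(element(Current, Segments), {Key, Value}),
            {ok, Value};
        [{Key, Value}] ->
            {ok, Value};
        [] ->
            find(Segments, Current, Key, Step + 1)
    end.

%% TTL: the segment after the current one is the oldest. Empty it first,
%% then make it the new current segment.
rotate(Name) ->
    {Segments, Index} = persistent_term:get({?MODULE, Name}),
    Next = (atomics:get(Index, 1) rem tuple_size(Segments)) + 1,
    true = ets:delete_all_objects(element(Next, Segments)),
    ok = atomics:put(Index, 1, Next).
```

In this sketch a key written to the current segment is dropped on the `N`-th rotation after the write, unless a lookup copies it forward. If rotations happen every `ttl/N`, the key therefore lives between `ttl - ttl/N` and `ttl`, matching the bound in the TTL notes. Emptying the oldest segment before advancing the index means writers never target a table that is being cleared.
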
diff --git a/rebar.config b/rebar.config
@@ -37,8 +37,12 @@
 {hex, [
     {doc, #{provider => ex_doc}}
 ]}.
+
 {ex_doc, [
      {source_url, <<"https://github.com/esl/segmented_cache">>},
-     {extras, [<<"README.md">>, <<"LICENSE">>]},
-     {main, <<"readme">>}
+     {main, <<"readme">>},
+     {extras, [{'README.md', #{title => <<"Overview">>}},
+               {'design_notes.md', #{title => <<"Design Notes">>}},
+               {'LICENSE', #{title => <<"License">>}}
+              ]}
 ]}.