From 2e7fd03cc7f96a20b09fd622707789166dd7241a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Thu, 8 Feb 2024 19:00:39 +0100
Subject: [PATCH 01/27] Add bitvector and bitlist documentation

---
 docs/bitvectors.md | 214 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 214 insertions(+)
 create mode 100644 docs/bitvectors.md

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
new file mode 100644
index 000000000..96d297ab7
--- /dev/null
+++ b/docs/bitvectors.md
@@ -0,0 +1,214 @@
+# BitVectors and BitLists by example
+
+## Representing integers
+
+Everything in a computer, be it in memory, or disc, or when sent over the network, needs to eventually be represented in binary form. There's two classical ways to do so:
+
+### Big endian byte order
+
+Big endian can be thought of as "you represent it as you read it". For example, let's represent the number 254 in big endian. To represent it as a binary we decompose it powers of two:
+
+$$259 = 256 + 2 + 1 = 2^{8} + 2^{1} + 2^0$$
+
+That means that we'll have the bits representing the power of 0, the power of 1 and the power of 8 set to 1. The rest, will be clear (value = 0).
+
+```
+0000001 00000011
+```
+
+Similar to our decimal system of representation, the symbols to the left represent the most significant values, and the ones to the left, the least significant ones.
+
+Note that this we need two bytes to represent it. This is most CPUs can address bytes, but not bits. That is, when we refer to an address in memory, we refer to the whole byte, and the next address corresponds to the next byte.
+
+We can also think about this number as the byte array `be = [1, 3]`. Here, the least significant byte is the one with the highest index `be[1] = 3` and the most significant byte is the one with the lowest index `be[0] = 1`.
+
+### Little endian byte order
+
+In this representation, we reverse the bytes around. 259 is represented like follows:
+
+```
+00000011 00000001
+```
+
+In this representation, thinking of it as a byte array, we get `le = [3, 1]`. The lowest index, `le[0] = 3` means the lowest significant byte, and the highest index, `le[1] = 1` is the most significant byte. So while little endian is less readable, it is frequently used to represent integers as binaries because of this property.
+
+## Bit vectors
+
+### Little endian bit order
+
+Why would we need a third representation? Let's first pose the problem. We want to represent a set of booleans. Imagine we have a fixed amount of validators, equal to 9, and we want to represent wether they attested in a block or not. We may represent this as follows:
+
+```
+[true, true, false, false, false, false, false, false, true]
+```
+
+However, this representation has a problem: each boolean takes one full byte. For a million validators, which is the order of magnitude of the validator set in mainnet, that would take around 1MB, just for to track attestations on a single slot. We may benefit from the fact that a boolean and a bit both have two states and represent this as a binary instead:
+
+```
+11000000 10000000
+```
+
+So this way we reduce the amount of bytes needed by a factor of 8. In this case we completed the second byte because if we send this over the network we always need to send full bytes, but this effect is diluted when dealing with thousands of bytes.
+
+If we wanted to represent a number with this, as we're addressing by bits instead of bytes, we'd say that the least significant bit is the one with the lowest index, thus why this is called little-endian-*bit*-order. That is, in general:
+
+$$\sum_{i} arr[i]*2^i = 2^0 + 2^1 + 2^8 = 259$$
+
+If you look closely, this is the same number we used in the examples for the classical byte orders! The same way that little-endian byte order was big endian but reversing the bytes, little-endian bit order is big endian but reversing bit by bit.
+
+### Serialization
+
+The way SSZ represents bit vectors is as follows:
+
+1. Conceptually, a set is represented in little-endian bit ordering.
+2. Padding is added so we have full bytes.
+3. When serializing, we convert from little-endian bit ordering to little-endian byte ordering.
+
+So if we want to represent the following array:
+
+```
+[true, true, false, false, false, false, false, false, true]
+```
+
+Which means that the validators with index 0, 1 and 8 attested, this would be represented as follows conceptually:
+
+```
+110000001
+```
+
+Adding padding:
+
+```
+11000000 10000000
+```
+
+Moving it to little endian byte order (we go byte by byte and reverse the bits):
+
+```
+00000011 00000001
+```
+
+Which is what I'll send over the network. This is what SSZ calls `bitvectors`, which is a binary representing an array of booleans of constant size. We know that this array is of size 9 beforehand, so we know what bits are padding and should be ignored. For variable sized bit arrays we'll use `bitlists`, which we'll talk about later.
+
+### Internal representation
+
+There's a trick here: SSZ doesn't specify how to store this in memory after deserializing. We could, theoretically, read the serialized data, transform it from little-endian byte order to little-endian bit order, and use bit addressing (which elixir supports) to get individual values. This would imply, however, going through each byte and reversing the bits, which is a costly operation. If we stuck with little-endian byte order, addressing individual bits would be more complicated, and shifting (moving every bit to the left or right) would be tricky.
+
+For this reason, we represent bitvectors in our node as big-endian binaries. That means that we reverse the bytes (a relatively cheap operation) and, for bit addressing, we just use the complementary index. An example:
+
+If we are still representing the number 259 (validators with index 0, 1 and 8 attested) we'll have the two following representations (note, elixir has a `bitstring` type that lets you address bit by bit and store an amount of bits that's not a multiple of 8):
+
+```
+110000001 -> little-endian bit order
+100000011 -> big-endian
+```
+
+If we watch closely, we confirm something we said before: this are bit-mirrored representations. That means that if I want to know if the validator 0 voted, in the little-endian bit order we address `bitvector[i]`, and in the other case, we just use `bitvector[N-i]`, where `N=9` as it is the size of the vector.
+
+A possible optimization (we'd need to benchmark it) would be to represent the array as the number 259 directly, and use bitwise operations to address bits or shift.
+
+This is the code that performs that:
+
+```elixir
+def new(bitstring, size) when is_bitstring(bitstring) do
+  # Change the byte order from little endian to big endian (reverse bytes).
+  encoded_size = bit_size(bitstring)
+  <<num::integer-little-size(encoded_size)>> = bitstring
+  <<num::integer-size(size)>>
+end
+```
+
+It reads the input as a little-endian number, and then represents it as big-endian.
+
+## Bitlists
+
+### Sentinel bits
+
+In reality, there's not a fixed amount of validators, if someone deposits 32ETH in the deposit contract, a new validator will join the set. `bitlists` are used to represent boolean arrays of variable size like this one. Conceptually, they use the little-endian bit order too, but they use a strategy called `sentinel bit` to mark where it ends. Let's imagine, again, that we're representing the same set of 9 validators as before. We start with the following 9 bits:
+
+```
+110000001
+```
+
+To serialize this and send it over the network, I do the following:
+
+1. Add an extra bit = 1:
+
+```
+1100000011
+```
+
+2. Add padding to complete the full bytes
+
+```
+11000000 11000000
+```
+
+3. Move to little-endian byte order (reverse bits within each byte):
+
+```
+00000011 00000011
+```
+
+When deserializing, we'll look closely at the last byte, and realize that there's 6 trailing 0s (padding), and discard those and the 7th bit (the sentinel 1).
+
+### Edge case: already a multiple of 8
+
+We need to take into account that it might be the case that we already have a multiple of 8 as the number of booleans we're representing. For instance, let's suppose that we have 8 validators and only the first and the second one attested. In little-endian bit ordering, that is:
+
+```
+11000000
+```
+
+When adding the trailing bit and padding, it will look like this:
+
+```
+11000000 10000000
+```
+
+This means that the sentinel bit is, effectively, adding a full new byte. After reversing the bits:
+
+```
+00000011 00000001
+```
+
+When parsing this, we still take care about the last byte, but we will realize that it's comprised of 7 trailing 0s and a sentinel bit, so we'll discard it fully. 
+
+This also shows the importance of the sentinel bit: if it wasn't for it it wouldn't be obvious to the parser that `00000011` represented 8 elements: it could be a set of two validators where both voted.
+
+### Internal representation
+
+For bitlists, in this client we do the same as with bitvectors. We represent them using big endian, for the same reasons. That is, the first thing we do is reverse the bytes, and then remove the first zeroes of the first byte. The code doing that is the following:
+
+```elixir
+def new(bitstring) when is_bitstring(bitstring) do
+  # Change the byte order from little endian to big endian (reverse bytes).
+  num_bits = bit_size(bitstring)
+  len = length_of_bitlist(bitstring)
+
+  <<pre::integer-little-size(num_bits - 8), last_byte::integer-little-size(@bits_per_byte)>> =
+  bitstring
+
+  decoded = <<remove_trailing_bit(<<last_byte>>)::bitstring, pre::integer-size(num_bits - 8)>>
+  {decoded, len}
+end
+
+@spec remove_trailing_bit(binary()) :: bitstring()
+defp remove_trailing_bit(<<1::1, rest::7>>), do: <<rest::7>>
+defp remove_trailing_bit(<<0::1, 1::1, rest::6>>), do: <<rest::6>>
+defp remove_trailing_bit(<<0::2, 1::1, rest::5>>), do: <<rest::5>>
+defp remove_trailing_bit(<<0::3, 1::1, rest::4>>), do: <<rest::4>>
+defp remove_trailing_bit(<<0::4, 1::1, rest::3>>), do: <<rest::3>>
+defp remove_trailing_bit(<<0::5, 1::1, rest::2>>), do: <<rest::2>>
+defp remove_trailing_bit(<<0::6, 1::1, rest::1>>), do: <<rest::1>>
+defp remove_trailing_bit(<<0::7, 1::1>>), do: <<0::0>>
+defp remove_trailing_bit(<<0::8>>), do: <<0::0>>
+
+# This last case should never happen, the last byte should always
+# have a sentinel bit.
+```
+
+We see that we perform two things at the same time:
+
+1. We read as a little endian and then represent it as big endian.
+2. we remove the trailing bits of the last byte, which after reversing, is the first one.

From e5efbda82ae6b59361f41d7b3c27b453633583a0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:29:08 +0100
Subject: [PATCH 02/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 96d297ab7..264019df4 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -6,7 +6,7 @@ Everything in a computer, be it in memory, or disc, or when sent over the networ
 
 ### Big endian byte order
 
-Big endian can be thought of as "you represent it as you read it". For example, let's represent the number 254 in big endian. To represent it as a binary we decompose it powers of two:
+Big-endian can be thought of as "you represent it as you read it". For example, let's represent the number 259 in big-endian. To represent it as a binary we decompose it into powers of two:
 
 $$259 = 256 + 2 + 1 = 2^{8} + 2^{1} + 2^0$$
 

From a1485a01ead9c4b04efb8e5cc2078848357eacea Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:29:53 +0100
Subject: [PATCH 03/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 264019df4..307408ec5 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -4,7 +4,7 @@
 
 Everything in a computer, be it in memory, or disc, or when sent over the network, needs to eventually be represented in binary form. There's two classical ways to do so:
 
-### Big endian byte order
+### Big-endian byte order
 
 Big-endian can be thought of as "you represent it as you read it". For example, let's represent the number 259 in big-endian. To represent it as a binary we decompose it into powers of two:
 

From 1d4dd128aad53f00a4f5533ca10fe756cef7a09e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:29:59 +0100
Subject: [PATCH 04/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 307408ec5..456718228 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -22,7 +22,7 @@ Note that this we need two bytes to represent it. This is most CPUs can address
 
 We can also think about this number as the byte array `be = [1, 3]`. Here, the least significant byte is the one with the highest index `be[1] = 3` and the most significant byte is the one with the lowest index `be[0] = 1`.
 
-### Little endian byte order
+### Little-endian byte order
 
 In this representation, we reverse the bytes around. 259 is represented like follows:
 

From d7259b3e5967ceed928efb418afca917f6f0b6ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:30:06 +0100
Subject: [PATCH 05/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 456718228..0062b0c94 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -24,7 +24,7 @@ We can also think about this number as the byte array `be = [1, 3]`. Here, the l
 
 ### Little-endian byte order
 
-In this representation, we reverse the bytes around. 259 is represented like follows:
+In this representation, we reverse the bytes around. 259 is represented as follows:
 
 ```
 00000011 00000001

From efb6cc63db13fea0038e63e9a2e5949d3abfaa69 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:30:32 +0100
Subject: [PATCH 06/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 0062b0c94..f2226879a 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -30,7 +30,7 @@ In this representation, we reverse the bytes around. 259 is represented as follo
 00000011 00000001
 ```
 
-In this representation, thinking of it as a byte array, we get `le = [3, 1]`. The lowest index, `le[0] = 3` means the lowest significant byte, and the highest index, `le[1] = 1` is the most significant byte. So while little endian is less readable, it is frequently used to represent integers as binaries because of this property.
+Representing it as a byte array, we get `le = [3, 1]`. The lowest index, `le[0] = 3` means the lowest significant byte, and the highest index, `le[1] = 1` is the most significant byte. So, while little-endian is less readable, it is frequently used to represent integers as binaries because of this property.
 
 ## Bit vectors
 

From c407411ee5073a7a76143f116045f5b4cbe8c345 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:30:47 +0100
Subject: [PATCH 07/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index f2226879a..50c4e5024 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -34,7 +34,7 @@ Representing it as a byte array, we get `le = [3, 1]`. The lowest index, `le[0]
 
 ## Bit vectors
 
-### Little endian bit order
+### Little-endian bit order
 
 Why would we need a third representation? Let's first pose the problem. We want to represent a set of booleans. Imagine we have a fixed amount of validators, equal to 9, and we want to represent wether they attested in a block or not. We may represent this as follows:
 

From 237e27ace3a331953a89cb35999156051ec27bf8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:31:16 +0100
Subject: [PATCH 08/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 50c4e5024..c0ff72967 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -166,7 +166,7 @@ When adding the trailing bit and padding, it will look like this:
 11000000 10000000
 ```
 
-This means that the sentinel bit is, effectively, adding a full new byte. After reversing the bits:
+This means that the sentinel bit is, effectively, adding a new full byte. After reversing the bits:
 
 ```
 00000011 00000001

From 68dc2d7b92a438442d534d882cdb2447b239ed70 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:31:27 +0100
Subject: [PATCH 09/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index c0ff72967..2910a5935 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -172,7 +172,7 @@ This means that the sentinel bit is, effectively, adding a new full byte. After
 00000011 00000001
 ```
 
-When parsing this, we still take care about the last byte, but we will realize that it's comprised of 7 trailing 0s and a sentinel bit, so we'll discard it fully. 
+When parsing this, we still take care of the last byte, but we will realize that it's comprised of 7 trailing 0s and a sentinel bit, so we'll discard it fully. 
 
 This also shows the importance of the sentinel bit: if it wasn't for it it wouldn't be obvious to the parser that `00000011` represented 8 elements: it could be a set of two validators where both voted.
 

From 99823fb3d47c922e51b55a740044f4d1952bcbdc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:31:55 +0100
Subject: [PATCH 10/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 2910a5935..338133616 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -178,7 +178,7 @@ This also shows the importance of the sentinel bit: if it wasn't for it it would
 
 ### Internal representation
 
-For bitlists, in this client we do the same as with bitvectors. We represent them using big endian, for the same reasons. That is, the first thing we do is reverse the bytes, and then remove the first zeroes of the first byte. The code doing that is the following:
+For bitlists, in this client we do the same as with bitvectors, and for the same reasons: we represent them using big-endian. That is, the first thing we do is reverse the bytes, and then remove the first zeroes of the first byte. The code doing that is the following:
 
 ```elixir
 def new(bitstring) when is_bitstring(bitstring) do

From d12befafd5208efb313811bada906559787a6672 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:32:05 +0100
Subject: [PATCH 11/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 338133616..8d465a516 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -210,5 +210,5 @@ defp remove_trailing_bit(<<0::8>>), do: <<0::0>>
 
 We see that we perform two things at the same time:
 
-1. We read as a little endian and then represent it as big endian.
+1. We read as a little-endian and then represent it as big-endian.
 2. we remove the trailing bits of the last byte, which after reversing, is the first one.

From 41435edf712459aa9d069d6fe05b4cac1e5d9668 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:32:24 +0100
Subject: [PATCH 12/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 8d465a516..25a8d491a 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -36,7 +36,7 @@ Representing it as a byte array, we get `le = [3, 1]`. The lowest index, `le[0]
 
 ### Little-endian bit order
 
-Why would we need a third representation? Let's first pose the problem. We want to represent a set of booleans. Imagine we have a fixed amount of validators, equal to 9, and we want to represent wether they attested in a block or not. We may represent this as follows:
+Why would we need a third representation? Let's first pose the problem. We want to represent a set of booleans. Imagine we have a fixed amount of validators, equal to 9, and we want to represent whether they attested in a block or not. We may represent this as follows:
 
 ```
 [true, true, false, false, false, false, false, false, true]

From 73da7caf50058927000749c60510fe5c0bebf125 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:33:28 +0100
Subject: [PATCH 13/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 25a8d491a..92fba7291 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -48,7 +48,7 @@ However, this representation has a problem: each boolean takes one full byte. Fo
 11000000 10000000
 ```
 
-So this way we reduce the amount of bytes needed by a factor of 8. In this case we completed the second byte because if we send this over the network we always need to send full bytes, but this effect is diluted when dealing with thousands of bytes.
+This way we reduce the amount of bytes needed by a factor of 8. In this case, we completed the second byte because if we send this over the network we always need to send full bytes, but this effect is diluted when dealing with thousands of bytes.
 
 If we wanted to represent a number with this, as we're addressing by bits instead of bytes, we'd say that the least significant bit is the one with the lowest index, thus why this is called little-endian-*bit*-order. That is, in general:
 

From 5311b2d69bb14e6e42f828b9bd95190ba3dc2dd9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:33:47 +0100
Subject: [PATCH 14/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 92fba7291..4dedc083c 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -64,7 +64,7 @@ The way SSZ represents bit vectors is as follows:
 2. Padding is added so we have full bytes.
 3. When serializing, we convert from little-endian bit ordering to little-endian byte ordering.
 
-So if we want to represent the following array:
+So, if we want to represent that the validators with indices 0, 1, and 8 attested, we can use the following array:
 
 ```
 [true, true, false, false, false, false, false, false, true]

From 8670e655963621c0e1a18c9c02d724cf11d254fe Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 17:34:06 +0100
Subject: [PATCH 15/27] Update docs/bitvectors.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Tomás Grüner <47506558+MegaRedHand@users.noreply.github.com>
---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 4dedc083c..c826b3857 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -70,7 +70,7 @@ So, if we want to represent that the validators with indices 0, 1, and 8 atteste
 [true, true, false, false, false, false, false, false, true]
 ```
 
-Which means that the validators with index 0, 1 and 8 attested, this would be represented as follows conceptually:
+Conceptually, this would be represented as the following string of bits:
 
 ```
 110000001

From 967ea1e9caf7ce54e55e976b782a01ad574f0a39 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 19:01:23 +0100
Subject: [PATCH 16/27] fix typos

---
 docs/bitvectors.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index c826b3857..52a4c34d0 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -16,9 +16,9 @@ That means that we'll have the bits representing the power of 0, the power of 1
 0000001 00000011
 ```
 
-Similar to our decimal system of representation, the symbols to the left represent the most significant values, and the ones to the left, the least significant ones.
+Similar to our decimal system of representation, the symbols to the left represent the most significant values, and the ones to the right, the least significant ones.
 
-Note that this we need two bytes to represent it. This is most CPUs can address bytes, but not bits. That is, when we refer to an address in memory, we refer to the whole byte, and the next address corresponds to the next byte.
+Note that we need two bytes to represent this number. Most CPUs can address bytes, but not individual bits. That is, when we refer to an address in memory, we refer to the whole byte, and the next address corresponds to the next byte.
 
 We can also think about this number as the byte array `be = [1, 3]`. Here, the least significant byte is the one with the highest index `be[1] = 3` and the most significant byte is the one with the lowest index `be[0] = 1`.
 
@@ -42,7 +42,7 @@ Why would we need a third representation? Let's first pose the problem. We want
 [true, true, false, false, false, false, false, false, true]
 ```
 
-However, this representation has a problem: each boolean takes one full byte. For a million validators, which is the order of magnitude of the validator set in mainnet, that would take around 1MB, just for to track attestations on a single slot. We may benefit from the fact that a boolean and a bit both have two states and represent this as a binary instead:
+However, this representation has a problem: each boolean takes one full byte. For a million validators, which is in the order of magnitude of the validator set of mainnet, that would mean a total attestation size of 64KB per block, which is half their size. We can instead use the fact that a boolean and a bit both have two states and represent this as a binary:
 
 ```
 11000000 10000000
@@ -82,13 +82,13 @@ Adding padding:
 11000000 10000000
 ```
 
-Moving it to little endian byte order (we go byte by byte and reverse the bits):
+Moving it to little-endian byte order (we go byte by byte and reverse the bits):
 
 ```
 00000011 00000001
 ```
 
-Which is what I'll send over the network. This is what SSZ calls `bitvectors`, which is a binary representing an array of booleans of constant size. We know that this array is of size 9 beforehand, so we know what bits are padding and should be ignored. For variable sized bit arrays we'll use `bitlists`, which we'll talk about later.
+This is what SSZ calls `bitvectors`, and what nodes send over the network: a binary representing an array of booleans of constant size. We know beforehand that this array is of size 9, so we know which bits are padding and should be ignored. For variable-sized bit arrays we'll use `bitlists`, which we'll talk about later.
 
 ### Internal representation
 
@@ -103,9 +103,7 @@ If we are still representing the number 259 (validators with index 0, 1 and 8 at
 100000011 -> big-endian
 ```
 
-If we watch closely, we confirm something we said before: this are bit-mirrored representations. That means that if I want to know if the validator 0 voted, in the little-endian bit order we address `bitvector[i]`, and in the other case, we just use `bitvector[N-i]`, where `N=9` as it is the size of the vector.
-
-A possible optimization (we'd need to benchmark it) would be to represent the array as the number 259 directly, and use bitwise operations to address bits or shift.
+If we look closely, we confirm something we said before: these are bit-mirrored representations. That means that if we want to know whether validator `i` voted, in the little-endian bit order we address `bitvector[i]`, and in the big-endian one we use the complementary index `bitvector[N-1-i]`, where `N=9` is the size of the vector (indices start at 0, so validator 0 sits at index 8).
 
 This is the code that performs that:
 
@@ -120,6 +118,8 @@ end
 
 It reads the input as a little-endian number, and then represents it as big-endian.
 
+Instead of using Elixir's bitstrings, a possible optimization (we'd need to benchmark it) would be to represent the array as the number 259 directly, and use bitwise operations to address bits or shift.
+
 ## Bitlists
 
 ### Sentinel bits
@@ -130,7 +130,7 @@ In reality, there's not a fixed amount of validators, if someone deposits 32ETH
 110000001
 ```
 
-To serialize this and send it over the network, I do the following:
+To serialize this and send it over the network, we do the following:
 
 1. Add an extra bit = 1:
 
@@ -138,7 +138,7 @@ To serialize this and send it over the network, I do the following:
 1100000011
 ```
 
-2. Add padding to complete the full bytes
+2. Add padding to complete the full bytes:
 
 ```
 11000000 11000000
@@ -150,7 +150,7 @@ To serialize this and send it over the network, I do the following:
 00000011 00000011
 ```
 
-When deserializing, we'll look closely at the last byte, and realize that there's 6 trailing 0s (padding), and discard those and the 7th bit (the sentinel 1).
+When deserializing, we'll look closely at the last byte, realize that there are 6 trailing 0s (padding), and discard those along with the 7th bit (the sentinel 1).
 
 ### Edge case: already a multiple of 8
 

From 3d86bf05aaf87c73c21873bdbb3001ba66d49516 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Fri, 9 Feb 2024 20:01:54 +0100
Subject: [PATCH 17/27] improve readability

---
 docs/bitvectors.md | 103 +++++++++++++++++++++++++++------------------
 1 file changed, 63 insertions(+), 40 deletions(-)
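
Reviewer note: the conversion rules this patch adds to the docs can be sketched in Elixir as below. This is only an illustration under stated assumptions: `BitOrderSketch` and its functions are hypothetical names and are not part of the client.

```elixir
defmodule BitOrderSketch do
  # Hypothetical helper module, for illustration only.

  # "Little-endian bit order" <-> little-endian byte order:
  # reverse the bits inside each individual byte.
  def reverse_bits_per_byte(binary) when is_binary(binary) do
    for <<byte::binary-size(1) <- binary>>, into: <<>>, do: reverse_byte(byte)
  end

  # Little-endian byte order <-> big-endian: reverse the bytes.
  def reverse_bytes(binary) when is_binary(binary) do
    binary
    |> :binary.bin_to_list()
    |> Enum.reverse()
    |> :binary.list_to_bin()
  end

  # Mirrors the eight bits of a single byte.
  defp reverse_byte(<<a::1, b::1, c::1, d::1, e::1, f::1, g::1, h::1>>),
    do: <<h::1, g::1, f::1, e::1, d::1, c::1, b::1, a::1>>
end
```

With the padded 9-validator example from the docs, `<<0b11000000, 0b10000000>>`, reversing the bits of each byte gives the serialized form `<<3, 1>>`, and reversing those bytes gives the big-endian bytes `<<1, 3>>`, which decode to 259.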

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 52a4c34d0..d77755c66 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -2,25 +2,35 @@
 
 ## Representing integers
 
-Everything in a computer, be it in memory, or disc, or when sent over the network, needs to eventually be represented in binary form. There's two classical ways to do so:
+Computers use transistors to store data. These electrical components only have two possible states: `clear` or `set`. Numerically, we represent the clear state as a `0` and the set state as `1`. Using 1s and 0s, we can represent any integer number using the binary system, the same way we use the decimal system in our daily lives.
 
-### Big-endian byte order
+As an example, let's take the number 259. For its decimal representation, we use the digits 2, 5 and 9, because each digit, or coefficient, represents a power of 10:
+
+$$ 259 = 200 + 50 + 9 = 2*10^2 + 5*10^1 + 9*10^0 $$
 
-Big-endian can be thought of as "you represent it as you read it". For example, let's represent the number 259 in big-endian. To represent it as a binary we decompose it into powers of two:
+If we wanted to do the same in binary, we would use powers of two, and each individual symbol (or bit) can only be 0 or 1.
 
-$$259 = 256 + 2 + 1 = 2^{8} + 2^{1} + 2^0$$
+$$ 259 = 256 + 2 + 1 = 1*2^{8} + 1*2^{1} + 1*2^0 $$
 
-That means that we'll have the bits representing the power of 0, the power of 1 and the power of 8 set to 1. The rest, will be clear (value = 0).
+All other bits are 0. This results in the following binary number:
 
 ```
-0000001 00000011
+100000011
 ```
 
-Similar to our decimal system of representation, the symbols to the left represent the most significant values, and the ones to the right, the least significant ones.
+In written form, similar to our decimal system of representation, the symbols "to the left" represent the most significant digits, and the ones to the right, the least significant ones.
 
-Note that we need two bytes to represent this number. Most CPUs can address bytes, but not individual bits. That is, when we refer to an address in memory, we refer to the whole byte, and the next address corresponds to the next byte.
+### Big-endian byte order
 
-We can also think about this number as the byte array `be = [1, 3]`. Here, the least significant byte is the one with the highest index `be[1] = 3` and the most significant byte is the one with the lowest index `be[0] = 1`.
+Most CPUs can only address bits in groups of 8, called bytes. That is, when we refer to an address in memory, we refer to the whole byte, and the next address corresponds to the next byte. This means that to represent this number, we'll need two bytes, organized as follows:
+
+```
+00000001 00000011
+```
+
+We can also think about this as the byte array `bytes = [1, 3]`. Here, the least significant byte is the one with the highest index `bytes[1] = 3` and the most significant byte is the one with the lowest index `bytes[0] = 1`.
+
+This ordering of bytes, which is similar to the written form, is called `big-endian`.
 
 ### Little-endian byte order
 
@@ -30,19 +40,17 @@ In this representation, we reverse the bytes around. 259 is represented as follo
 00000011 00000001
 ```
 
-Representing it as a byte array, we get `le = [3, 1]`. The lowest index, `le[0] = 3` means the lowest significant byte, and the highest index, `le[1] = 1` is the most significant byte. So, while little-endian is less readable, it is frequently used to represent integers as binaries because of this property.
-
-## Bit vectors
+Representing it as a byte array, we get `bytes = [3, 1]`. The lowest index, `bytes[0] = 3` means the lowest significant byte, and the highest index, `bytes[1] = 1` is the most significant byte. So, while little-endian is less readable, it is frequently used to represent integers as binaries because of this property.
 
 ### Little-endian bit order
 
-Why would we need a third representation? Let's first pose the problem. We want to represent a set of booleans. Imagine we have a fixed amount of validators, equal to 9, and we want to represent whether they attested in a block or not. We may represent this as follows:
+Why would we need a third representation? Let's first pose the problem. Imagine we have a fixed amount of validators, equal to 9, and we want to represent whether they attested in a block or not. If the validators 0, 1 and 8 attested, we may represent this with a boolean array, as follows:
 
 ```
 [true, true, false, false, false, false, false, false, true]
 ```
 
-However, this representation has a problem: each boolean takes one full byte. For a million validators, which is in the order of magnitude of the validator set of mainnet, that mean a total attestation size of 64KB per block, which is half their size. We can instead use the fact that a boolean and a bit both have two states and represent this as a binary instead:
+However, this representation has a problem: each boolean takes up one full byte. For a million validators, which is in the order of magnitude of the validator set of mainnet, that means a total attestation size of 64KB per block, which is half its size. We can instead use the fact that a boolean and a bit both have two states and represent this as a binary instead:
 
 ```
 11000000 10000000
@@ -50,21 +58,36 @@ However, this representation has a problem: each boolean takes one full byte. Fo
 
 This way we reduce the amount of bytes needed by a factor of 8. In this case, we completed the second byte because if we send this over the network we always need to send full bytes, but this effect is diluted when dealing with thousands of bytes.
 
-If we wanted to represent a number with this, as we're addressing by bits instead of bytes, we'd say that the least significant bit is the one with the lowest index, thus why this is called little-endian-*bit*-order. That is, in general:
+If we wanted to represent a number with this, as we're addressing by bits instead of bytes, we'd follow the convention that the least significant bit is the one with the lowest index, which is why this is called little-endian-*bit*-order. That is, in general:
+
+$$ \sum_{i} arr[i]*2^i = 2^0 + 2^1 + 2^8 = 259 $$
+
+If you look closely, this is the same number we used in the examples for the classical byte orders!
 
-$$\sum_{i} arr[i]*2^i = 2^0 + 2^1 + 2^8 = 259$$
+Summarizing all representations:
 
-If you look closely, this is the same number we used in the examples for the classical byte orders! The same way that little-endian byte order was big endian but reversing the bytes, little-endian bit order is big endian but reversing bit by bit.
+```
+00000001 00000011: big-endian
+00000011 00000001: little-endian byte order
+11000000 10000000: little-endian bit order
+```
+
+To convert from one order to another:
+
+- Little-endian byte order to big-endian: reverse the bytes.
+- Little-endian bit order to big-endian: reverse the bits of the whole number.
+- Little-endian bit order to little-endian byte order: reverse the bits of each individual byte. This is equivalent to reversing all bits (converting to big-endian) and then reversing the bytes (big-endian to little-endian byte order) but in a single step.
+
+## Bit vectors
 
-### Serialization
+### Serialization (SSZ)
 
-The way SSZ represents bit vectors is as follows:
+`bitvectors` are exactly that: a set of booleans with fixed size. SSZ represents bit vectors as follows:
 
-1. Conceptually, a set is represented in little-endian bit ordering.
-2. Padding is added so we have full bytes.
-3. When serializing, we convert from little-endian bit ordering to little-endian byte ordering.
+- Conceptually, a set is represented in little-endian bit ordering, padded with 0s at the end to get full bytes.
+- When serializing, we convert from little-endian bit ordering to little-endian byte ordering.
 
-So, if we want to represent that the validators with indices 0, 1, and 8 attested, we can use the following array:
+If we want to represent that the validators with indices 0, 1, and 8 attested, we can use the following array:
 
 ```
 [true, true, false, false, false, false, false, false, true]
@@ -88,11 +111,11 @@ Moving it to little-endian byte order (we go byte by byte and reverse the bits):
 00000011 00000001
 ```
 
-Which is what SSZ calls `bitvectors`, and what nodes send over the network: a binary representing an array of booleans of constant size. We know that this array is of size 9 beforehand, so we know what bits are padding and should be ignored. For variable-sized bit arrays we'll use `bitlists`, which we'll talk about later.
+This is how nodes send `bitvectors` over the network. We know that this array is of size 9 beforehand, so we know what bits are padding and should be ignored. For variable-sized bit arrays we'll use `bitlists`, which we'll talk about later.
 
 ### Internal representation
 
-There's a trick here: SSZ doesn't specify how to store this in memory after deserializing. We could, theoretically, read the serialized data, transform it from little-endian byte order to little-endian bit order, and use bit addressing (which elixir supports) to get individual values. This would imply, however, going through each byte and reversing the bits, which is a costly operation. If we stuck with little-endian byte order, addressing individual bits would be more complicated, and shifting (moving every bit to the left or right) would be tricky.
+There's a trick here: SSZ doesn't specify how to store a `bitvector` in memory after deserializing. We could, theoretically, read the serialized data, transform it from little-endian byte order to little-endian bit order, and use bit addressing (which Elixir supports) to get individual values. This would imply, however, going through each byte and reversing the bits, which is a costly operation. If we stuck with little-endian byte order without transforming it, addressing individual bits would be more complicated, and shifting (moving every bit to the left or right) would be tricky.
 
 For this reason, we represent bitvectors in our node as big-endian binaries. That means that we reverse the bytes (a relatively cheap operation) and, for bit addressing, we just use the complementary index. An example:
 
@@ -103,9 +126,9 @@ If we are still representing the number 259 (validators with index 0, 1 and 8 at
 100000011 -> big-endian
 ```
 
-If we watch closely, we confirm something we said before: these are bit-mirrored representations. That means that if I want to know if the validator 0 voted, in the little-endian bit order, we address `bitvector[i]`, and in the other case, we just use `bitvector[N-i]`, where `N=9` as it is the size of the vector.
+If we watch closely, we confirm something we said before: these are bit-mirrored representations. That means that if we want to know whether validator `i` voted, in the little-endian bit order we address `bitvector[i]`, and in the big-endian order we use `bitvector[N-1-i]`, where `N=9` is the size of the vector (for validator 0, that is the last bit, at index 8).
 
-This is the code that performs that:
+This is the code that performs this conversion:
 
 ```elixir
 def new(bitstring, size) when is_bitstring(bitstring) do
@@ -116,7 +139,7 @@ def new(bitstring, size) when is_bitstring(bitstring) do
 end
 ```
 
-It reads the input as a little-endian number, and then represents it as big-endian.
+It reads the input as a little-endian number, and then constructs a big-endian binary representation of it.
 
 Instead of using Elixir's bitstrings, a possible optimization (we'd need to benchmark it) would be to represent the array as the number 259 directly, and use bitwise operations to address bits or shift.
 
@@ -124,7 +147,7 @@ Instead of using Elixir's bitstrings, a possible optimization (we'd need to benc
 
 ### Sentinel bits
 
-In reality, there's not a fixed amount of validators, if someone deposits 32ETH in the deposit contract, a new validator will join the set. `bitlists` are used to represent boolean arrays of variable size like this one. Conceptually, they use the little-endian bit order too, but they use a strategy called `sentinel bit` to mark where it ends. Let's imagine, again, that we're representing the same set of 9 validators as before. We start with the following 9 bits:
+In reality, there isn't a fixed number of validators. If someone deposits 32 ETH in the deposit contract, a new validator will join the set. `bitlists` are used to represent boolean arrays of variable size like this one. Conceptually, they use the little-endian bit order too, but they use a strategy called `sentinel bit` to mark where the list ends. Let's imagine, again, that we're representing the same set of 9 validators as before. We start with the following 9 bits:
 
 ```
 110000001
@@ -132,19 +155,19 @@ In reality, there's not a fixed amount of validators, if someone deposits 32ETH
 
 To serialize this and send it over the network, we do the following:
 
-1. Add an extra bit = 1:
+1. Add an extra (sentinel) bit, equal to 1:
 
 ```
 1100000011
 ```
 
-2. Add padding to complete the full bytes:
+2. Add padding to complete the full byte:
 
 ```
 11000000 11000000
 ```
 
-3. Move to little-endian byte order (reverse bits within each byte):
+3. Transform to little-endian byte order (reverse bits within each byte):
 
 ```
 00000011 00000011
@@ -154,7 +177,7 @@ When deserializing, we'll look closely at the last byte, realize that there are
 
 ### Edge case: already a multiple of 8
 
-We need to take into account that it might be the case that we already have a multiple of 8 as the number of booleans we're representing. For instance, let's suppose that we have 8 validators and only the first and the second one attested. In little-endian bit ordering, that is:
+It might be the case that we already have a multiple of 8 as the number of booleans we're representing. For instance, let's suppose that we have 8 validators and only the first and the second one attested. In little-endian bit order, that is:
 
 ```
 11000000
@@ -166,19 +189,19 @@ When adding the trailing bit and padding, it will look like this:
 11000000 10000000
 ```
 
-This means that the sentinel bit is, effectively, adding a new full byte. After reversing the bits:
+This means that the sentinel bit is, effectively, adding a new full byte. After serializing:
 
 ```
 00000011 00000001
 ```
 
-When parsing this, we still take care of the last byte, but we will realize that it's comprised of 7 trailing 0s and a sentinel bit, so we'll discard it fully. 
+When parsing this, we still pay attention to the last byte, but we will realize that it's comprised of 7 trailing 0s and a sentinel bit, so we'll discard it entirely.
 
-This also shows the importance of the sentinel bit: if it wasn't for it it wouldn't be obvious to the parser that `00000011` represented 8 elements: it could be a set of two validators where both voted.
+This also shows the importance of the sentinel bit: without it, it wouldn't be obvious to the parser that `00000011` represents 8 elements. It could just as well be a set of two validators where both voted (`11`).
 
 ### Internal representation
 
-For bitlists, in this client we do the same as with bitvectors, and for the same reasons: we represent them using big-endian. That is, the first thing we do is reverse the bytes, and then remove the first zeroes of the first byte. The code doing that is the following:
+For `bitlists`, in this client we do the same as with `bitvectors`, and for the same reasons: we represent them using big-endian. The code doing that is the following:
 
 ```elixir
 def new(bitstring) when is_bitstring(bitstring) do
@@ -208,7 +231,7 @@ defp remove_trailing_bit(<<0::8>>), do: <<0::0>>
 # have a sentinel bit.
 ```
 
-We see that we perform two things at the same time:
+We see that the code performs two things at the same time:
 
-1. We read as a little-endian and then represent it as big-endian.
-2. we remove the trailing bits of the last byte, which after reversing, is the first one.
+1. It parses the little-endian byte ordered binary and then represents it as big-endian.
+2. It removes the trailing bits and sentinel of the last byte, which after reversing, is the first one.

From 3b56b2503d5355e1e2b2fad1cdc7d5344618a5b6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Thu, 15 Feb 2024 12:58:35 +0100
Subject: [PATCH 18/27] add quotes

---
 docs/bitvectors.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index d77755c66..31c7aeb8a 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -42,7 +42,7 @@ In this representation, we reverse the bytes around. 259 is represented as follo
 
 Representing it as a byte array, we get `bytes = [3, 1]`. The lowest index, `bytes[0] = 3` means the lowest significant byte, and the highest index, `bytes[1] = 1` is the most significant byte. So, while little-endian is less readable, it is frequently used to represent integers as binaries because of this property.
 
-### Little-endian bit order
+### "Little-endian bit order"
 
 Why would we need a third representation? Let's first pose the problem. Imagine we have a fixed amount of validators, equal to 9, and we want to represent whether they attested in a block or not. If the validators 0, 1 and 8 attested, we may represent this with a boolean array, as follows:
 

From 76c73eab8c4fba307d8d0442f335feedd296910c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 11:33:48 +0100
Subject: [PATCH 19/27] Remove length from bitlist type. Add bitlist
 encoding-decoding to attestation and pending attestation

---
 .../state_transition/operations.ex            |  2 +-
 lib/ssz_ex.ex                                 |  5 ++-
 lib/types/beacon_chain/attestation.ex         | 10 +++++
 lib/types/beacon_chain/pending_attestation.ex | 10 +++++
 lib/utils/bit_list.ex                         | 44 +++++++------------
 test/unit/bit_list_test.exs                   | 19 ++++----
 6 files changed, 49 insertions(+), 41 deletions(-)
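
Reviewer note: since this patch drops the explicit length from the bitlist type, here is a minimal sketch of how the length can still be recovered from the serialized form by locating the sentinel bit. `SentinelSketch` and `bitlist_length/1` are hypothetical names used only for illustration. They are not the client's API, and the error message is an assumption.

```elixir
defmodule SentinelSketch do
  # Hypothetical module for illustration; it is not part of the client.

  # Returns the number of data bits in an SSZ-serialized bitlist,
  # excluding the sentinel bit and the padding.
  def bitlist_length(serialized) when is_binary(serialized) and byte_size(serialized) > 0 do
    case :binary.last(serialized) do
      0 ->
        # A last byte of 0 has no sentinel, so the encoding is invalid.
        {:error, "missing sentinel bit"}

      last_byte ->
        # The sentinel is the highest set bit of the last byte.
        sentinel_position = length(Integer.digits(last_byte, 2)) - 1
        {:ok, (byte_size(serialized) - 1) * 8 + sentinel_position}
    end
  end
end
```

For the two examples in the docs, `<<3, 3>>` yields a length of 9 and `<<3, 1>>` (the multiple-of-8 edge case) yields a length of 8. The test input `<<237, 7>>` yields 10.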

diff --git a/lib/lambda_ethereum_consensus/state_transition/operations.ex b/lib/lambda_ethereum_consensus/state_transition/operations.ex
index 1bf76cc18..4ae6daed3 100644
--- a/lib/lambda_ethereum_consensus/state_transition/operations.ex
+++ b/lib/lambda_ethereum_consensus/state_transition/operations.ex
@@ -845,7 +845,7 @@ defmodule LambdaEthereumConsensus.StateTransition.Operations do
   end
 
   defp check_matching_aggregation_bits_length(attestation, beacon_committee) do
-    if BitList.length_of_bitlist(attestation.aggregation_bits) == length(beacon_committee) do
+    if BitList.length(attestation.aggregation_bits) == length(beacon_committee) do
       :ok
     else
       {:error, "Mismatched aggregation bits length"}
diff --git a/lib/ssz_ex.ex b/lib/ssz_ex.ex
index d3fae492c..9bb315cc4 100644
--- a/lib/ssz_ex.ex
+++ b/lib/ssz_ex.ex
@@ -354,7 +354,7 @@ defmodule LambdaEthereumConsensus.SszEx do
     if len > max_size do
       {:error, "excess bits"}
     else
-      {:ok, BitList.to_bytes({bit_list, len})}
+      {:ok, BitList.to_bytes(bit_list)}
     end
   end
 
@@ -407,7 +407,8 @@ defmodule LambdaEthereumConsensus.SszEx do
 
   defp decode_bitlist(bit_list, max_size) when bit_size(bit_list) > 0 do
     num_bytes = byte_size(bit_list)
-    {decoded, len} = BitList.new(bit_list)
+    decoded = BitList.new(bit_list)
+    len = BitList.length(bit_list)
 
     cond do
       len < 0 ->
diff --git a/lib/types/beacon_chain/attestation.ex b/lib/types/beacon_chain/attestation.ex
index da565156a..8ca6a6d29 100644
--- a/lib/types/beacon_chain/attestation.ex
+++ b/lib/types/beacon_chain/attestation.ex
@@ -3,6 +3,8 @@ defmodule Types.Attestation do
   Struct definition for `AttestationMainnet`.
   Related definitions in `native/ssz_nif/src/types/`.
   """
+  alias LambdaEthereumConsensus.Utils.BitList
+
   @behaviour LambdaEthereumConsensus.Container
 
   fields = [
@@ -29,4 +31,12 @@ defmodule Types.Attestation do
       {:signature, TypeAliases.bls_signature()}
     ]
   end
+
+  def encode(%__MODULE__{} = map) do
+    Map.update!(map, :aggregation_bits, &BitList.to_bytes/1)
+  end
+
+  def decode(%__MODULE__{} = map) do
+    Map.update!(map, :aggregation_bits, &BitList.new/1)
+  end
 end
diff --git a/lib/types/beacon_chain/pending_attestation.ex b/lib/types/beacon_chain/pending_attestation.ex
index d04d22815..a072d7713 100644
--- a/lib/types/beacon_chain/pending_attestation.ex
+++ b/lib/types/beacon_chain/pending_attestation.ex
@@ -3,6 +3,8 @@ defmodule Types.PendingAttestation do
   Struct definition for `PendingAttestation`.
   Related definitions in `native/ssz_nif/src/types/`.
   """
+  alias LambdaEthereumConsensus.Utils.BitList
+
   @behaviour LambdaEthereumConsensus.Container
 
   fields = [
@@ -32,4 +34,12 @@ defmodule Types.PendingAttestation do
       {:proposer_index, TypeAliases.validator_index()}
     ]
   end
+
+  def encode(%__MODULE__{} = map) do
+    Map.update!(map, :aggregation_bits, &BitList.to_bytes/1)
+  end
+
+  def decode(%__MODULE__{} = map) do
+    Map.update!(map, :aggregation_bits, &BitList.new/1)
+  end
 end
diff --git a/lib/utils/bit_list.ex b/lib/utils/bit_list.ex
index fa64ca4dd..2cecb5a15 100644
--- a/lib/utils/bit_list.ex
+++ b/lib/utils/bit_list.ex
@@ -1,9 +1,9 @@
 defmodule LambdaEthereumConsensus.Utils.BitList do
   @moduledoc """
-    Set of utilities to interact with BitList, represented as {bitstring, len}.
+  Set of utilities to interact with BitList, represented as a bitstring.
   """
   alias LambdaEthereumConsensus.Utils.BitField
-  @type t :: {bitstring, integer()}
+  @type t :: bitstring
   @bits_per_byte 8
   @sentinel_bit 1
   @bits_in_sentinel_bit 1
@@ -15,22 +15,19 @@ defmodule LambdaEthereumConsensus.Utils.BitList do
   def new(bitstring) when is_bitstring(bitstring) do
     # Change the byte order from little endian to big endian (reverse bytes).
     num_bits = bit_size(bitstring)
-    len = length_of_bitlist(bitstring)
 
     <<pre::integer-little-size(num_bits - @bits_per_byte),
       last_byte::integer-little-size(@bits_per_byte)>> =
       bitstring
 
-    decoded =
-      <<remove_trailing_bit(<<last_byte>>)::bitstring,
-        pre::integer-size(num_bits - @bits_per_byte)>>
-
-    {decoded, len}
+    <<remove_trailing_bit(<<last_byte>>)::bitstring,
+      pre::integer-size(num_bits - @bits_per_byte)>>
   end
 
   @spec to_bytes(t) :: bitstring
-  def to_bytes({bit_list, len}) do
+  def to_bytes(bit_list) do
     # Change the byte order from big endian to little endian (reverse bytes).
+    len = bit_size(bit_list)
     r = rem(len, @bits_per_byte)
 
     <<pre::integer-size(r), post::integer-size(len - r)>> = bit_list
@@ -40,8 +37,9 @@ defmodule LambdaEthereumConsensus.Utils.BitList do
   end
 
   @spec to_packed_bytes(t) :: bitstring
-  def to_packed_bytes({bit_list, len}) do
+  def to_packed_bytes(bit_list) do
     # Change the byte order from big endian to little endian (reverse bytes).
+    len = bit_size(bit_list)
     r = rem(len, @bits_per_byte)
 
     <<pre::integer-size(r), post::integer-size(len - r)>> = bit_list
@@ -55,37 +53,27 @@ defmodule LambdaEthereumConsensus.Utils.BitList do
   Equivalent to bit_list[index] == 1.
   """
   @spec set?(t, non_neg_integer) :: boolean
-  def set?({bit_list, _}, index), do: BitField.set?(bit_list, index)
+  def set?(bit_list, index), do: BitField.set?(bit_list, index)
 
   @doc """
   Sets a bit (turns it to 1).
   Equivalent to bit_list[index] = 1.
   """
   @spec set(t, non_neg_integer) :: t
-  def set({bit_list, len}, index), do: {BitField.set(bit_list, index), len}
+  def set(bit_list, index), do: BitField.set(bit_list, index)
 
   @doc """
   Clears a bit (turns it to 0).
   Equivalent to bit_list[index] = 0.
   """
   @spec clear(t, non_neg_integer) :: t
-  def clear({bit_list, len}, index), do: {BitField.clear(bit_list, index), len}
-
-  def length_of_bitlist(bitlist) when is_binary(bitlist) do
-    bit_size = bit_size(bitlist)
-    <<_::size(bit_size - @bits_per_byte), last_byte>> = bitlist
-    bit_size - leading_zeros(<<last_byte>>) - @bits_in_sentinel_bit
-  end
+  def clear(bit_list, index), do: BitField.clear(bit_list, index)
 
-  defp leading_zeros(<<@sentinel_bit::@bits_in_sentinel_bit, _::7>>), do: 0
-  defp leading_zeros(<<0::1, @sentinel_bit::@bits_in_sentinel_bit, _::6>>), do: 1
-  defp leading_zeros(<<0::2, @sentinel_bit::@bits_in_sentinel_bit, _::5>>), do: 2
-  defp leading_zeros(<<0::3, @sentinel_bit::@bits_in_sentinel_bit, _::4>>), do: 3
-  defp leading_zeros(<<0::4, @sentinel_bit::@bits_in_sentinel_bit, _::3>>), do: 4
-  defp leading_zeros(<<0::5, @sentinel_bit::@bits_in_sentinel_bit, _::2>>), do: 5
-  defp leading_zeros(<<0::6, @sentinel_bit::@bits_in_sentinel_bit, _::1>>), do: 6
-  defp leading_zeros(<<0::7, @sentinel_bit::@bits_in_sentinel_bit>>), do: 7
-  defp leading_zeros(<<0::8>>), do: 8
+  @doc """
+  Calculates the length of the bit_list.
+  """
+  @spec length(t) :: non_neg_integer()
+  def length(bit_list), do: bit_size(bit_list)
 
   @spec remove_trailing_bit(binary()) :: bitstring()
   defp remove_trailing_bit(<<@sentinel_bit::@bits_in_sentinel_bit, rest::7>>), do: <<rest::7>>
diff --git a/test/unit/bit_list_test.exs b/test/unit/bit_list_test.exs
index b065e7c97..db18fdfbc 100644
--- a/test/unit/bit_list_test.exs
+++ b/test/unit/bit_list_test.exs
@@ -1,25 +1,24 @@
 defmodule BitListTest do
   use ExUnit.Case
-  alias LambdaEthereumConsensus.SszEx
   alias LambdaEthereumConsensus.Utils.BitList
 
   describe "Sub-byte BitList" do
     test "build from binary" do
-      input_encoded = <<237, 7>>
-      {:ok, decoded} = SszEx.decode(input_encoded, {:bitlist, 10})
-      assert BitList.set?({decoded, 10}, 0) == true
-      assert BitList.set?({decoded, 10}, 1) == false
-      assert BitList.set?({decoded, 10}, 4) == false
-      assert BitList.set?({decoded, 10}, 9) == true
+      decoded = BitList.new(<<237, 7>>)
 
-      {updated_bitlist, _} =
-        {decoded, 10}
+      assert BitList.set?(decoded, 0) == true
+      assert BitList.set?(decoded, 1) == false
+      assert BitList.set?(decoded, 4) == false
+      assert BitList.set?(decoded, 9) == true
+
+      updated_bitlist =
+        decoded
         |> BitList.set(1)
         |> BitList.set(4)
         |> BitList.clear(0)
         |> BitList.clear(9)
 
-      {:ok, <<254, 5>>} = SszEx.encode(updated_bitlist, {:bitlist, 10})
+      <<254, 5>> = BitList.to_bytes(updated_bitlist)
     end
 
     test "sets a single bit" do

From ab4c24c6f9f1fa5d92183a7ff4c4c9cbef44800a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 11:42:56 +0100
Subject: [PATCH 20/27] add bitlist library usage in participated?(aggr_bits)
 function

---
 lib/lambda_ethereum_consensus/p2p/gossip/handler.ex      | 2 +-
 .../state_transition/accessors.ex                        | 9 ++-------
 2 files changed, 3 insertions(+), 8 deletions(-)
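
Reviewer note: for reference, the index arithmetic removed by this patch is reproduced below as a standalone sketch. It reads a bit directly from the serialized (little-endian byte order) binary. `ParticipationSketch` is a hypothetical module name, and the function is made public only so it can be called in isolation.

```elixir
defmodule ParticipationSketch do
  # Hypothetical wrapper around the helper this patch removes.

  # Works on the serialized bytes: the byte index stays the same, but the
  # bit index inside each byte is mirrored (bit 0 is the 8th bit of its byte).
  def participated?(bits, index) when is_binary(bits) do
    bit_index = index + 7 - 2 * rem(index, 8)
    <<_::size(bit_index), flag::1, _::bits>> = bits
    flag == 1
  end
end
```

On the serialized test input `<<237, 7>>`, indices 0 and 9 are set while 1 and 4 are clear, which matches the assertions in `test/unit/bit_list_test.exs` for the decoded bitlist.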

diff --git a/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex b/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
index 370641023..0cf33b1aa 100644
--- a/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
+++ b/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
@@ -25,7 +25,7 @@ defmodule LambdaEthereumConsensus.P2P.Gossip.Handler do
   def handle_beacon_aggregate_and_proof(%SignedAggregateAndProof{
         message: %AggregateAndProof{aggregate: aggregate}
       }) do
-    votes = BitVector.count(aggregate.aggregation_bits)
+    votes = BitField.count(aggregate.aggregation_bits)
     slot = aggregate.data.slot
     root = aggregate.data.beacon_block_root |> Base.encode16()
 
diff --git a/lib/lambda_ethereum_consensus/state_transition/accessors.ex b/lib/lambda_ethereum_consensus/state_transition/accessors.ex
index 0c4d8442b..77b015b46 100644
--- a/lib/lambda_ethereum_consensus/state_transition/accessors.ex
+++ b/lib/lambda_ethereum_consensus/state_transition/accessors.ex
@@ -6,6 +6,7 @@ defmodule LambdaEthereumConsensus.StateTransition.Accessors do
   alias LambdaEthereumConsensus.SszEx
   alias LambdaEthereumConsensus.StateTransition.{Cache, Math, Misc, Predicates}
   alias LambdaEthereumConsensus.Utils
+  alias LambdaEthereumConsensus.Utils.BitList
   alias LambdaEthereumConsensus.Utils.Randao
   alias Types.{Attestation, BeaconState, IndexedAttestation, SyncCommittee, Validator}
 
@@ -510,13 +511,7 @@ defmodule LambdaEthereumConsensus.StateTransition.Accessors do
     |> Enum.sort()
   end
 
-  defp participated?(bits, index) do
-    # The bit order inside the byte is reversed (e.g. bits[0] is the 8th bit).
-    # Here we keep the byte index the same, but reverse the bit index.
-    bit_index = index + 7 - 2 * rem(index, 8)
-    <<_::size(bit_index), flag::1, _::bits>> = bits
-    flag == 1
-  end
+  defp participated?(bits, index), do: BitList.set?(bits, index)
 
   @doc """
   Return the combined effective balance of the ``indices``.

From a55d94cc3056d018bc92231e0fcbe368a64133b8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 11:45:58 +0100
Subject: [PATCH 21/27] remove empty line

---
 docs/bitvectors.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/bitvectors.md b/docs/bitvectors.md
index 52de54700..26d1282ce 100644
--- a/docs/bitvectors.md
+++ b/docs/bitvectors.md
@@ -2,7 +2,6 @@
 
 ## Representing integers
 
-
 Computers use transistors to store data. These electrical components only have two possible states: `clear` or `set`. Numerically, we represent the clear state as a `0` and the set state as `1`. Using 1s and 0s, we can represent any integer number using the binary system, the same way we use the decimal system in our daily lives.
 
 As an example, let's take the number 259. For its decimal representation, we use the digits 2, 5, and 9, because each digit, or coefficient, represents a power of 10:

From 626dca418a4645d860ebb091643e8c0324b616e9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 11:47:49 +0100
Subject: [PATCH 22/27] fix alias

---
 lib/lambda_ethereum_consensus/p2p/gossip/handler.ex | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex b/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
index 0cf33b1aa..200f4f029 100644
--- a/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
+++ b/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
@@ -7,7 +7,7 @@ defmodule LambdaEthereumConsensus.P2P.Gossip.Handler do
 
   alias LambdaEthereumConsensus.Beacon.BeaconChain
   alias LambdaEthereumConsensus.Beacon.PendingBlocks
-  alias LambdaEthereumConsensus.Utils.BitVector
+  alias LambdaEthereumConsensus.Utils.BitField
   alias Types.{AggregateAndProof, SignedAggregateAndProof, SignedBeaconBlock}
 
   def handle_beacon_block(%SignedBeaconBlock{message: block} = signed_block) do

From 9571cd3b1acf0ba759a9f42b0d9b2d1189bfa542 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 12:10:58 +0100
Subject: [PATCH 23/27] fix dialyzer warnings

---
 lib/lambda_ethereum_consensus/p2p/gossip/handler.ex |  2 +-
 lib/ssz_ex.ex                                       | 10 +++-------
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex b/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
index 200f4f029..18d4ef71d 100644
--- a/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
+++ b/lib/lambda_ethereum_consensus/p2p/gossip/handler.ex
@@ -29,7 +29,7 @@ defmodule LambdaEthereumConsensus.P2P.Gossip.Handler do
     slot = aggregate.data.slot
     root = aggregate.data.beacon_block_root |> Base.encode16()
 
-    # We are getting ~500 attestations in half a second. This is overwheling the store GenServer at the moment.
+    # We are getting ~500 attestations in half a second. This is overwhelming the store GenServer at the moment.
     # Store.on_attestation(aggregate)
 
     Logger.debug(
diff --git a/lib/ssz_ex.ex b/lib/ssz_ex.ex
index 0857060c5..73c147f41 100644
--- a/lib/ssz_ex.ex
+++ b/lib/ssz_ex.ex
@@ -300,8 +300,7 @@ defmodule LambdaEthereumConsensus.SszEx do
   end
 
   def pack_bits(value, :bitlist) do
-    len = value |> bit_size()
-    {value, len} |> BitList.to_packed_bytes() |> pack_bytes()
+    value |> BitList.to_packed_bytes() |> pack_bytes()
   end
 
   def chunk_count({:list, type, max_size}) do
@@ -411,9 +410,6 @@ defmodule LambdaEthereumConsensus.SszEx do
     len = BitList.length(bit_list)
 
     cond do
-      len < 0 ->
-        {:error, "missing length information"}
-
       div(len, @bits_per_byte) + 1 != num_bytes ->
         {:error, "invalid byte count"}
 
@@ -653,7 +649,7 @@ defmodule LambdaEthereumConsensus.SszEx do
 
   defp check_first_offset([{offset, _} | _rest], items_index, _binary_size) do
     cond do
-      offset < items_index -> {:error, "OffsetIntoFixedPortion"}
+      offset < items_index -> {:error, "OffsetIntoFixedPortion (#{offset})"}
       offset > items_index -> {:error, "OffsetSkipsVariableBytes"}
       true -> :ok
     end
@@ -739,7 +735,7 @@ defmodule LambdaEthereumConsensus.SszEx do
   defp sanitize_offset(offset, previous_offset, _num_bytes, num_fixed_bytes) do
     cond do
       offset < num_fixed_bytes ->
-        {:error, "OffsetIntoFixedPortion"}
+        {:error, "OffsetIntoFixedPortion (#{offset})"}
 
       previous_offset == nil && offset != num_fixed_bytes ->
         {:error, "OffsetSkipsVariableBytes"}

From 0ca18caee04065a23ad600da33fd16919b801064 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 12:32:54 +0100
Subject: [PATCH 24/27] add bitvectors

---
 lib/types/beacon_chain/beacon_state.ex   |  9 +++++++--
 lib/types/beacon_chain/sync_aggregate.ex | 14 +++++++++++++-
 test/spec/utils.ex                       |  2 +-
 3 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/lib/types/beacon_chain/beacon_state.ex b/lib/types/beacon_chain/beacon_state.ex
index 8f51a19be..ccfb2ac49 100644
--- a/lib/types/beacon_chain/beacon_state.ex
+++ b/lib/types/beacon_chain/beacon_state.ex
@@ -3,9 +3,10 @@ defmodule Types.BeaconState do
   Struct definition for `BeaconState`.
   Related definitions in `native/ssz_nif/src/types/`.
   """
-  @behaviour LambdaEthereumConsensus.Container
   alias LambdaEthereumConsensus.Utils.BitVector
 
+  @behaviour LambdaEthereumConsensus.Container
+
   fields = [
     :genesis_time,
     :genesis_validators_root,
@@ -114,6 +115,7 @@ defmodule Types.BeaconState do
     |> Map.update!(:previous_epoch_participation, &Aja.Vector.to_list/1)
     |> Map.update!(:current_epoch_participation, &Aja.Vector.to_list/1)
     |> Map.update!(:latest_execution_payload_header, &Types.ExecutionPayloadHeader.encode/1)
+    |> Map.update!(:justification_bits, &BitVector.to_bytes/1)
   end
 
   def decode(%__MODULE__{} = map) do
@@ -124,6 +126,9 @@ defmodule Types.BeaconState do
     |> Map.update!(:previous_epoch_participation, &Aja.Vector.new/1)
     |> Map.update!(:current_epoch_participation, &Aja.Vector.new/1)
     |> Map.update!(:latest_execution_payload_header, &Types.ExecutionPayloadHeader.decode/1)
+    |> Map.update!(:justification_bits, fn bits ->
+      BitVector.new(bits, Constants.justification_bits_length())
+    end)
   end
 
   @doc """
@@ -261,7 +266,7 @@ defmodule Types.BeaconState do
        {:list, TypeAliases.participation_flags(), ChainSpec.get("VALIDATOR_REGISTRY_LIMIT")}},
       {:current_epoch_participation,
        {:list, TypeAliases.participation_flags(), ChainSpec.get("VALIDATOR_REGISTRY_LIMIT")}},
-      {:justification_bits, {:bitvector, ChainSpec.get("JUSTIFICATION_BITS_LENGTH")}},
+      {:justification_bits, {:bitvector, Constants.justification_bits_length()}},
       {:previous_justified_checkpoint, Types.Checkpoint},
       {:current_justified_checkpoint, Types.Checkpoint},
       {:finalized_checkpoint, Types.Checkpoint},
diff --git a/lib/types/beacon_chain/sync_aggregate.ex b/lib/types/beacon_chain/sync_aggregate.ex
index 38240ae6a..8194cbf2d 100644
--- a/lib/types/beacon_chain/sync_aggregate.ex
+++ b/lib/types/beacon_chain/sync_aggregate.ex
@@ -3,6 +3,8 @@ defmodule Types.SyncAggregate do
   Struct definition for `SyncAggregate`.
   Related definitions in `native/ssz_nif/src/types/`.
   """
+  alias LambdaEthereumConsensus.Utils.BitVector
+
   @behaviour LambdaEthereumConsensus.Container
 
   fields = [
@@ -15,7 +17,7 @@ defmodule Types.SyncAggregate do
 
   @type t :: %__MODULE__{
           # max size SYNC_COMMITTEE_SIZE
-          sync_committee_bits: Types.bitvector(),
+          sync_committee_bits: BitVector.t(),
           sync_committee_signature: Types.bls_signature()
         }
 
@@ -26,4 +28,14 @@ defmodule Types.SyncAggregate do
       {:sync_committee_signature, TypeAliases.bls_signature()}
     ]
   end
+
+  def encode(%__MODULE__{} = map) do
+    Map.update!(map, :sync_committee_bits, &BitVector.to_bytes/1)
+  end
+
+  def decode(%__MODULE__{} = map) do
+    Map.update!(map, :sync_committee_bits, fn bits ->
+      BitVector.new(bits, ChainSpec.get("SYNC_COMMITTEE_SIZE"))
+    end)
+  end
 end
diff --git a/test/spec/utils.ex b/test/spec/utils.ex
index 928a9a677..cf7597aaa 100644
--- a/test/spec/utils.ex
+++ b/test/spec/utils.ex
@@ -132,7 +132,7 @@ defmodule SpecTestUtils do
   def sanitize_ssz(vector_elements, {:vector, module, _size} = _schema) when is_atom(module),
     do: Enum.map(vector_elements, &struct!(module, &1))
 
-  def sanitize_ssz(bitlist, {:bitlist, _size} = _schema), do: elem(BitList.new(bitlist), 0)
+  def sanitize_ssz(bitlist, {:bitlist, _size} = _schema), do: BitList.new(bitlist)
   def sanitize_ssz(bitvector, {:bitvector, size} = _schema), do: BitVector.new(bitvector, size)
 
   def sanitize_ssz(0, {:list, {:int, 8}, _size} = _schema), do: []

From a7b60fd3f11ed29164eae2237c6c60815044030e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 14:33:14 +0100
Subject: [PATCH 25/27] fix case for missing length info

---
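Note for reviewers: the new `cond` clause rejects a bitlist whose last byte is zero. In SSZ, a serialized bitlist carries a trailing sentinel bit set to 1 right after the last data bit, so a zero final byte means the length cannot be recovered. The sketch below illustrates that check together with the existing byte-count check. `BitlistChecksSketch` and its function names are hypothetical and exist only for illustration; they are not part of `SszEx`.

```elixir
defmodule BitlistChecksSketch do
  # Hypothetical helper module, written only to illustrate the two checks
  # performed in decode_bitlist/2. It is not part of the codebase.

  @bits_per_byte 8

  # True when the last byte of the encoded bitlist is zero, i.e. the
  # sentinel bit that marks the end of the data bits is missing.
  def missing_sentinel?(encoded) when is_binary(encoded) and byte_size(encoded) > 0 do
    prefix_size = byte_size(encoded) - 1
    match?(<<_::binary-size(prefix_size), 0>>, encoded)
  end

  # Number of bytes an encoded bitlist with `len` data bits occupies:
  # the data bits plus one sentinel bit, rounded up to whole bytes.
  def expected_byte_size(len) when is_integer(len) and len >= 0 do
    div(len, @bits_per_byte) + 1
  end
end

IO.inspect(BitlistChecksSketch.missing_sentinel?(<<255, 1>>)) # false
IO.inspect(BitlistChecksSketch.missing_sentinel?(<<255, 0>>)) # true
IO.inspect(BitlistChecksSketch.expected_byte_size(16))        # 3
```

For the test vector `<<160, 92, 1>>` used in this patch, the last byte is 1, so the sentinel is present, and its 16 data bits occupy `div(16, 8) + 1 = 3` bytes.
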
 lib/ssz_ex.ex             | 5 ++++-
 test/unit/ssz_ex_test.exs | 4 ++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/lib/ssz_ex.ex b/lib/ssz_ex.ex
index 73c147f41..1452c6f0b 100644
--- a/lib/ssz_ex.ex
+++ b/lib/ssz_ex.ex
@@ -407,9 +407,12 @@ defmodule LambdaEthereumConsensus.SszEx do
   defp decode_bitlist(bit_list, max_size) when bit_size(bit_list) > 0 do
     num_bytes = byte_size(bit_list)
     decoded = BitList.new(bit_list)
-    len = BitList.length(bit_list)
+    len = BitList.length(decoded)
 
     cond do
+      match?(<<_::binary-size(num_bytes - 1), 0>>, bit_list) ->
+        {:error, "BitList has no length information."}
+
       div(len, @bits_per_byte) + 1 != num_bytes ->
         {:error, "invalid byte count"}
 
diff --git a/test/unit/ssz_ex_test.exs b/test/unit/ssz_ex_test.exs
index 6a816edd4..ccd9c2e3b 100644
--- a/test/unit/ssz_ex_test.exs
+++ b/test/unit/ssz_ex_test.exs
@@ -465,8 +465,8 @@ defmodule Unit.SSZExTest do
 
   test "serialize and deserialize bitlist" do
     encoded_bytes = <<160, 92, 1>>
-    assert {:ok, decoded_bytes} = SszEx.decode(encoded_bytes, {:bitlist, 16})
-    assert {:ok, ^encoded_bytes} = SszEx.encode(decoded_bytes, {:bitlist, 16})
+    assert {:ok, decoded_bytes} = SszEx.decode(encoded_bytes, {:bitlist, 30})
+    assert {:ok, ^encoded_bytes} = SszEx.encode(decoded_bytes, {:bitlist, 30})
 
     encoded_bytes = <<255, 1>>
     assert {:ok, decoded_bytes} = SszEx.decode(encoded_bytes, {:bitlist, 16})

From d39fe93ecbe8e5618102c9882d0345dbe2c65e2e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 14:58:50 +0100
Subject: [PATCH 26/27] update places where bitvectors are used

---
 .../state_transition/epoch_processing.ex      | 25 ++++++-------------
 .../state_transition/operations.ex            |  9 +++----
 2 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/lib/lambda_ethereum_consensus/state_transition/epoch_processing.ex b/lib/lambda_ethereum_consensus/state_transition/epoch_processing.ex
index 0bea9ce40..6baa08d55 100644
--- a/lib/lambda_ethereum_consensus/state_transition/epoch_processing.ex
+++ b/lib/lambda_ethereum_consensus/state_transition/epoch_processing.ex
@@ -358,16 +358,10 @@ defmodule LambdaEthereumConsensus.StateTransition.EpochProcessing do
   end
 
   defp update_first_bit(state) do
-    bits =
-      state.justification_bits
-      |> BitVector.new(4)
-      |> BitVector.shift_higher(1)
-      |> BitVector.to_bytes()
-
     %BeaconState{
       state
       | previous_justified_checkpoint: state.current_justified_checkpoint,
-        justification_bits: bits
+        justification_bits: BitVector.shift_higher(state.justification_bits, 1)
     }
   end
 
@@ -377,13 +371,11 @@ defmodule LambdaEthereumConsensus.StateTransition.EpochProcessing do
     with {:ok, block_root} <- Accessors.get_block_root(state, epoch) do
       new_checkpoint = %Types.Checkpoint{epoch: epoch, root: block_root}
 
-      bits =
-        state.justification_bits
-        |> BitVector.new(4)
-        |> BitVector.set(index)
-        |> BitVector.to_bytes()
-
-      %{state | current_justified_checkpoint: new_checkpoint, justification_bits: bits}
+      %{
+        state
+        | current_justified_checkpoint: new_checkpoint,
+          justification_bits: BitVector.set(state.justification_bits, index)
+      }
       |> then(&{:ok, &1})
     end
   end
@@ -395,10 +387,7 @@ defmodule LambdaEthereumConsensus.StateTransition.EpochProcessing do
          range,
          offset
        ) do
-    bits_set =
-      state.justification_bits
-      |> BitVector.new(4)
-      |> BitVector.all?(range)
+    bits_set = BitVector.all?(state.justification_bits, range)
 
     if bits_set and old_justified_checkpoint.epoch + offset == current_epoch do
       %BeaconState{state | finalized_checkpoint: old_justified_checkpoint}
diff --git a/lib/lambda_ethereum_consensus/state_transition/operations.ex b/lib/lambda_ethereum_consensus/state_transition/operations.ex
index 4ae6daed3..3844688b1 100644
--- a/lib/lambda_ethereum_consensus/state_transition/operations.ex
+++ b/lib/lambda_ethereum_consensus/state_transition/operations.ex
@@ -117,13 +117,10 @@ defmodule LambdaEthereumConsensus.StateTransition.Operations do
     # Verify sync committee aggregate signature signing over the previous slot block root
     committee_pubkeys = state.current_sync_committee.pubkeys
 
-    sync_committee_bits =
-      BitVector.new(aggregate.sync_committee_bits, ChainSpec.get("SYNC_COMMITTEE_SIZE"))
-
     participant_pubkeys =
       committee_pubkeys
       |> Enum.with_index()
-      |> Enum.filter(fn {_, index} -> BitVector.set?(sync_committee_bits, index) end)
+      |> Enum.filter(fn {_, index} -> BitVector.set?(aggregate.sync_committee_bits, index) end)
       |> Enum.map(fn {public_key, _} -> public_key end)
 
     previous_slot = max(state.slot, 1) - 1
@@ -138,7 +135,7 @@ defmodule LambdaEthereumConsensus.StateTransition.Operations do
       # Compute participant and proposer rewards
       {participant_reward, proposer_reward} = compute_sync_aggregate_rewards(state)
 
-      total_proposer_reward = BitVector.count(sync_committee_bits) * proposer_reward
+      total_proposer_reward = BitVector.count(aggregate.sync_committee_bits) * proposer_reward
 
       # PERF: make Map with committee_index by pubkey, then
       # Enum.map validators -> new balance all in place, without map_reduce
@@ -146,7 +143,7 @@ defmodule LambdaEthereumConsensus.StateTransition.Operations do
       |> get_sync_committee_indices(committee_pubkeys)
       |> Stream.with_index()
       |> Stream.map(fn {validator_index, committee_index} ->
-        if BitVector.set?(sync_committee_bits, committee_index),
+        if BitVector.set?(aggregate.sync_committee_bits, committee_index),
           do: {validator_index, participant_reward},
           else: {validator_index, -participant_reward}
       end)

From 61b0b953133e0b035a23ffb50792f6489cf37739 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1s=20Arjovsky?= <t.arjovsky@gmail.com>
Date: Tue, 20 Feb 2024 15:39:08 +0100
Subject: [PATCH 27/27] roll back size change

---
 test/unit/ssz_ex_test.exs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/test/unit/ssz_ex_test.exs b/test/unit/ssz_ex_test.exs
index ccd9c2e3b..6a816edd4 100644
--- a/test/unit/ssz_ex_test.exs
+++ b/test/unit/ssz_ex_test.exs
@@ -465,8 +465,8 @@ defmodule Unit.SSZExTest do
 
   test "serialize and deserialize bitlist" do
     encoded_bytes = <<160, 92, 1>>
-    assert {:ok, decoded_bytes} = SszEx.decode(encoded_bytes, {:bitlist, 30})
-    assert {:ok, ^encoded_bytes} = SszEx.encode(decoded_bytes, {:bitlist, 30})
+    assert {:ok, decoded_bytes} = SszEx.decode(encoded_bytes, {:bitlist, 16})
+    assert {:ok, ^encoded_bytes} = SszEx.encode(decoded_bytes, {:bitlist, 16})
 
     encoded_bytes = <<255, 1>>
     assert {:ok, decoded_bytes} = SszEx.decode(encoded_bytes, {:bitlist, 16})