-
While we have fixed all the known issues with buckets and schema definition, we still have one issue with indexes when an auto-fix runs at database startup (this usually happens if the schema file contains information that differs from the actual files found in the database directory, or during a migration to a new version of the indexes). Having a way to check whether a replica is consistent with the leader is useful for many reasons. Even though the RAFT election mechanism and the leader/replica log should cover all cases of inconsistency, an undiscovered bug could still cause issues with replication. We could implement a
Then a new
This should be pretty straightforward when the nodes are not updating the database, so the 1st version of this command could simply acquire a read lock on the files it is checking, blocking writes until the process is finished (transactions acquire locks on files before writing). Later versions could implement a smarter algorithm where the lock on a file is acquired only after the check of that file has completed and one or more issues have been found. The lock would be the 1st phase of a distributed transaction whose 2nd phase contains the pages to replace. This means only the affected files would be locked until the repair is finished, reducing lock contention for active transactions. It would also be nice to have a setting that issues a repair of the replication against all the nodes on a regular basis. It could run at a fixed interval and also check the number of messages per minute, so the command is issued only when the number of updates to the database is low.
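To make the "find the pages to replace" step concrete, here is a minimal sketch in Python. It is only an illustration, not how the database implements it: the names `page_checksums` and `pages_to_repair`, the fixed page size, and the use of SHA-256 are all assumptions made for the example. The idea is that the leader and the replica each compute one checksum per page of a file, and only the page indexes that differ would go into the 2nd phase of the repair transaction.

```python
import hashlib

PAGE_SIZE = 64 * 1024  # hypothetical page size, chosen only for this sketch


def page_checksums(data: bytes, page_size: int = PAGE_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size page of a file's content."""
    return [
        hashlib.sha256(data[offset:offset + page_size]).hexdigest()
        for offset in range(0, len(data), page_size)
    ]


def pages_to_repair(leader: list[str], replica: list[str]) -> list[int]:
    """Return the indexes of pages the replica would need from the leader."""
    # Pages present on both sides whose checksums differ.
    differing = [i for i, (a, b) in enumerate(zip(leader, replica)) if a != b]
    # Pages present on the leader but missing on the replica.
    differing.extend(range(len(replica), len(leader)))
    return differing


# Tiny example with 8-byte pages: page 1 is corrupted, page 2 is missing.
leader_file = b"A" * 8 + b"B" * 8 + b"C" * 8
replica_file = b"A" * 8 + b"X" * 8
leader_sums = page_checksums(leader_file, page_size=8)
replica_sums = page_checksums(replica_file, page_size=8)
print(pages_to_repair(leader_sums, replica_sums))  # [1, 2]
```

With this shape, the file lock would only be needed when `pages_to_repair` returns a non-empty list, which matches the smarter variant described above. The sketch does not handle a replica file that is longer than the leader's; that case would need a truncation step.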
-
This feature has been implemented under #250.
-
We've encountered situations where parts of a database on a node deployed in a cluster are corrupted. This can happen for:
It would be useful to have the ability to initiate a repair process where a replica can repair itself by requesting updates from the leader.
I'm sure this will involve covering some tricky situations, such as what happens if an election round occurs and a new leader is elected during a repair.