Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize crdt.Array Data Structure for Improved Performance #980

Open
hackerwins opened this issue Aug 27, 2024 · 4 comments
Open

Optimize crdt.Array Data Structure for Improved Performance #980

hackerwins opened this issue Aug 27, 2024 · 4 comments
Assignees
Labels
enhancement 🌟 New feature or request

Comments

@hackerwins
Copy link
Member

Description:

The current implementation of crdt.Array is based on RGATreeList, which uses both SplayTree and LLRBTree for internal element retrieval. While this structure works, there's potential for optimization, especially considering the random access nature of crdt.Array operations.

The current implementation of crdt.Array relies on a data structure called RGATreeList, which incorporates both SplayTree and LLRBTree for managing internal elements. While this approach is functional, there's room for improvement, particularly given the random access nature of crdt.Array operations. The SplayTree is utilized to locate nodes based on their indices during Document.Update calls, while the LLRBTree is employed to find nodes according to their Element.CreatedAt values in various operations.

However, the use of SplayTree may not be ideal for crdt.Array. Initially designed to efficiently handle text editing workloads, the SplayTree's characteristic of moving recently accessed nodes to the root could potentially create unnecessary overhead in the context of crdt.Array's random access patterns. This suggests that a different data structure might be more appropriate for optimizing performance.

To address this, we propose investigating and implementing an alternative data structure that is better suited to the access patterns of crdt.Array. The goal of this change is to enhance the performance of local editing operations, potentially leading to improved efficiency and responsiveness in the system overall.

Considerations for Implementation

  1. Benchmark current vs. proposed structures in various scenarios
  2. Assess memory usage of the new structure
  3. Consider scalability for future requirements(Array.Move, ...)

Why:

Implementing a more fitting data structure for crdt.Array will likely improve its performance during local editing, especially given the random access nature of the operations. This optimization will lead to smoother user experiences and better resource management within the application.

@hackerwins
Copy link
Member Author

Related to #989

@binary-ho
Copy link
Contributor

Can i try this?

@m4ushold
Copy link
Contributor

m4ushold commented Sep 4, 2024

Intuitively, using BBSTs (Balanced Binary Search Trees) such as Treap, AVL, or LLRB trees, which have an upper bound of $\log{n}$ for time complexity, seems preferable.
However, since crdt.Array does not require weight-based access, data structures like Y-fast tries or Van Emde Boas trees could be potential alternatives. Nevertheless, given that searches are based on relative positions between nodes rather than key-based access, it’s uncertain whether these structures can effectively support this. Further research and discussion seem necessary.
If anyone has a better idea, please let me know :)

@m4ushold
Copy link
Contributor

m4ushold commented Sep 7, 2024

I have reviewed the Y-fast Trie and the Van Emde Boas Tree. Both of these data structures are designed for integer-based keys, which may not be suitable unless there is a way to efficiently update the key values over a range. Therefore, it seems more practical to experiment with several balanced binary search trees (BBSTs) instead.

  • AVL Tree: This data structure requires additional memory to store metadata related to the height of each node, which might result in higher memory usage compared to other trees.
  • Treap: This appears to be one of the more straightforward choices.
  • LLRB Tree: It's already implemented, so I think I'll just need to add a few features.
  • RB Tree: Implementing this could be hard.
    It would be advisable to select a few of these options for experimentation.

@binary-ho binary-ho self-assigned this Sep 22, 2024
@devleejb devleejb moved this from Backlog to In progress in Yorkie Project - 2024 Nov 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🌟 New feature or request
Projects
Status: In progress
Development

No branches or pull requests

3 participants