Come up with architecture for a scalable and highly available secondary server #1733
Comments
Here is the document that captures the aspects related to this ticket: https://docs.google.com/document/d/1UNgcTBlvDSCqX5N-ai05vl6vrGBfTtgdPh-j50TibkU/edit?usp=sharing
As part of the ongoing ticket, our team has initiated efforts to benchmark server performance. The key objectives for this task include:
Objectives 3 and 4 are to be achieved within the current sprint (PR 81). The results of this benchmarking exercise will shape our next steps and decisions.
In PR 81, we wrote tests for the following, along with their documentation, and demonstrated them in the architecture call.
The goal for this sprint is to expand the tests to cover all notification scenarios and to work with Chris to run the benchmarks against atSigns in the production environment. The documentation of the tests completed so far can be found in the branch. We also plan to explore Locust for load testing: https://locust.io/
Phase 1 (writing scripts for the scenarios mentioned above) is done, and we have moved on to a Locust script that runs multiple clients performing unauthenticated scan/info requests. The Locust script can be found here. The next goal is to narrow down the performance characteristics, i.e., to collect metrics for the following scenarios and to be able to predict the point at which the server breaks down.
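The Locust script itself is not reproduced in this issue. As a rough, dependency-free sketch of the same idea (many concurrent clients issuing a request while latency and failures are recorded), the Python below uses a thread pool. `send_request` is a hypothetical placeholder for one unauthenticated scan or info round trip, and the p50/p95 summary is an assumed choice of metrics, not necessarily what the sheet records.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_load(send_request: Callable[[], object], clients: int, requests_per_client: int) -> dict:
    """Run `clients` concurrent workers that each call `send_request` repeatedly.

    `send_request` is a hypothetical callable performing one round trip
    (e.g. an unauthenticated scan or info) and raising an exception on failure.
    """
    if clients < 1 or requests_per_client < 1:
        raise ValueError("clients and requests_per_client must be >= 1")

    def worker(client_id: int) -> tuple[list[float], int]:
        latencies: list[float] = []
        errors = 0
        for _ in range(requests_per_client):
            start = time.perf_counter()
            try:
                send_request()
            except Exception:
                errors += 1  # record the failure and keep going
                continue
            latencies.append(time.perf_counter() - start)
        return latencies, errors

    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(worker, range(clients)))

    latencies = [t for lat, _ in results for t in lat]
    errors = sum(err for _, err in results)
    total = clients * requests_per_client
    summary = {"requests": total, "errors": errors, "error_rate": errors / total}
    if len(latencies) >= 2:
        # "inclusive" interpolates only between observed values (no extrapolation).
        cuts = statistics.quantiles(latencies, n=100, method="inclusive")
        summary["p50_s"] = cuts[49]
        summary["p95_s"] = cuts[94]
    return summary
```

Threads are only practical for modest client counts; for runs in the 1,000 to 10,000 client range, Locust's gevent-based users are the better fit.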
Details collected so far can be seen in the following sheet.
During this sprint, we used the Locust script to run a series of performance tests to evaluate the scalability and resilience of our server infrastructure. Specifically, we ran lookup tests in which we systematically increased both the number of client connections and the number of keys stored on the server.
Test Conditions:
Number of Keys: We started with 5 unique keys and increased the count to 10, 100, 1,000, and finally 10,000 keys.
Number of Clients: At the same time, we varied the number of concurrent client connections, starting with a single client and scaling up to 10, 100, 200, 500, 1,000, and finally 10,000 concurrent clients.
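To keep these runs systematic, the test conditions above can be expressed as a matrix, and the "breaking point" mentioned earlier can be defined as the first client count whose run exceeds an error-rate or latency threshold. The sketch below assumes every key count is paired with every client count; the function names and the default thresholds (1% errors, 1 s p95 latency) are hypothetical placeholders, not agreed targets.

```python
from itertools import product
from typing import Optional

# Key and client counts from the test conditions above.
KEY_COUNTS = [5, 10, 100, 1_000, 10_000]
CLIENT_COUNTS = [1, 10, 100, 200, 500, 1_000, 10_000]


def build_test_matrix() -> list[tuple[int, int]]:
    """All (keys, clients) combinations; the client count varies fastest."""
    return list(product(KEY_COUNTS, CLIENT_COUNTS))


def breaking_point(results: dict[int, dict], max_error_rate: float = 0.01,
                   max_p95_s: float = 1.0) -> Optional[int]:
    """Smallest client count whose run breaches either threshold, or None.

    `results` maps a client count to {"error_rate": ..., "p95_s": ...} for one
    key count. The default thresholds are illustrative assumptions.
    """
    for clients in sorted(results):
        run = results[clients]
        if run["error_rate"] > max_error_rate or run["p95_s"] > max_p95_s:
            return clients
    return None
```

Running `breaking_point` once per key count would show whether the server's limit moves as the keystore grows.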
All of the collected performance test metrics can be found in the following sheet.
We first collected metrics with the client and server running on the same VM; these can be found in Same_VM_Metrics. Next, we aim to run the server and client on separate virtual machines (VMs) so that the client's load generation does not compete with the server for the same machine's resources and skew the results.
Is your feature request related to a problem? Please describe.
The secondary server, as it exists today, is not scalable: the only available scaling option is vertical scaling. Because of the way our persistence works, it is not possible to run a secondary server per region while honoring data locality, replication, etc.
Describe the solution you'd like
Come up with a design for the problem described above. The task will have the following sub-tasks:
Requirements Analysis:
Understand the current and anticipated future requirements: data volume, traffic patterns, and performance expectations.
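Once benchmarks give a per-instance throughput, the requirements analysis can be turned into a first capacity estimate: instances = ceil(peak load / (per-instance capacity × (1 − headroom))). The helper below is a hypothetical sketch of that arithmetic; the headroom factor is an assumed safety margin, and the formula assumes throughput grows linearly with instance count, which is exactly what a horizontally scaled design would need to demonstrate.

```python
import math


def instances_needed(peak_rps: float, per_instance_rps: float, headroom: float = 0.3) -> int:
    """Instances needed to serve `peak_rps` while keeping a `headroom` fraction spare.

    `per_instance_rps` would come from the benchmarks; `headroom` is an assumed margin.
    """
    if per_instance_rps <= 0 or not 0 <= headroom < 1:
        raise ValueError("per_instance_rps must be > 0 and 0 <= headroom < 1")
    return max(1, math.ceil(peak_rps / (per_instance_rps * (1 - headroom))))
```

For example, a hypothetical peak of 1,000 requests/s against instances that each sustain 200 requests/s, with 50% headroom, gives `instances_needed(1000, 200, 0.5) == 10`.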
Scalability Considerations:
Horizontal Scaling: Plan for distributing the load across multiple servers or instances. Implement load balancing mechanisms to evenly distribute incoming traffic.
Vertical Scaling: Consider scaling up resources (CPU, RAM) on individual servers if needed, although horizontal scaling often provides better long-term scalability.
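One common way to implement horizontal scaling is consistent hashing: each request is routed by key to one of several instances, and adding or removing an instance remaps only a fraction of the keys. The issue does not choose a partitioning scheme; the sketch below assumes atSigns are the unit of distribution, and the class and instance names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring that maps keys (assumed here to be atSigns) to server instances."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        if not nodes:
            raise ValueError("need at least one node")
        # Each node is placed on the ring many times ("virtual nodes") to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's position,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._hashes, self._hash(key)) % len(self._hashes)
        return self._ring[idx][1]
```

Because existing instances keep their points on the ring, adding a fourth instance only moves the keys that now land on the new instance's points (roughly a quarter of them); every other key stays where it was.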
High Availability Design:
Redundancy and Failover: Design the system with redundancy in mind to mitigate single points of failure. Implement failover mechanisms to ensure continuous service in case of server failures.
Replication: Employ data replication strategies to duplicate data across multiple servers or regions for resilience and data availability.
Fault-Tolerant Architecture: Use fault-tolerant technologies and practices to handle failures without service disruptions.
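For the redundancy and failover items, a minimal failover policy is: route to the primary while it passes health checks, and switch to the next standby after a configurable number of consecutive failed checks, so that a single dropped ping does not trigger failover. The class, replica names and threshold below are hypothetical, and how health checks are actually performed is left out.

```python
class FailoverRouter:
    """Route to the first replica (primary first) that is not considered failed."""

    def __init__(self, replicas: list[str], failure_threshold: int = 3):
        if not replicas or failure_threshold < 1:
            raise ValueError("need replicas and a threshold >= 1")
        self.replicas = list(replicas)  # priority order: primary, then standbys
        self.failure_threshold = failure_threshold
        self.failures = {replica: 0 for replica in self.replicas}  # consecutive failed checks

    def record_check(self, replica: str, ok: bool) -> None:
        # A successful check resets the counter; a failed one increments it.
        self.failures[replica] = 0 if ok else self.failures[replica] + 1

    def active(self) -> str:
        for replica in self.replicas:
            if self.failures[replica] < self.failure_threshold:
                return replica
        raise RuntimeError("no healthy replica available")
```

As written, traffic fails back to the primary on its first successful check; a production design might require several consecutive successes first to avoid flapping.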
Database Considerations:
Scalable Database: Choose a database system that can scale horizontally.
Replication and Backups: Implement database replication for data redundancy and backups to prevent data loss in case of failures.
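Replication can be achieved in several ways (primary/backup, consensus, quorum). As one illustration, the toy below shows quorum replication as used in Dynamo-style stores: with N replicas, a write is stored on W replicas and a read consults R replicas, and choosing R + W > N guarantees that every read set overlaps the latest write set. It is in-memory, uses a single coordinator-issued version counter, and is not a proposal for the secondary's persistence layer; all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    data: dict = field(default_factory=dict)  # key -> (version, value)
    up: bool = True


class QuorumStore:
    """Toy quorum-replicated key store (illustration only)."""

    def __init__(self, n: int = 3, w: int = 2, r: int = 2):
        if not (0 < w <= n and 0 < r <= n and r + w > n):
            raise ValueError("need 0 < W, R <= N and R + W > N")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0  # simplification: a single coordinator issues versions

    def _live(self) -> list[Replica]:
        return [rep for rep in self.replicas if rep.up]

    def put(self, key: str, value) -> None:
        live = self._live()
        if len(live) < self.w:
            raise RuntimeError(f"only {len(live)} replicas up, write quorum is {self.w}")
        self.version += 1
        for rep in live[: self.w]:  # the first W live replicas store the write
            rep.data[key] = (self.version, value)

    def get(self, key: str):
        live = self._live()
        if len(live) < self.r:
            raise RuntimeError(f"only {len(live)} replicas up, read quorum is {self.r}")
        # Deliberately read from the *last* R live replicas: it is the quorum
        # overlap, not luck, that makes the newest version show up.
        answers = [rep.data[key] for rep in live[-self.r:] if key in rep.data]
        return max(answers, key=lambda vv: vv[0])[1] if answers else None
```

With N = 3, W = 2 and R = 2, any single replica can be down and both reads and writes still succeed while returning the latest acknowledged value.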
Load Balancing and Traffic Management:
Implement load balancers to distribute incoming traffic evenly across multiple servers or regions.
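A least-connections policy is one reasonable fit for the load balancer if client connections to a secondary are long-lived (for example, connections kept open to receive notifications), since round-robin assigns new connections without regard to how many each backend already holds. The class below is a hypothetical sketch of that policy, not the behaviour of any specific load balancer product.

```python
class LeastConnectionsBalancer:
    """Assign each new connection to the backend with the fewest active connections."""

    def __init__(self, backends: list[str]):
        if not backends:
            raise ValueError("need at least one backend")
        self.active = {backend: 0 for backend in backends}  # backend -> open connections

    def acquire(self) -> str:
        # min() scans in insertion order, so ties go to the earliest-listed backend.
        backend = min(self.active, key=self.active.__getitem__)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        # Call when a connection closes.
        if self.active.get(backend, 0) == 0:
            raise ValueError(f"no open connection recorded for {backend!r}")
        self.active[backend] -= 1
```

When a connection to `s2` closes, the next new connection is routed back to `s2`, keeping open connections balanced across backends.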
Describe alternatives you've considered
No response
Additional context
No response