Cluster: Origin Cluster for Fault Tolerance and Load Balance. #464
Just now I thought of a simple solution for the origin server cluster. It needs no centralized data system and relies on the client (the edge) to establish the stream information. For example, with three origin servers: when edge server EdgeA requests a stream from origin server A and A does not have it, EdgeA immediately tries origin server B. If origin server B has the stream, EdgeA plays from origin server B and also informs origin server A of this. That completes the exchange of information and makes the entire origin server system stateless.
This way, when another edge server, EdgeB, connects to origin server A, origin server A knows that the stream is on origin server B and answers EdgeB with a 302 redirect to origin server B. Once the information is established, other edge servers need only a single 302 redirect to learn which origin server holds the stream.
If the edge server finds that the stream does not exist on the second origin server either, it informs the first origin server and starts polling again. In the worst case the edge has to poll all the origin servers, but this is fast because the network between the edge server and the origin servers is generally very good. Whichever origin server crashes, or if the stream is pushed to a different origin server, the information is rebuilt in the same way, and this does not require synchronizing all the origin servers.
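The idea in this comment can be sketched as a small in-memory simulation. This is only an illustration of the proposal, not how SRS implements it: the `Origin` class, the `hints` table, and `edge_play` are hypothetical names, and real origins would exchange this information over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """Hypothetical origin server: knows its own streams plus hints left by edges."""
    name: str
    streams: set = field(default_factory=set)   # streams published to this origin
    hints: dict = field(default_factory=dict)   # stream -> origin name reported by an edge

    def play(self, stream):
        """Answer a play request: ("ok", self), ("302", other origin) or ("miss", None)."""
        if stream in self.streams:
            return ("ok", self.name)
        if stream in self.hints:
            return ("302", self.hints[stream])
        return ("miss", None)


def edge_play(origins, stream):
    """Poll origins in order, following 302 hints; return (origin_name, requests_made)."""
    by_name = {o.name: o for o in origins}
    requests = 0
    missed = []  # origins that could not point us to the stream
    for origin in origins:
        requests += 1
        status, target = origin.play(stream)
        if status == "302":
            requests += 1
            status, _ = by_name[target].play(stream)
            if status != "ok":
                # Stale hint: forget it and keep polling the remaining origins.
                origin.hints.pop(stream, None)
                missed.append(origin)
                continue
        elif status == "miss":
            missed.append(origin)
            continue
        # Found the stream: tell every origin that missed where it lives.
        for o in missed:
            o.hints[stream] = target
        return target, requests
    return None, requests


if __name__ == "__main__":
    stream = "live/livestream"
    a, b, c = Origin("A"), Origin("B"), Origin("C", streams={stream})
    print(edge_play([a, b, c], stream))  # first edge polls until it finds the stream
    print(edge_play([a, b, c], stream))  # a later edge is redirected by origin A
```

The first call has to poll every origin (the worst case), while the second call needs only one request plus one redirect, because origin A was informed by the first edge.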
|
Looking forward to everyone submitting PR.
|
What PR are you expecting? Seriously~
|
Figure it out first, then talk about it.
|
When the origin server fails over to its hot standby, will users experience any lag? For example, if the anchor is streaming to origin server node A, node A goes down, and the stream is switched to node B, will users perceive any lag during the switch?
|
Check the network quality and buffering settings on the playback side. If the player buffers a few seconds of data, the user side may not experience lag, but there may be a jump in the picture. If the player side does not buffer any data, the picture will freeze first and then continue when data is received.
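As a rough back-of-the-envelope model of this answer (an assumption for illustration, not a measurement of SRS or of any player): if the player holds `buffer_seconds` of media when the origin goes down and no data arrives for `switch_seconds`, the picture freezes only for the part of the gap that the buffer does not cover.

```python
def playback_stall_seconds(buffer_seconds: float, switch_seconds: float) -> float:
    """Seconds the picture freezes while the stream moves to another origin.

    Simplified model: the buffer drains in real time and nothing arrives
    during the switch. It ignores the picture jump that may follow.
    """
    return max(0.0, switch_seconds - buffer_seconds)


if __name__ == "__main__":
    # Hypothetical numbers: a 2 second switch with and without a 3 second buffer.
    print(playback_stall_seconds(3.0, 2.0))
    print(playback_stall_seconds(0.0, 2.0))
```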
|
Config for 19350:
Config for 19351:
Publish stream to 19350:
Then play the stream from 19351; the player will be redirected to 19350. The log on 19351, redirecting the client to 19350:
|
We can also start an edge server, which will follow the RTMP 302 redirect. The edge config:
The config for origin 19350:
The config for origin 19351:
Then publish to origin 19350:
Then start a player to play the stream from the edge. The log on the edge server, which connects to 19351 but is redirected to 19350:
|
Fixed. |
Please help to test this feature. |
WIKI: https://github.com/ossrs/srs/wiki/v3_CN_OriginCluster https://github.com/ossrs/srs/wiki/v3_EN_OriginCluster
|
Thank you. Happy New Year.
|
Origin Cluster is designed for clusters with fewer than 5k streams, or for disaster recovery with a small number of streams. If you need a cluster with 100k streams, please refer to #1607 (comment). In this solution the origin servers access one another, which means that each origin server is an independent service. Since each origin server has to serve the edges and be reachable by the other origin servers, each one needs its own service address. There are two ways to achieve this:
Stateless Origin Server Cluster (Deployment)
Suitable for very few streams, such as <100 streams and 1-3 origin servers. The SRS origin server cluster itself is stateful: a given stream must be requested from the specific server that holds it, rather than pulled from any server. We cannot attach multiple origin servers behind one SLB (Server Load Balancer), because when a stream is played the SLB selects an origin server at random, which may not be the server that holds the stream's state and data. So a "stateless origin server cluster" here refers to deploying the origin server cluster in the form of stateless applications. Each origin server is deployed independently: each Deployment has only one replica and corresponds to a Service (ClusterIP) with a unique name. In effect, there is only one origin server behind each SLB, for example:
Create a separate Deployment for each origin server with Replicas set to 1, and a corresponding Service of type ClusterIP. This approach may be a bit cumbersome, but it will be easier to migrate to OCM (#1607) in the future. Each origin server is addressed by service-name rather than by pod-name.service-name.
StatefulSet for Stateful Origin Server Cluster
Suitable for a moderate number of streams, such as <5k streams and 5-30 origin servers. In K8s, each origin server requires a corresponding Service; using a StatefulSet with a headless Service makes each origin server Pod individually addressable. For example:
Just create one StatefulSet and one Service, and set Replicas to the number of origin servers. The origin servers are configured as:
As can be seen, this is indeed cumbersome: when adding new origin servers, the configurations of the other origin servers and of the edge servers have to be updated as well. This solution is suitable for up to 30 origin server nodes.
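To show why the per-origin configuration grows with the cluster, here is a minimal sketch that lists the pod-name.service-name addresses of a StatefulSet and the peers each origin would have to know about. K8s names StatefulSet Pods `<statefulset-name>-<ordinal>`; the names `srs-origin` and `socs`, the port, and both helper functions are assumptions made up for this example.

```python
def pod_addresses(statefulset: str, service: str, replicas: int) -> list:
    """Addresses of the StatefulSet Pods in pod-name.service-name form."""
    return [f"{statefulset}-{i}.{service}" for i in range(replicas)]


def peers_per_origin(statefulset: str, service: str, replicas: int, port: int) -> dict:
    """Map each origin address to the host:port of every other origin."""
    addrs = pod_addresses(statefulset, service, replicas)
    return {a: [f"{p}:{port}" for p in addrs if p != a] for a in addrs}


if __name__ == "__main__":
    # Hypothetical names and port; adjust to the actual StatefulSet and Service.
    for origin, peers in peers_per_origin("srs-origin", "socs", 3, 1985).items():
        print(origin, "->", " ".join(peers))
```

Every added replica changes the peer list of all existing origins, which is the maintenance cost the comment describes.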
|
For an origin server cluster with fewer than 5k streams, please refer to: #464 (comment)
For an origin server cluster with fewer than 100k streams, please refer to: #1607 (comment)
Regarding the definition of the service address for the origin servers in the cluster, please refer to: #1501 (comment)
Regarding the round-robin issue with multiple origin server nodes in the cluster, please refer to: #1501 (comment)
Regarding the storage issue with the origin server cluster, please refer to: #1595 (comment)
Regarding the API issue with the origin server cluster, please refer to: #1607 (comment)
|
Among the K8s StatefulSet examples there is one that deploys a Cassandra cluster, which is a type of KV storage. Since the names and addresses of the Pods are fixed, the first Pod is chosen as the SeedNode, which means that all the nodes gossip with this node.
Simply put, Cassandra is also a cluster composed of a group of nodes, but it has a more complex communication mechanism that distinguishes roles such as the SeedNode. The Origin Cluster does not want to implement such complex logic. The future direction is to move this part of the mechanism into peripheral services that talk to SRS through the HTTP API. For example, OCM (#1607) can be implemented in Go, and it can rely on a KV store to solve the central data storage problem.
|
Origin Cluster needs to support all origin servers sharing the same configuration, for easy deployment in K8s. This means an origin server may end up accessing itself, which must not cause any issues, but optimization is required. Refer to #1608.
|
For upgrades, rollbacks, and grayscale (canary) releases of the service, the origin server or origin server cluster can simply be restarted directly, or restarted in batches for a smoother rollout. This works mainly because the origin server generally has an edge in front of it as a proxy, and the edge retries after a disconnection, so the impact on users is minimal. Reference: #1579 (comment)
|
Currently, there can only be one origin server. When multiple edges connect to multiple origin servers, only one origin server can be selected at a given moment. Therefore, if a stream is sent to two out of N (N>=3) origin servers, such as for hot backup, there will always be one origin server without a stream. If an edge connects to this origin server, it will result in no stream. The edge will have to wait because it cannot know that the origin server does not have this stream.
From the perspective of hot backup and load balancing, it is necessary to support multiple origin servers. These origin servers need to communicate and synchronize their states. This way, when an edge connects to an origin server without a stream, the origin server can inform the edge of the correct origin server.
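The gap described in the issue can be made concrete with a small sketch. It assumes, purely for illustration, that an edge picks one origin uniformly at random; the function names are hypothetical.

```python
def origins_without_stream(origins, publishing_to):
    """Origins that hold no copy of the stream and cannot serve an edge."""
    holders = set(publishing_to)
    return [o for o in origins if o not in holders]


def miss_ratio(origin_count: int, holder_count: int) -> float:
    """Fraction of edge connections that land on an origin without the stream."""
    return (origin_count - holder_count) / origin_count


if __name__ == "__main__":
    # Hypothetical cluster: the stream is pushed to 2 of 3 origins for hot backup.
    print(origins_without_stream(["A", "B", "C"], ["A", "B"]))
    print(miss_ratio(3, 2))
```

Without communication between the origins, every edge that lands on an origin in the first list waits for a stream that never arrives.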