Cluster: Origin Cluster for Fault Tolerance and Load Balance. #464

winlinvip · 2015-08-23T11:49:38Z

Currently, there can be only one origin server. When multiple edges connect to multiple origin servers, each edge can use only one of them at a time. Therefore, if a stream is published to only two of N (N>=3) origin servers, for example for hot backup, at least one origin server will not have the stream. An edge that connects to that origin server gets no stream, and it just keeps waiting, because it has no way to know that this origin server does not have the stream.

For both hot backup and load balancing, multiple origin servers must be supported. These origin servers need to communicate and synchronize their state, so that when an edge connects to an origin server that does not have the stream, that origin server can tell the edge which origin server does.



winlinvip · 2017-06-05T01:30:13Z

Just now, while in the bathroom, I thought of a simple solution for the origin server cluster. It does not need a centralized data system; instead, it relies on the clients (the edges) to build up the information about where each stream is.

For example, suppose there are three origin servers. When edge server EdgeA requests a stream from origin server A and A does not have it, EdgeA immediately tries origin server B. If B has the stream, EdgeA plays it from B and also tells A where the stream is. This completes the exchange of information, so the origin servers do not need to share any state.

Step 1: Client requests Origin A, 404 Not Found.
+-------+           +---------+
| EdgeA +------->---+ OriginA |
+-------+           +---------+


Step 2: Client requests Origin B, 200 OK.
+-------+           +---------+
| EdgeA +------->---+ OriginB |
+-------+           +---------+

Step 3: Client notifies Origin A where the stream is.
+-------+           +---------+
| EdgeA +------->---+ OriginA |
+-------+           +---------+

Later, when another edge server, EdgeB, connects to origin server A, A already knows that the stream is on origin server B, so it replies to EdgeB with a 302 redirect to B. Once the information is established, other edge servers need only a single 302 redirect to find the origin server that has the stream.

Step 1: Client requests Origin A, 302 to Origin B.
+-------+           +---------+
| EdgeB +------->---+ OriginA |
+-------+           +---------+


Step 2: Client requests Origin B, 200 OK.
+-------+           +---------+
| EdgeB +------->---+ OriginB |
+-------+           +---------+
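The origin-side half of this scheme can be sketched in Go. This is a minimal in-memory model, assuming each origin keeps a hint table from stream to the origin an edge reported; the names `Origin`, `Handle`, and `Notify` are hypothetical and this is not SRS code (the version that shipped, shown later in this thread, queries coworkers over the HTTP API instead).

```go
package main

import (
	"fmt"
	"sync"
)

// Reply is what an origin answers to an edge's play request.
type Reply struct {
	Code   int    // 200 OK, 302 redirect, or 404 Not Found
	Target string // for a 302, the origin that reportedly holds the stream
}

// Origin is a hypothetical origin server that keeps hints learned from edges.
type Origin struct {
	Addr string

	mu      sync.Mutex
	streams map[string]bool   // streams published directly to this origin
	hints   map[string]string // stream -> origin address reported by an edge
}

// NewOrigin creates an origin with no streams and no hints.
func NewOrigin(addr string) *Origin {
	return &Origin{Addr: addr, streams: map[string]bool{}, hints: map[string]string{}}
}

// Publish records that a stream is being published to this origin.
func (o *Origin) Publish(stream string) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.streams[stream] = true
}

// Handle answers a play request: 200 if the stream is local, 302 if an edge
// has told us where it is, and 404 otherwise.
func (o *Origin) Handle(stream string) Reply {
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.streams[stream] {
		return Reply{Code: 200}
	}
	if target, ok := o.hints[stream]; ok {
		return Reply{Code: 302, Target: target}
	}
	return Reply{Code: 404}
}

// Notify stores an edge's report of where a stream is (step 3 in the first
// diagram); an empty target clears a stale hint.
func (o *Origin) Notify(stream, target string) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if target == "" {
		delete(o.hints, stream)
		return
	}
	o.hints[stream] = target
}

func main() {
	a, b := NewOrigin("OriginA"), NewOrigin("OriginB")
	b.Publish("live/livestream")

	fmt.Println(a.Handle("live/livestream").Code) // 404: A knows nothing yet
	a.Notify("live/livestream", b.Addr)           // EdgeA tells A where the stream is
	fmt.Println(a.Handle("live/livestream"))      // {302 OriginB}: what EdgeB now receives
}
```

Keeping hints only in memory fits the idea that no central store is needed: a restarted origin simply answers 404 until an edge teaches it again.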

If the edge server finds that the stream does not exist on the second origin server either, it informs the first origin server and starts polling again. In the worst case the edge has to poll all the origin servers, but this is fast, because the network between edge and origin servers is generally very good.

Whichever origin server crashes, or if the stream is republished to a different origin server, the location information is rebuilt in the same way, without synchronizing all the origin servers.
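The edge-side polling can be sketched the same way. In this minimal Go model, `Cluster` is a hypothetical interface standing in for the RTMP play request and the location notification (neither message format is defined in this thread), and `fakeCluster` exists only for the demo. After a stale redirect it keeps polling the remaining origins instead of restarting from the first one, which simplifies the text slightly.

```go
package main

import (
	"errors"
	"fmt"
)

// Reply is an origin's answer to a play request: 200, 302 (with Target) or 404.
type Reply struct {
	Code   int
	Target string
}

// Cluster abstracts the edge-to-origin requests; both methods are hypothetical.
type Cluster interface {
	Play(origin, stream string) Reply
	Notify(origin, stream, target string) // target "" means "forget this stream"
}

// ErrNotFound means no origin currently has the stream.
var ErrNotFound = errors.New("stream not found on any origin")

// Locate polls the origins in order, following at most one 302 per origin.
// Origins that could not serve the stream are told where it actually is,
// or, if it is nowhere, told to drop their stale hints.
func Locate(c Cluster, origins []string, stream string) (string, error) {
	var misses []string
	found := func(addr string) (string, error) {
		for _, m := range misses {
			c.Notify(m, stream, addr)
		}
		return addr, nil
	}
	for _, addr := range origins {
		r := c.Play(addr, stream)
		switch r.Code {
		case 200:
			return found(addr)
		case 302:
			misses = append(misses, addr)
			if c.Play(r.Target, stream).Code == 200 {
				return found(r.Target)
			}
			// Stale hint: the target crashed or the stream moved; keep polling.
		default:
			misses = append(misses, addr)
		}
	}
	for _, m := range misses {
		c.Notify(m, stream, "")
	}
	return "", ErrNotFound
}

// fakeCluster is an in-memory stand-in for real origins, for the demo only.
type fakeCluster struct {
	streams map[string]string            // stream -> origin that holds it
	hints   map[string]map[string]string // origin -> stream -> hinted origin
}

func (f *fakeCluster) Play(origin, stream string) Reply {
	if f.streams[stream] == origin {
		return Reply{Code: 200}
	}
	if t, ok := f.hints[origin][stream]; ok {
		return Reply{Code: 302, Target: t}
	}
	return Reply{Code: 404}
}

func (f *fakeCluster) Notify(origin, stream, target string) {
	if f.hints[origin] == nil {
		f.hints[origin] = map[string]string{}
	}
	if target == "" {
		delete(f.hints[origin], stream)
	} else {
		f.hints[origin][stream] = target
	}
}

func main() {
	f := &fakeCluster{
		streams: map[string]string{"live/livestream": "C"},
		hints:   map[string]map[string]string{},
	}
	addr, err := Locate(f, []string{"A", "B", "C"}, "live/livestream")
	fmt.Println(addr, err)                      // C <nil>: worst case, all three polled
	fmt.Println(f.Play("A", "live/livestream")) // {302 C}: later edges need one redirect
}
```

With the stream on C, the first edge pays the worst case of polling A, B, and C; afterwards A and B both redirect straight to C.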


winlinvip · 2017-06-05T01:30:32Z

Looking forward to everyone submitting PRs.


winlinvip · 2017-07-18T11:44:37Z

Waiting for PRs? Seriously?
If I wait for PRs, pigs will fly. Step aside, I'll handle it myself.


notedit · 2017-07-19T04:29:23Z

Figure it out first, then talk about it.


com314159 · 2018-01-17T12:26:31Z

When failing over to the hot-standby origin server, will users experience any lag?

If the broadcaster is publishing to origin node A, A goes down, and publishing is then switched to node B, will users perceive any lag during this origin switch?


juntaoliu · 2018-01-18T06:54:12Z

It depends on the network quality and the player's buffer settings.

If the player buffers a few seconds of data, users may not notice any lag, although the picture may jump. If the player buffers no data, the picture will freeze first and resume once data arrives.


winlinvip · 2018-02-16T08:12:35Z

Config for 19350:

listen              19350;
max_connections     1000;
daemon              off;
srs_log_tank        console;
pid                 ./objs/origin.cluster.serverA.pid;
http_api {
    enabled         on;
    listen          9090;
}
vhost __defaultVhost__ {
    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       127.0.0.1:9091;
    }
}

Config for 19351:

listen              19351;
max_connections     1000;
daemon              off;
srs_log_tank        console;
pid                 ./objs/origin.cluster.serverB.pid;
http_api {
    enabled         on;
    listen          9091;
}
vhost __defaultVhost__ {
    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       127.0.0.1:9090;
    }
}

Publish stream to 19350:

./objs/ffmpeg/bin/ffmpeg -re -i doc/source.200kbps.768x320.flv -c copy \
    -f flv -y rtmp://127.0.0.1:19350/live/livestream

Then play the stream from 19351 (rtmp://127.0.0.1:19351/live/livestream); the player will be redirected to 19350.

Logs on 19351, which redirects the client to 19350:

[2018-02-16 16:08:29.641][Trace][68305][106] RTMP client ip=::ffff:127.0.0.1, fd=9
[2018-02-16 16:08:29.643][Trace][68305][106] complex handshake success
[2018-02-16 16:08:29.643][Trace][68305][106] connect app, tcUrl=rtmp://127.0.0.1:19351/live, pageUrl=http://www.ossrs.net:8085/players/srs_player.html?vhost=www.ossrs.net&stream=livestream&autostart=false, swfUrl=http://www.ossrs.net:8085/players/srs_player/release/srs_player.swf?_version=1.31, schema=rtmp, vhost=127.0.0.1, port=19351, app=live, args=null
[2018-02-16 16:08:29.694][Trace][68305][106] client identified, type=Play, vhost=127.0.0.1, app=live, stream_name=livestream, duration=-1.00
[2018-02-16 16:08:29.694][Trace][68305][106] connected stream, tcUrl=rtmp://127.0.0.1:19351/live, pageUrl=http://www.ossrs.net:8085/players/srs_player.html?vhost=www.ossrs.net&stream=livestream&autostart=false, swfUrl=http://www.ossrs.net:8085/players/srs_player/release/srs_player.swf?_version=1.31, schema=rtmp, vhost=__defaultVhost__, port=19351, app=live, stream=livestream, args=null
[2018-02-16 16:08:29.694][Trace][68305][106] source url=/live/livestream, ip=::ffff:127.0.0.1, cache=1, is_edge=0, source_id=-1[-1]
[2018-02-16 16:08:29.695][Trace][68305][106] http: on_hls ok, url=http://127.0.0.1:9090/api/v1/clusters?vhost=__defaultVhost__&ip=127.0.0.1&app=live&stream=livestream, response={"code":0,"data":{"query":{"ip":"127.0.0.1","vhost":"__defaultVhost__","app":"live","stream":"livestream"},"origin":{"ip":"127.0.0.1","port":19350,"vhost":"__defaultVhost__","api":"127.0.0.1:9090","routers":["127.0.0.1:9090"]}}}
[2018-02-16 16:08:29.695][Trace][68305][106] rtmp: redirect in cluster, url=http://127.0.0.1:9090/api/v1/clusters?vhost=__defaultVhost__&ip=127.0.0.1&app=live&stream=livestream, target=127.0.0.1:19350
[2018-02-16 16:08:29.721][Trace][68305][106] client finished.
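The redirect decision in these logs comes from the JSON that the coworker at 127.0.0.1:9090 returns. The Go sketch below extracts the target from that response body, assuming only the fields visible in the log (`code`, `data.origin.ip`, `data.origin.port`) and treating a non-zero `code` as failure; the function and type names are hypothetical, and SRS itself implements this in C++.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
)

// clusterResponse mirrors only the fields of the logged /api/v1/clusters
// response that are needed to build a redirect target.
type clusterResponse struct {
	Code int `json:"code"`
	Data struct {
		Origin struct {
			IP   string `json:"ip"`
			Port int    `json:"port"`
		} `json:"origin"`
	} `json:"data"`
}

// redirectTarget returns "ip:port" of the origin that holds the stream.
func redirectTarget(body []byte) (string, error) {
	var r clusterResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Code != 0 {
		return "", fmt.Errorf("cluster api error code=%d", r.Code)
	}
	if r.Data.Origin.IP == "" || r.Data.Origin.Port == 0 {
		return "", fmt.Errorf("no origin in response")
	}
	return net.JoinHostPort(r.Data.Origin.IP, strconv.Itoa(r.Data.Origin.Port)), nil
}

func main() {
	// Response body copied verbatim from the 19351 log above.
	body := []byte(`{"code":0,"data":{"query":{"ip":"127.0.0.1","vhost":"__defaultVhost__","app":"live","stream":"livestream"},"origin":{"ip":"127.0.0.1","port":19350,"vhost":"__defaultVhost__","api":"127.0.0.1:9090","routers":["127.0.0.1:9090"]}}}`)
	target, err := redirectTarget(body)
	fmt.Println(target, err) // 127.0.0.1:19350 <nil>
}
```

The result matches `target=127.0.0.1:19350` in the `rtmp: redirect in cluster` log line.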

winlinvip · 2018-02-16T08:15:23Z

We can also start an edge server, which will follow the RTMP 302 redirect. The config:

listen              1935;
max_connections     1000;
pid                 objs/edge.pid;
daemon              off;
srs_log_tank        console;
vhost __defaultVhost__ {
    cluster {
        mode            remote;
        origin          127.0.0.1:19351;
    }
}

Remark: The edge will try to fetch the stream from 19351, and it will be redirected to 19350.

The config for origin 19350:

listen              19350;
max_connections     1000;
daemon              off;
srs_log_tank        console;
pid                 ./objs/origin.cluster.serverA.pid;
http_api {
    enabled         on;
    listen          9090;
}
vhost __defaultVhost__ {
    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       127.0.0.1:9091;
    }
}

The config for origin 19351:

listen              19351;
max_connections     1000;
daemon              off;
srs_log_tank        console;
pid                 ./objs/origin.cluster.serverB.pid;
http_api {
    enabled         on;
    listen          9091;
}
vhost __defaultVhost__ {
    cluster {
        mode            local;
        origin_cluster  on;
        coworkers       127.0.0.1:9090;
    }
}

Then publish to origin 19350:

./objs/ffmpeg/bin/ffmpeg -re -i doc/source.200kbps.768x320.flv -c copy \
        -f flv -y rtmp://127.0.0.1:19350/live/livestream

Then start a player to play the stream from the edge (rtmp://127.0.0.1:1935/live/livestream).

The log on the edge server, which connects to 19351 and is redirected to 19350:

[2018-02-16 16:24:36.844][Trace][68543][107] RTMP client ip=::ffff:127.0.0.1, fd=8
[2018-02-16 16:24:36.847][Trace][68543][107] complex handshake success
[2018-02-16 16:24:36.847][Trace][68543][107] connect app, tcUrl=rtmp://127.0.0.1:1935/live, pageUrl=http://www.ossrs.net:8085/players/srs_player.html?app=live&stream=livestream&server=127.0.0.1&port=1935&autostart=true&vhost=127.0.0.1, swfUrl=http://www.ossrs.net:8085/players/srs_player/release/srs_player.swf?_version=1.31, schema=rtmp, vhost=127.0.0.1, port=1935, app=live, args=null
[2018-02-16 16:24:36.902][Trace][68543][107] client identified, type=Play, vhost=127.0.0.1, app=live, stream_name=livestream, duration=-1.00
[2018-02-16 16:24:36.902][Trace][68543][107] connected stream, tcUrl=rtmp://127.0.0.1:1935/live, pageUrl=http://www.ossrs.net:8085/players/srs_player.html?app=live&stream=livestream&server=127.0.0.1&port=1935&autostart=true&vhost=127.0.0.1, swfUrl=http://www.ossrs.net:8085/players/srs_player/release/srs_player.swf?_version=1.31, schema=rtmp, vhost=__defaultVhost__, port=1935, app=live, stream=livestream, args=null
[2018-02-16 16:24:36.903][Trace][68543][107] source url=/live/livestream, ip=::ffff:127.0.0.1, cache=1, is_edge=1, source_id=-1[-1]
[2018-02-16 16:24:36.903][Trace][68543][107] dispatch cached gop success. count=0, duration=-1
[2018-02-16 16:24:36.903][Trace][68543][107] create consumer, queue_size=30.00, jitter=1
[2018-02-16 16:24:36.903][Trace][68543][107] ignore disabled exec for vhost=__defaultVhost__
[2018-02-16 16:24:36.903][Trace][68543][107] mw changed sleep 350=>350, max_msgs=128, esbuf=218750, sbuf 146988=>109375, realtime=0
[2018-02-16 16:24:36.903][Trace][68543][107] start play smi=0.00, mw_sleep=350, mw_enabled=1, realtime=0, tcp_nodelay=0
[2018-02-16 16:24:36.904][Trace][68543][107] update source_id=108[108]
[2018-02-16 16:24:36.904][Trace][68543][107] -> PLA time=0, msgs=0, okbps=0,0,0, ikbps=0,0,0, mw=350
[2018-02-16 16:24:36.907][Trace][68543][108] complex handshake success.
[2018-02-16 16:24:36.907][Trace][68543][108] connected, dsu=1
[2018-02-16 16:24:36.908][Trace][68543][108] edge change from 100 to state 101 (pull).
[2018-02-16 16:24:36.910][Warn][68543][108][35] RTMP redirect 127.0.0.1:19351 to 127.0.0.1:19350 stream=

winlinvip · 2018-02-16T08:42:00Z

Fixed.

winlinvip · 2018-02-16T08:42:14Z

Please help to test this feature.

winlinvip · 2018-02-16T10:19:26Z

WIKI:

https://github.com/ossrs/srs/wiki/v3_CN_OriginCluster

https://github.com/ossrs/srs/wiki/v3_EN_OriginCluster


wuxianlijiang · 2018-02-16T15:36:48Z

Thank you. Happy New Year.


winlinvip · 2018-03-03T01:48:35Z

Example:

https://github.com/ossrs/srs/wiki/v3_EN_SampleOriginCluster

https://github.com/ossrs/srs/wiki/v3_CN_SampleOriginCluster

winlinvip · 2020-02-15T03:56:26Z

The Origin Cluster is designed for clusters with fewer than 5k streams, or for disaster recovery with a small number of streams. If you need a cluster with 100k streams, please refer to #1607 (comment).

In this solution, the origin servers access each other, so each origin server is an independent service. Since each origin server must serve edges and be accessed by the other origins, each one needs its own service address. There are two ways to achieve this:

Stateless origin server cluster: 1~3 origin servers, each with its own Deployment and Service. The advantage is that it is stateless and needs no interconnection, so it is more stable.
Stateful origin server cluster: 3~30 origin servers, needing only a single StatefulSet and Service. The advantage is simple configuration; the downside is that managing state is complex: after creation, only a few fields such as Replicas, Template, and UpdateStrategy can be updated.

In both scenarios, the "coworkers" of each origin server and the "origin" of each edge server must be configured with the addresses of all the origin servers.

Stateless Origin Server Cluster (Deployment)

Suitable for very few streams, such as <100 streams, 1-3 origin servers.

SRS's origin server cluster is itself stateful: a given stream must be requested from a specific server and cannot be pulled from an arbitrary one. So we cannot put multiple origin servers behind an SLB (Server Load Balancer): when a stream is played, the SLB picks an origin server at random, and because the stream's state and data live on one specific origin server, the player may well be routed to the wrong one.

So the "stateless origin server cluster" here means deploying the origin servers as stateless applications. Each origin server gets its own Deployment with a single replica, and each Deployment has a corresponding Service (ClusterIP) with a unique name. In effect, there is only one origin server behind each SLB, for example:

| Origin Server | Deployment | Service | Domain |
| --- | --- | --- | --- |
| Origin Server 0 | origin-0-deploy | origin-0-service | origin-0-service |
| Origin Server 1 | origin-1-deploy | origin-1-service | origin-1-service |
| Origin Server 2 | origin-2-deploy | origin-2-service | origin-2-service |
| Origin Server N | origin-N-deploy | origin-N-service | origin-N-service |

Note: For the deployment instances of stateless clusters, refer to the Wiki.

Create a separate Deployment for each origin server with Replicas set to 1, and a corresponding Service of type ClusterIP. This is a bit cumbersome, but it will make migrating to OCM (#1607) easier in the future. Origin servers are addressed by service-name rather than by pod-name.service-name.

Stateful Origin Server Cluster (StatefulSet)

Suitable for a moderate number of streams, such as <5k streams, with 5-30 origin servers.

In K8s, each origin server needs its own addressable Service. This can be achieved with a StatefulSet and a Headless Service, which give each origin server Pod its own address. For example:

| Origin Server | StatefulSet | Service | Domain |
| --- | --- | --- | --- |
| Origin Server 0 | origin | service | origin-0.service |
| Origin Server 1 | origin | service | origin-1.service |
| Origin Server 2 | origin | service | origin-2.service |
| Origin Server N | origin | service | origin-N.service |

Note: For deployment instances of stateful clusters, refer to the Wiki.

Just create one StatefulSet and one Service, and set the Replicas to the number of origin servers.

Each origin server is configured with all the origin servers as coworkers: coworkers origin-0.service origin-1.service origin-2.service;
Each edge server is configured with all or some of the origin servers: origin origin-0.service origin-1.service origin-2.service;

This is indeed cumbersome: when a new origin server is added, the configurations of the other origin servers and of the edge servers must be updated as well. This solution is suitable for up to 30 origin server nodes.
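Since every origin gets the same `coworkers` line and every edge lists the origins in its `origin` line, these lines can be generated rather than edited by hand. A hypothetical Go helper, assuming the `<statefulset>-<i>.<service>` naming from the table above; the optional port follows the `host:port` form of the earlier local `coworkers` examples, and 1985 below is only an assumed HTTP API port.

```go
package main

import (
	"fmt"
	"strings"
)

// podHosts returns the per-pod DNS names of a StatefulSet behind a headless
// Service, e.g. origin-0.service, origin-1.service, ...
func podHosts(statefulSet, service string, replicas int) []string {
	hosts := make([]string, replicas)
	for i := range hosts {
		hosts[i] = fmt.Sprintf("%s-%d.%s", statefulSet, i, service)
	}
	return hosts
}

// withPort appends ":port" to each host; an empty port leaves hosts unchanged.
func withPort(hosts []string, port string) []string {
	if port == "" {
		return hosts
	}
	out := make([]string, len(hosts))
	for i, h := range hosts {
		out[i] = h + ":" + port
	}
	return out
}

// directive renders one config line such as "coworkers a b c;".
func directive(name string, values []string) string {
	return name + " " + strings.Join(values, " ") + ";"
}

func main() {
	hosts := podHosts("origin", "service", 3)
	fmt.Println(directive("coworkers", hosts)) // for every origin
	fmt.Println(directive("origin", hosts))    // for every edge
	// Variant with an explicit (assumed) HTTP API port on each coworker.
	fmt.Println(directive("coworkers", withPort(hosts, "1985")))
}
```

Adding a replica still means regenerating these lines and updating every origin and edge, which is the maintenance cost described above; the generator only removes the typing.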


winlinvip · 2020-02-15T03:58:41Z

For an origin server cluster with fewer than 5k streams, please refer to: #464 (comment)

For an origin server cluster with fewer than 100k streams, please refer to: #1607 (comment)

For how to define the service address of each origin server in the origin server cluster, please refer to: #1501 (comment)

For the polling (round-robin) issue across multiple origin server nodes in the cluster, please refer to: #1501 (comment)

For storage in the origin server cluster, please refer to: #1595 (comment)

For the API of the origin server cluster, please refer to: #1607 (comment)


winlinvip · 2020-02-15T10:12:41Z

The K8s StatefulSet examples include one that deploys a Cassandra cluster, a KV store. Because Pod names and addresses are stable, the first Pod is chosen as the seed node, and all the other nodes gossip with it:

          - name: CASSANDRA_SEEDS
            value: "cassandra-0.cassandra.default.svc.cluster.local"

Note: Here is an article introducing Cassandra: https://www.cnblogs.com/loveis715/p/5299495.html

Simply put, Cassandra is also a cluster made up of a group of nodes, but it has a more complex communication mechanism that distinguishes roles such as seed nodes.

The Origin Cluster does not intend to implement such complex logic. The future direction is to delegate this mechanism to external services through the HTTP API. For example, OCM (#1607) could be implemented in Go, relying on a KV store to solve central data storage.


winlinvip · 2020-02-15T14:49:52Z

OriginCluster needs to support using the same configuration on every origin server for easy deployment in K8s, which means an origin server listing itself as a coworker must not cause any issues; this still needs optimization. Refer to #1608.


winlinvip · 2020-02-18T01:42:28Z

For service upgrades, rollbacks, and grayscale (canary) releases, an origin server or an origin server cluster can simply be restarted, or restarted in batches. This works mainly because origin servers are generally fronted by edges acting as proxies, and the edges retry after a disconnection, so the impact on users is minimal. Reference: #1579 (comment)


winlinvip added the Feature It's a new feature. label Aug 23, 2015

winlinvip added this to the srs 3.0 release milestone Aug 23, 2015

winlinvip assigned winlinvip and unassigned winlinvip Aug 23, 2015

winlinvip mentioned this issue Sep 17, 2015

How to loadbalance on a Origin - Edge configuration? #482

Closed

winlinvip mentioned this issue Oct 21, 2015

Enhance the handling of the returned result from the API hook. #507

Closed

winlinvip changed the title from "Origin Cluster for Fault Tolerance and Load Balance, Origin Server Hot Backup and Load Balancing Cluster" to "Cluster: Origin Cluster for Fault Tolerance and Load Balance, Origin Server Hot Backup and Load Balancing Cluster" Jun 5, 2017

winlinvip modified the milestones: srs 4.0 release, srs 3.0 release Jun 5, 2017

winlinvip mentioned this issue Jul 18, 2017

About Cluster: Several questions about forward, edge, slave, master, etc. #938

Closed

winlinvip modified the milestones: srs 3.0 release, srs 4.0 release Jul 18, 2017

winlinvip added a commit that referenced this issue Feb 16, 2018

For #464, support config for origin cluster.

d0fbf44

winlinvip added a commit that referenced this issue Feb 16, 2018

For #464, support origin cluster api

469250f

winlinvip added a commit that referenced this issue Feb 16, 2018

For #464, query origin info and ip addresses

ec362b2

winlinvip added a commit that referenced this issue Feb 16, 2018

For #464, refine code

55c9619

winlinvip added a commit that referenced this issue Feb 16, 2018

For #464, support config origin cluster

92f2bcd

winlinvip added a commit that referenced this issue Feb 16, 2018

For #464, refine result of origin cluster api

2f09ec4

winlinvip added a commit that referenced this issue Feb 16, 2018

Fix #464, support origin cluster

c70421e

winlinvip added a commit that referenced this issue Feb 16, 2018

Fix #464, support RTMP origin cluster. 3.0.29

4bf5ab2

winlinvip closed this as completed Feb 16, 2018

winlinvip added a commit that referenced this issue Mar 3, 2018

For #464: Add example for origin cluster

449c632

springjk mentioned this issue Jan 13, 2020

Redirected flow name duplicated in the origin server cluster #1575

Closed

winlinvip mentioned this issue Feb 15, 2020

Docker source station cluster, two-way streaming cannot be played simultaneously #1501

Closed

winlinvip mentioned this issue Feb 15, 2020

Support docker and k8s in native #1595

Closed

winlinvip mentioned this issue Feb 18, 2020

OCM(OriginClusterMaster): Use external application for Origin Cluster to discovery services #1607

Closed

mawenwu1983 mentioned this issue Nov 19, 2020

Origin server cluster, pulling the stream directly from the origin server will fail. #2045

Closed

winlinvip mentioned this issue Feb 5, 2021

Support Multiple-CPUs(or Threads) to improve concurrency. #2188

Closed

10 tasks

winlinvip self-assigned this Sep 25, 2021

winlinvip changed the title from "Cluster: Origin Cluster for Fault Tolerance and Load Balance, Origin Server Hot Backup and Load Balancing Cluster" to "Cluster: Origin Cluster for Fault Tolerance and Load Balance." Jul 26, 2023

winlinvip added the TransByAI Translated by AI/GPT. label Jul 26, 2023
