Failed to start: discovered another streaming server with cluster ID "example-stan" #61
Comments
I'm getting the same error on a dev k3d cluster. |
I'm also seeing this error when installing via the instructions here: https://github.com/nats-io/nats-streaming-operator#deploying-a-nats-streaming-cluster |
It's not working anymore. Same error. |
Stuck with the same problem: only one NATS Streaming pod replica is working. All the others exit with the same error. |
Having the same issue |
Same |
I have a temporary solution: I modified my
nat-streaming-cluster.yaml
|
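The file contents were not posted; a minimal sketch of what such a workaround manifest might look like, assuming a cluster named example-stan backed by a NATS service named example-nats and mirroring the empty-config workaround described further down in this thread:

apiVersion: "streaming.nats.io/v1alpha1"
kind: "NatsStreamingCluster"
metadata:
  name: "example-stan"
spec:
  size: 3
  natsSvc: "example-nats"
  # An explicit (even empty) config entry is the workaround discussed below.
  config: {}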
I changed to KubeMQ. |
Any update on this issue? Same behaviour on EKS. If I keep retrying, it works eventually; however, when a pod restarts it starts happening again. |
the same issue for me |
After trying the config above I get the error: |
Same problem, and adding the "debug: true" worked for me once, but unpredictably on the next few attempts (had to delete and apply the cluster a few times). For my configuration I suspect that it may be a timing issue with NATS Streaming racing my Envoy proxy sidecar (I have Istio installed in my cluster) and that by adding the "debug: true" NATS Streaming takes a bit longer to boot up, giving Envoy enough time to be ready. Bit of a tricky one to debug as the images are based on scratch with no real ability to inject a sleep as part of the image cmd. Am I the only one using Istio, or is this a common theme? |
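If the sidecar race is indeed the cause, one possible mitigation (not mentioned in this thread, and only available in newer Istio releases) is to ask Istio to hold the application container until the Envoy proxy is ready, for example via a pod-template annotation:

# Hedged sketch: requires an Istio version that supports holdApplicationUntilProxyStarts.
metadata:
  annotations:
    proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'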
I have the same issue. Can anyone help?
|
Same issue here. Describing the pods created by the nats-streaming-operator, I see the CLI command-line args setting the cluster-id as follows: $ kubectl describe -n mynamespace stan-cluster-2
Pod 1 runs OK. Then pods 2 and 3 try to run with the same cluster-id and fail because it's already in use (by pod 1). What is the correct way for the nats-streaming-operator to assign cluster-ids to the cluster servers? Is there some config I'm missing here? PS: I'm not mounting any volume yet on the pod spec. |
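For reference, a sketch of how the server arguments are expected to differ per pod in cluster mode (the flag names are nats-streaming-server's; the exact set of flags the operator injects may differ): every pod shares the same --cluster_id, while --cluster_node_id must be unique per pod.

# Illustrative only; pod names are assumptions, not actual operator output.
# stan-cluster-2-1: nats-streaming-server --clustered --cluster_id stan-cluster-2 --cluster_node_id stan-cluster-2-1
# stan-cluster-2-2: nats-streaming-server --clustered --cluster_id stan-cluster-2 --cluster_node_id stan-cluster-2-2
# stan-cluster-2-3: nats-streaming-server --clustered --cluster_id stan-cluster-2 --cluster_node_id stan-cluster-2-3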
Maybe this line can be a clue to what's happening:
isn't it supposed to be |
I downloaded the code, changed $
now the cluster id is correctly set for the pods: $
and all servers are ready. |
@hbobenicio that is what I have done and it seems to work, although I am not sure how to validate that all 3 nodes are functioning correctly. I can see 3 nodes being connected, but that is about it. Is there a way to check which node is receiving? |
@lanox there are some ways to test it... a quick test would be running nats-box on the cluster and sending/receiving some messages, or maybe writing some test app and running it on your cluster. Try checking the logs, and try some chaos testing as a last step. Good to know that it worked for you too. |
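A sketch of that quick nats-box test, assuming the NATS service is reachable at nats://example-nats:4222 and the cluster ID is example-stan (adjust the names for your deployment):

# Start a throwaway nats-box pod with the STAN tools
kubectl run -i --rm --tty nats-box --image=synadia/nats-box --restart=Never -- /bin/sh

# Inside nats-box: subscribe on a test subject...
stan-sub -s nats://example-nats:4222 -c example-stan -id test-sub foo

# ...then, from a second shell, publish a message and watch it arrive
stan-pub -s nats://example-nats:4222 -c example-stan -id test-pub foo "hello"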
My bad... I mixed up the concept of cluster_id (
my yaml describing NatsStreamingCluster doesn't have a config entry, so |
Thanks for looking into this. I think it looked like it was working because it was running as an individual node rather than as clustered nodes, hence I was saying I am not sure if it worked as it is supposed to. However, I could be wrong. |
@hbobenicio so this is what fixed the problem for me: I added this This is what I did to test:
then
Then I deleted Then I checked which other nodes have become the master,
and stan-service-1 is showing standby. I think the documentation needs to be updated, as well as the example deployments. |
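One way to sketch that master/standby check is to grep each pod's logs; the exact log wording varies between nats-streaming versions, and the pod names below are illustrative:

# The active FT server logs that it is active; the others report standby mode.
kubectl logs stan-service-1 | grep -i -E "active|standby"
kubectl logs stan-service-2 | grep -i -E "active|standby"
kubectl logs stan-service-3 | grep -i -E "active|standby"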
Oh, and it seems you can only run it in cluster mode or FT mode, but they can't be used together. |
Yeah, they are mutually exclusive modes. My use case is cluster mode. I think those checks for modes could be improved or, if the config object in the spec is really necessary, then a validation that reports it as missing would give a better error. But I still think the best approach is for it to work even without the config entry. So, until the fix is made, this is the workaround: if you have a yaml without a config entry, just put an empty config entry like this:

apiVersion: "streaming.nats.io/v1alpha1"
kind: "NatsStreamingCluster"
metadata:
  name: "my-stan-cluster"
  namespace: ${NAMESPACE}
spec:
  size: ${CLUSTER_SIZE}
  image: "nats-streaming:0.18.0"
  natsSvc: ${NATS_CLUSTER_NAME}
  # Here... without a config entry, isClustered is false even with spec.Size > 1.
  # Just put an empty config
  config: {} |
… code correctly set cluster-node-id for cluster mode
…figs workaround issue #61 - adding missing configs to all examples of cluster mode
@hbobenicio @wallyqs Hello. I'm getting the same error using the STAN helm chart:

stan:
  replicas: 3
  nats:
    url: nats://nats.nats:4222
store:
  ...:
  cluster:
    enabled: true
  sql:
    ...: |
Hi @sergeyshaykhullin, I think this is an error from the helm charts? Btw, I think the error is that it is missing the definition of ft:
  group: "stan" |
I got this error when I deploy a
NatsStreamingCluster:
[1] 2019/12/26 07:16:45.762521 [FTL] STREAM: Failed to start: discovered another streaming server with cluster ID "example-stan"
I use GKE.
Full message: