Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Unhandled errors in websockets can lead to panics #119

Closed
2 tasks done
phughk opened this issue Dec 5, 2023 · 4 comments · Fixed by #138
Closed
2 tasks done

Bug: Unhandled errors in websockets can lead to panics #119

phughk opened this issue Dec 5, 2023 · 4 comments · Fixed by #138
Labels
bug Something isn't working

Comments

@phughk
Copy link
Contributor

phughk commented Dec 5, 2023

Describe the bug

https://github.com/surrealdb/surrealdb.go/pull/118/files

Not much detail, but there is a code line. Apparently, when the error isnt handled correctly, the websocket connections can panic.

The TODO is going to be added to the codebase

Steps to reproduce

Unknown, may need to review code

Expected behaviour

No panics, errors handled correctly

SurrealDB version

1.0.0+20231130.a420c821.dirty for macos on aarch64

Contact Details

hugh@surrealdb.com

Is there an existing issue for this?

  • I have searched the existing issues

Code of Conduct

  • I agree to follow this project's Code of Conduct
@phughk phughk added the bug Something isn't working label Dec 5, 2023
@ElecTwix
Copy link
Contributor

ElecTwix commented Dec 5, 2023

Hi @phughk , I could take this.
It's worth noting that some of Gorilla WS's errors cannot be recovered, such as connection errors.
I was thinking of fixing the reconnection issue as well in PR because it is the root cause.
What do you think?

reconnection issue: #110

@rytswd
Copy link

rytswd commented Dec 6, 2023

With the NewTestSurrealDB function in the PR #118, I can easily reproduce this error, by simply shutting down SurrealDB server and sleep for a second.

func TestSurrealDBPanic(t *testing.T) {
	t.Parallel()

	_, db, close := NewTestSurrealDB(t)
	_ = db // db implicitly holds websocket connection, and can panic when SurrealDB is not accessible
	close()

	time.Sleep(1 * time.Second)
}

I agree with @ElecTwix -- when the connection is lost, it would be the best if the server can go into reconnection loop. I'd imagine there would be some adjustment needed for the read path as well, but we should first focus on server to automatically recover (in that sense panic needs to be addressed).

@ElecTwix
Copy link
Contributor

Hi everyone,
I have created a diagram for implementing reconnection and reauthentication.
What are your thoughts?

Reconnect
full view

@phughk
Copy link
Contributor Author

phughk commented Jan 25, 2024

Thanks so much for this!

I agree that implicit retries are favourable. We need to be careful how we go about it though - we don't want indefinite loops, and we do want to fail-fast in many cases (such as deploying to a cluster where SurrealDB hasnt yet started or isn't reachable).

The pattern for this is called a circuit breaker. We should explore how circuit breakers are implemented in golang (I haven't used them myself, only in-house solutions). There could be frameworks that help with this.

The number of retries and timeout durations must be configurable. Some people may want to have no retries and short timeouts, for example.

Additionally, if the retries have failed, there cannot be a panic, and instead must be a checkable error code. Then users can handle logic on their side - reconnect, reconnect with new address, or terminate the client.

Regarding the diagram for the approach - I think this is great @ElecTwix thank you! It looks sensible. I would advise having the circuit breaker pattern in the db.go file. Configuration should be outside db.go, though. Db.go should use the configuration provided.

ElecTwix added a commit to ElecTwix/surrealdb.go that referenced this issue May 21, 2024
instead of panic it will return error when any send function call
ElecTwix added a commit to ElecTwix/surrealdb.go that referenced this issue May 21, 2024
instead of panic it will return error when any send function call
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants