feat(server): abort on panic #4026

jjbayer · 2024-09-11T13:36:36Z

Make the tasks spawned by services joinable, and abort the entire process if one of the join handles returns a panic.

This is a partial implementation that requires follow-up. Only EnvelopeBuffer and EnvelopeProcessor are actively monitored for panics for now.
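The monitoring idea described above can be illustrated with a minimal std-only sketch. It is an analogy, not the PR's code: `std::thread` stands in for tokio tasks, and `spawn_service` and `join_or_propagate` are hypothetical names. The supervisor joins the handle and, if the service panicked, re-raises the same panic payload in its own context.

```rust
use std::panic;
use std::thread::{self, JoinHandle};

/// Hypothetical service entry point: returns a handle the caller can join.
fn spawn_service(should_panic: bool) -> JoinHandle<()> {
    thread::spawn(move || {
        if should_panic {
            panic!("service failed");
        }
    })
}

/// Joins a service; if it panicked, re-raises the panic in the caller.
/// `join` returns `Err(payload)` when the thread panicked, where `payload`
/// is the value the thread panicked with.
fn join_or_propagate(handle: JoinHandle<()>) {
    if let Err(payload) = handle.join() {
        panic::resume_unwind(payload);
    }
}
```

Calling `join_or_propagate(spawn_service(true))` from the main thread unwinds the main thread with the service's panic, which ends the process unless something catches it.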

ref: https://getsentry.atlassian.net/browse/INC-875

iambriccardo

The idea LGTM. As I wrote, it would be nice if we could figure out a more defensive mechanism in the future, so that services have a fallback in case of panics.

iambriccardo · 2024-09-11T15:43:46Z

relay-server/src/lib.rs

+ }
+ Err(e) => {
+ if e.is_panic() {
+ std::panic::resume_unwind(e.into_panic());


Do we maybe want to define a respawn behavior for services in a future iteration? It might be tricky to make sure existing channels are set up again.

This actually re-triggers the panic and makes the process terminate. Respawning services is another option I would like to discuss on Monday, but it has its drawbacks (what if the service keeps panicking on every re-spawn?).

Yes, this is something I thought of. I feel like for that we should have some global retry counters or heuristics to know when a service can no longer be restarted.
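A retry counter of the kind discussed here could look roughly like the following sketch. It is purely illustrative: `supervise` and `max_restarts` are hypothetical names, `std::thread` stands in for tokio tasks, and the harder problem raised above (setting up existing channels again) is not addressed.

```rust
use std::thread;

/// Runs `service` on its own thread and restarts it after a panic.
/// Gives up once `max_restarts` restarts have been used.
/// Returns `Ok(restarts)` if a run finished cleanly, `Err(restarts)` otherwise.
/// The closure receives the attempt number, starting at 0.
fn supervise<F>(service: F, max_restarts: u32) -> Result<u32, u32>
where
    F: Fn(u32) + Clone + Send + 'static,
{
    let mut restarts = 0;
    loop {
        let run = service.clone();
        let attempt = restarts;
        match thread::spawn(move || run(attempt)).join() {
            // The service returned normally: stop supervising.
            Ok(()) => return Ok(restarts),
            // The service panicked and there is budget left: respawn.
            Err(_) if restarts < max_restarts => restarts += 1,
            // The service keeps panicking: give up instead of looping forever.
            Err(_) => return Err(restarts),
        }
    }
}
```

When the budget is exhausted, the caller still has to decide what to do, for example falling back to terminating the process as this PR does.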

jjbayer · 2024-09-12T05:45:16Z

relay-server/src/lib.rs

+ }
+ Err(e) => {
+ if e.is_panic() {
+ std::panic::resume_unwind(e.into_panic());


This actually re-triggers the panic and makes the process terminate. Respawning services is another option I would like to discuss on Monday, but it has its drawbacks (what if the service keeps panicking on every re-spawn?).

jjbayer · 2024-09-12T05:51:55Z

relay-system/src/service.rs

@@ -1046,12 +1047,12 @@ mod tests {
 impl Service for MockService {
 type Interface = MockMessage;

- fn spawn_handler(self, mut rx: Receiver<Self::Interface>) {
+ fn spawn_handler(self, mut rx: Receiver<Self::Interface>) -> JoinHandle<()> {


Note: by requiring that spawn_handler returns exactly one JoinHandle, we restrict the impl to define exactly one main task. Not sure if this is what we want, because the purpose of the spawn handler was to give the implementor more liberty. If we do restrict it, we might as well replace the trait method spawn_handler by a trait method run, and call tokio::spawn from the outside.
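The alternative mentioned in this note (a `run` method, with spawning done by the caller) might look like the sketch below. It is an assumption-laden analogy rather than Relay's API: the trait is a simplified stand-in, `start` and `Doubler` are hypothetical, and `std::thread` with `std::sync::mpsc` replaces `tokio::spawn` and the service's async receiver.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Simplified stand-in for the `Service` trait: the implementor only
/// defines the main loop and never spawns anything itself.
trait Service: Send + Sized + 'static {
    type Interface: Send + 'static;

    fn run(self, rx: Receiver<Self::Interface>);
}

/// The caller owns spawning, so every service yields exactly one handle.
fn start<S: Service>(service: S) -> (Sender<S::Interface>, JoinHandle<()>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || service.run(rx));
    (tx, handle)
}

/// Example service: doubles each message and forwards it to `out`.
struct Doubler {
    out: Sender<u32>,
}

impl Service for Doubler {
    type Interface = u32;

    fn run(self, rx: Receiver<u32>) {
        // The loop ends when every sender for `rx` has been dropped.
        for msg in rx {
            let _ = self.out.send(msg * 2);
        }
    }
}
```

With this shape the "exactly one main task" restriction is enforced by construction, at the cost of the liberty that `spawn_handler` was meant to give implementors.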

jjbayer · 2024-09-12T05:56:51Z

relay-server/src/services/health_check.rs

@@ -225,6 +226,8 @@ impl Service for HealthCheckService {
 });
 }
 });
+
+ j1 // TODO: should return j1 + j2


We have a few places where the spawn handler spawns more than one task. In a follow-up, we should transform these to something like

    tokio::spawn(async {
        let subtask = tokio::spawn(async {...});
        // ...
        subtask.await;
    });
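The nesting pattern can be tried out with a std-only sketch, where `std::thread` stands in for tokio tasks and the `subtask_panics` flag is a hypothetical knob for demonstration. One detail worth keeping in mind: awaiting a tokio `JoinHandle` yields a `Result`, so the outer task has to inspect it and re-raise a subtask panic explicitly, otherwise the failure never reaches the single handle that is returned. The sketch does the equivalent with `join`.

```rust
use std::panic;
use std::thread::{self, JoinHandle};

/// Returns a single handle that also covers the inner subtask.
fn spawn_handler(subtask_panics: bool) -> JoinHandle<()> {
    thread::spawn(move || {
        let subtask = thread::spawn(move || {
            if subtask_panics {
                panic!("subtask failed");
            }
        });

        // ... the main task's own work would go here ...

        // Forward a subtask panic so that it surfaces on the outer handle.
        if let Err(payload) = subtask.join() {
            panic::resume_unwind(payload);
        }
    })
}
```

A supervisor that only watches the outer handle now observes panics from either task.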

jjbayer · 2024-09-12T06:05:19Z

relay-server/src/lib.rs

+ }
+ }
+ }
+ _ = Controller::shutdown_handle().finished() => {


Note: when every service implements a shutdown listener, awaiting on finished becomes unnecessary: We can simply await on all the join_handles and guarantee that every service finished its main task.

jjbayer · 2024-09-20T06:16:39Z

Closing in favor of #4037.

jjbayer added 2 commits September 11, 2024 11:53

wip

fd0ae7a

resume_unwind

ffbf7dc

jjbayer self-assigned this Sep 11, 2024

iambriccardo self-requested a review September 11, 2024 15:39

iambriccardo reviewed Sep 11, 2024

jjbayer added 2 commits September 12, 2024 07:53

ref

bf61ede

changelog

734c9cb

jjbayer commented Sep 12, 2024

ref

71137de

jjbayer commented Sep 12, 2024

jjbayer closed this Sep 20, 2024

jjbayer mentioned this pull request Sep 20, 2024

Implement graceful shutdown #4050

Open
