Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the FLAME.Pool handle backend.init/1 errors #67

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mentels
Copy link

@mentels mentels commented Oct 31, 2024

Even though the FLAME.Backend allows for the backend.init/1 to return an error:

@callback init(opts :: Keyword.t()) :: {:ok, state :: term()} | {:error, term()}

The FLAME.Pool would not handle it and get into an infinite restart loop. This behavior can be tested with FlameTest.Application.call in the https://github.com/mentels/flame_test that uses a custom backend which init/1 returns {:error, just_fail}.

If the pool is configured with min: 0, after the call we will keep getting:

iex(1)> FlameTest.Application.call
:ok

13:35:17.334 [error] Task #PID<0.394.0> started from FlameTest.Pool terminating
** (MatchError) no match of right hand side value: {:error, :just_fail}
    (flame 0.5.1) lib/flame/pool.ex:696: FLAME.Pool.start_child_runner/2
    (elixir 1.17.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
    (elixir 1.17.2) lib/task/supervised.ex:36: Task.Supervised.reply/4
Function: #Function<1.81824075/0 in FLAME.Pool.async_boot_runner/1>
    Args: []

13:35:18.353 [error] Task #PID<0.396.0> started from FlameTest.Pool terminating
** (MatchError) no match of right hand side value: {:error, :just_fail}
    (flame 0.5.1) lib/flame/pool.ex:696: FLAME.Pool.start_child_runner/2
    (elixir 1.17.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
    (elixir 1.17.2) lib/task/supervised.ex:36: Task.Supervised.reply/4
Function: #Function<1.81824075/0 in FLAME.Pool.async_boot_runner/1>
    Args: []

13:35:19.354 [error] Task #PID<0.398.0> started from FlameTest.Pool terminating
** (MatchError) no match of right hand side value: {:error, :just_fail}
    (flame 0.5.1) lib/flame/pool.ex:696: FLAME.Pool.start_child_runner/2
    (elixir 1.17.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
    (elixir 1.17.2) lib/task/supervised.ex:36: Task.Supervised.reply/4
Function: #Function<1.81824075/0 in FLAME.Pool.async_boot_runner/1>
    Args: []
...

While if the runners are started at the application start-up (e.g.: min: 1), the application will not start at all, reporting an error:

13:37:00.209 [notice] Application flame exited: :stopped
** (Mix) Could not start application flame_test: FlameTest.Application.start(:normal, []) returned an error: shutdown: failed to start child: {FLAME.Pool, FlameTest.Pool}
    ** (EXIT) shutdown: failed to start child: {FLAME.Pool, FlameTest.Pool}
        ** (EXIT) an exception was raised:
            ** (MatchError) no match of right hand side value: {:error, :just_fail}
                (flame 0.5.1) lib/flame/pool.ex:696: FLAME.Pool.start_child_runner/2
                (elixir 1.17.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
                (elixir 1.17.2) lib/task/supervised.ex:36: Task.Supervised.reply/4

This commit attempts to fix the error by:

  • preventing the MatchError resulting from the {:ok, pid} = DynamicSupervisor.start_child(state.runner_sup, spec) in the start_child_runner/2
  • adding more explicit error handling in the start_child_runner/2
  • catch an exit signal possibly resulting from the DynamicSupervisor.start_child/2 or Runner.remote_boot/2
    • the previous catch clause with {:exit, reason} was not catching the exit signal from the DynamicSupervisor.start_child/2 ```elixir iex(1)> try do throw({:exit, :ha}) catch {:exit, :ha} -> :haha end :haha iex(2)> try do exit(:ha) catch {:exit, :ha} -> :haha end ** (exit) :ha iex:2: (file) iex(2)> ````
  • adding a new handle_info/2 clause to handle the {:error, reason} Task result
  • adding new clauses to the fun in boot_runners/2
    • the previous {:exit, reason} clause would never match since the Pool GenServer is not trapping exists and thus the tasks spawn with Task.async_stream/3 would never return {:exit, reason} as mentioned in the https://hexdocs.pm/elixir/Task.html#async_stream/5

I think that the Runner.remote_boot/2 might have the same problem but I will stop here and see if this work lands anywhere - maybe I'm missing the point :)

Even though the `FLAME.Backend` allows for the `backend.init/1` to return
an error:

`@callback init(opts :: Keyword.t()) :: {:ok, state :: term()} | {:error, term()}`

The `FLAME.Pool` would not handle it and and get into an infinite restart loop. This behaviour
can be tested wit `FlameTest.Application.call` in the https://github.com/mentels/flame_test
that uses a custom backend which `init/1` returns `{:error, :just_fail}`.

If the pool is configured with `min: 0`, after the call we will keep getting:
```elixir
iex(1)> FlameTest.Application.call
:ok

13:35:17.334 [error] Task #PID<0.394.0> started from FlameTest.Pool terminating
** (MatchError) no match of right hand side value: {:error, :just_fail}
    (flame 0.5.1) lib/flame/pool.ex:696: FLAME.Pool.start_child_runner/2
    (elixir 1.17.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
    (elixir 1.17.2) lib/task/supervised.ex:36: Task.Supervised.reply/4
Function: #Function<1.81824075/0 in FLAME.Pool.async_boot_runner/1>
    Args: []

13:35:18.353 [error] Task #PID<0.396.0> started from FlameTest.Pool terminating
** (MatchError) no match of right hand side value: {:error, :just_fail}
    (flame 0.5.1) lib/flame/pool.ex:696: FLAME.Pool.start_child_runner/2
    (elixir 1.17.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
    (elixir 1.17.2) lib/task/supervised.ex:36: Task.Supervised.reply/4
Function: #Function<1.81824075/0 in FLAME.Pool.async_boot_runner/1>
    Args: []

13:35:19.354 [error] Task #PID<0.398.0> started from FlameTest.Pool terminating
** (MatchError) no match of right hand side value: {:error, :just_fail}
    (flame 0.5.1) lib/flame/pool.ex:696: FLAME.Pool.start_child_runner/2
    (elixir 1.17.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
    (elixir 1.17.2) lib/task/supervised.ex:36: Task.Supervised.reply/4
Function: #Function<1.81824075/0 in FLAME.Pool.async_boot_runner/1>
    Args: []
...
```

While if the runners are started at the application start up (e.g: `min: 1`), the error
application will not start at all, reporting an error:

```elixir
13:37:00.209 [notice] Application flame exited: :stopped
** (Mix) Could not start application flame_test: FlameTest.Application.start(:normal, []) returned an error: shutdown: failed to start child: {FLAME.Pool, FlameTest.Pool}
    ** (EXIT) shutdown: failed to start child: {FLAME.Pool, FlameTest.Pool}
        ** (EXIT) an exception was raised:
            ** (MatchError) no match of right hand side value: {:error, :just_fail}
                (flame 0.5.1) lib/flame/pool.ex:696: FLAME.Pool.start_child_runner/2
                (elixir 1.17.2) lib/task/supervised.ex:101: Task.Supervised.invoke_mfa/2
                (elixir 1.17.2) lib/task/supervised.ex:36: Task.Supervised.reply/4
````

This commit attempts to fix the error by:
* preventing  the `MatchError` resulting from the `{:ok, pid} =  DynamicSupervisor.start_child(state.runner_sup, spec)`
in the `start_child_runner/2`
* adding more explicit error handling in the `start_child_runner/2`
* catch an exit signal possibly resulting from the `DynamicSupervisor.start_child/2` or `Runner.remote_boot/2`
  * the previous catch clause with `{:exit, reason}` was not catching the exit signal from the `DynamicSupervisor.start_child/2`
    ```elixir
    iex(1)> try do throw({:exit, :ha}) catch {:exit, :ha} -> :haha end
    :haha
    iex(2)> try do exit(:ha) catch {:exit, :ha} -> :haha end
    ** (exit) :ha
        iex:2: (file)
    iex(2)>
    ````
* adding a new `handle_info/2` clause to handle the `{:error, reason}` Task result
* adding new clauses to the fun in `boot_runners/2`
  * the previous `{:exit, reason}` clause would never match since the Pool `GenServer` is not trapping exists
  and thus the tasks spawn with `Task.async_stream/3` would never return `{:exit, reason}` as mentioned in the
  https://hexdocs.pm/elixir/Task.html#async_stream/5

I think that the `Runner.remote_boot/2` might have the same problem but I will stop here and see if this
work lands anywhere - maybe I'm missing the point :)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant