Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NimblePool.checkout :idle_timeout When setting pool_max_idle_time #291

Open
mathieurousseau opened this issue Sep 26, 2024 · 6 comments · May be fixed by #292
Open

NimblePool.checkout :idle_timeout When setting pool_max_idle_time #291

mathieurousseau opened this issue Sep 26, 2024 · 6 comments · May be fixed by #292

Comments

@mathieurousseau
Copy link

Hello

We added a pool_max_idle_time to pool configuration.
We then started to have the following issue:

Elixir.ErlangError Erlang error: "** (Oban.CrashError) ** (exit) exited in: NimblePool.checkout(#PID<0.26263.0>)\n    ** (EXIT) shutdown: :idle_timeout" 
    lib/finch/http1/pool.ex:96 Finch.HTTP1.Pool.request/6
    lib/finch.ex:493 anonymous fn/4 in Finch.request/3
    /app/deps/telemetry/src/telemetry.erl:324 :telemetry.span/3
    lib/req/finch.ex:239 Req.Finch.run_finch_request/3
    lib/req/finch.ex:71 Req.Finch.run/4
    lib/req/request.ex:1103 Req.Request.run_request/1
    lib/req/request.ex:1047 Req.Request.run/1

connection max idle time is 10_000
pool max idle time is 20_000

If we let it empty (:infinity). We have no issue.

Thanks.
Mathieu

@oliveigah
Copy link
Contributor

Hey @mathieurousseau! I'm suspecting that this may be a issue of how idle termination is done on finch.

It stops the pool when the first connection becomes idle and not all of them. I'm not sure if this is the semantics we need here, and it's not clear how to achieve proper semantics with the current implementation of nimble_pool's idle ping feature. Gona take a look at it.

Is this bug consitently reproducible?

What I think may be happening is something like this:

  • your pool have 2 connections idle for more than 20 seconds, so this pool should be terminated in the next verification cycle
  • before the pool is terminated some oban worker picks up the first connection to make some reequest
  • now the idle verification happens and terminates the pool because the only remaining connection is idle for more than 20 seconds
  • oban worker crashes because the finch pool is terminated with the reason idle timeout

@TheOneric
Copy link

I recently got similar errors in my logs and switched finch to the patched version from #292 by @oliveigah for testing. (Note: finch or nimble_pool were not upgraded before this started to occur, though another dependency also using NimblePool was upgraded; not sure if related)

Unfortunately the patch did not seem to fix this as today this showed up in logs again:

2025-03-16 19:04:17.836 [error] #PID<0.26339.0> running MyApp.Web.Endpoint (connection #PID<0.26338.0>, stream id 1) terminated
Server: mydomain.example:80 (http)
Request: POST /inbox
** (exit) exited in: NimblePool.checkout(#PID<0.26315.0>)
    ** (EXIT) shutdown: :idle_timeout

If I’m reading the application code correctly, pool_max_idle_time is set to 60 5(?) seconds, connection timeout to 10 seconds

@oliveigah
Copy link
Contributor

Hey @TheOneric, thanks for your input.

I've taken another look at the code, and the only way I can see this still happening with my proposed changes in #292 is if the work you're doing after checking out a connection takes longer than the pool_max_idle_time. This seems to be the case since your pool_max_idle_time is set to only 5 seconds.

The reasoning behind this is:

  1. Suppose pool_max_idle_time is set to 10 seconds.
  2. A checkout occurs at t1, so last_checkout_time = t1.
  3. The process that checked out this connection performs some work that takes 30 seconds.
  4. If no other checkouts occur during this time, the idle verification will eventually terminate the pool, as more than 10 seconds have passed since the last checkout.

I think a good change would be to track whether a connection is currently checked out before terminating it due to idle verification. I'll work on this and then we can check if it resolves the issue for you.

In the meantime, you can try increasing your pool_max_idle_time to see if it helps.

@TheOneric
Copy link

Thanks! I now increased the pool_max_idle to 60 seconds.

It being set to only 5 seconds by default before might actually be a bug in the application as it kind of seems like two app config options got confused.

@oliveigah
Copy link
Contributor

@TheOneric, I've updated my branch in #292 to include an "in-use" verification to prevent idle termination while a connection is checked out.

If you could test it and report back on the PR, that would be great!

@TheOneric
Copy link

TheOneric commented Mar 16, 2025

Thanks for the quick update! I reverted to a 5s pool timeout and am testing the new patch now; if the error doesn't return in a couple days to a week, I’ll let you know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants