Fix prepared statements for parallel queries #136
78f6316 to fe9dac0
Thanks for the contribution @davoclavo. I would love to see some failing specs, but the changes look safe enough to me. Could you rebase?
PostgreSQL prepared statements and portals are referenced by name. An empty string is a valid name: it denotes the unnamed prepared statement (or unnamed portal). [1]

However, there is a weird issue under high load where a set of parameters gets bound to the wrong prepared statement:

`bind message supplies 10 parameters, but prepared statement "..." requires 12`

We also sporadically get

`ERROR: portal "" cannot be run`

which is very hard to troubleshoot, because all portals currently share the same name.

Setting a unique name for each prepared statement has fixed the problem in our production setting. However, I am not able to pinpoint why the problem exists in the first place, as "" seems to be a valid portal name.

[1] - https://www.postgresql.org/docs/11/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY
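To make the suspected failure mode concrete, here is a toy sketch of how a shared unnamed statement could produce the parameter-count error if the Parse and Bind messages of two queries ever interleave within one session. This is only an illustration of the unproven theory in this thread, not finagle-postgres code: `ToySession`, `parse`, and `bind` are hypothetical names, and the model ignores portals, Execute, and Sync. The one protocol fact it relies on is that a Parse targeting the unnamed statement replaces the previous unnamed statement.

```scala
import scala.collection.mutable

// Toy model of a single session's prepared-statement table.
// Hypothetical names throughout; this is not the driver's implementation.
final class ToySession {
  // statement name -> number of parameters the statement requires
  private val statements = mutable.Map.empty[String, Int]

  // Parse: define a statement. A Parse on the unnamed statement ("")
  // replaces whatever unnamed statement existed before.
  // Simplification: real named statements must be closed before redefinition.
  def parse(name: String, paramCount: Int): Unit =
    statements(name) = paramCount

  // Bind: succeeds only if the supplied parameter count matches.
  def bind(name: String, supplied: Int): Either[String, Unit] =
    statements.get(name) match {
      case Some(required) if required == supplied => Right(())
      case Some(required) =>
        Left(s"""bind message supplies $supplied parameters, but prepared statement "$name" requires $required""")
      case None =>
        Left(s"""prepared statement "$name" does not exist""")
    }
}

object ToyDemo {
  def main(args: Array[String]): Unit = {
    // Both queries use the unnamed statement and their messages interleave.
    val shared = new ToySession
    shared.parse("", 10)         // query A: 10 parameters
    shared.parse("", 12)         // query B overwrites the unnamed statement
    println(shared.bind("", 10)) // query A's Bind now fails

    // Same interleaving, but each query has its own statement name.
    val unique = new ToySession
    unique.parse("stmt_1", 10)
    unique.parse("stmt_2", 12)
    println(unique.bind("stmt_1", 10)) // Right(())
  }
}
```

Whether such interleaving can actually happen in the client is exactly what nobody in this thread has been able to confirm; the sketch only shows why unique names would mask it if it does.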
fe9dac0 to a8d5f78
Thanks for picking up this stale pull request. I have already rebased onto master. I agree, I would also love to see failing specs, but sadly I wasn't able to replicate the problem after a couple of attempts :( If it gives some peace of mind, we have been running this modified version in production for about 6 months without facing this problem.
The integration tests all currently run with a pool size of 1, which might explain the inability to replicate.
@dangerousben I noticed that and already tried setting a larger pool for concurrent connections, without luck. Is there any other place where I should make an adjustment? See this particular line in my commit: a8d5f78#diff-3f8ad9e99aaed7920116b65d875d3736R49
It looks ok, and a quick bit of debugging shows that the statements are not run sequentially. Not sure why it's not reproducing the problem.
My theory is that the queries I am making are fast enough not to trigger this race condition, but honestly I am not entirely sure. I will make further attempts to replicate the problem. If you have any ideas, I am keen to try them out.
Hello!
We were running into a problem where many prepared queries happening in parallel were getting the wrong set of parameters assigned to them.
The unproven theory is that the prepared query `name` and `portal` values are currently set to `""`, potentially causing collisions. I changed all those hardcoded `""` to a generated value called `genName`, which contains an `AtomicInt.incrementAndGet`. That seems to fix the bug in our setup: `finagle-postgres 0.11` via `quill-finagle-postgres 3.1.0`.
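A minimal sketch of what such a name generator might look like, assuming the counter is a `java.util.concurrent.atomic.AtomicInteger`. The `StatementNames` object, the `prefix` parameter, and the `stmt_` format are illustrative assumptions and not necessarily what the PR's `genName` does. The demo only checks that names stay unique when generated from many concurrent tasks.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper; the PR's actual genName may differ in name and format.
object StatementNames {
  private val counter = new AtomicInteger(0)

  // incrementAndGet is atomic, so concurrent callers never receive the same number.
  def genName(prefix: String = "stmt"): String =
    s"${prefix}_${counter.incrementAndGet()}"
}

object GenNameDemo {
  def main(args: Array[String]): Unit = {
    // Generate 1000 names from concurrently scheduled futures.
    val tasks = (1 to 1000).map(_ => Future(StatementNames.genName()))
    val names = Await.result(Future.sequence(tasks), 10.seconds)

    // Every generated name should be distinct.
    assert(names.distinct.size == names.size)
    println(s"generated ${names.size} unique names")
  }
}
```

One trade-off of this approach: named statements and portals are not replaced automatically the way the unnamed ones are, so they live until they are closed or the session ends.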
However, I wasn't able to replicate the bug in the specs. Perhaps the queries I am performing are fast enough not to cause a race condition, or maybe the test queries are somehow running in a serialized fashion anyway? Not sure :( I'd gladly rewrite the test to verify this behavior, and would love it if someone could guide me in the right direction.
I'd like to add that we are seeing several symptoms that all seem to be related to this issue:
- `ERROR: portal "" cannot be run` when those parallel queries are getting executed
- Having to call `pg_terminate_backend` regularly in order to cancel them
- `bind message supplies 10 parameters, but prepared statement "..." requires 12`