Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handshake regression #407

Merged
merged 4 commits into from
Dec 10, 2024
Merged

Conversation

bazsi
Copy link
Member

@bazsi bazsi commented Dec 3, 2024

This is an alternative solution to #388

  • this fixes a regression caused by Logproto remove handshake in progress #155 , reported by @HofiOne
  • This simplifies the I/O tracking for files, LogProto/LogTransport should get full authority on whether data is already available before poll and also how we need to poll it
  • Instead of the ugly "immediate_check" mechanism, we wake up LogReader if there's any event available

The regression happens in the LogReader: #155 causes us to skip a poll iteration because of the handshake (even if it does not exist). This causes a problem as at the very first iteration it is only "immediate_check" that causes us to start fetching messages. And that only fires once. By skipping that first iteration, we stall and stop fetching messages.

Instead of reverting the handshake related change (which should be correct in any async environment), I dropped the immediate_check mechanism (instead of a single internal user that could probably be eliminated) and implemented checking for I/O state in the usual polling paths. Which means: we want to know if there are unprocessed bytes in the file (e.g. poll-file-events), and if there are no bytes there, we want to know if our buffer contains an entire line.

I tried to limit the scope of this change as much as possible.

@MrAnno MrAnno self-requested a review December 3, 2024 22:53
@bazsi bazsi force-pushed the fix-handshake-regression branch 2 times, most recently from f259d45 to c2cf91f Compare December 4, 2024 19:02
Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com>
@bazsi bazsi force-pushed the fix-handshake-regression branch from c2cf91f to a56c0b5 Compare December 9, 2024 10:09
bazsi added 3 commits December 9, 2024 11:33
…OL already

With that, our prepare() method will be able to wake us up if we indeed
buffered a full line already.

Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com>
…t at EOF

Previously this case was handled using the complex "immediate_check"
logic, which I am about to get rid of.

Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com>
This was to force the LogReader instance to immediately start reading
once at startup. This is being replaced by poll-fd-events reporting
readability if we are at the middle of the file and also by LogProto
if we have buffered data.

Signed-off-by: Balazs Scheidler <balazs.scheidler@axoflow.com>
@bazsi bazsi force-pushed the fix-handshake-regression branch from a56c0b5 to 0cac6bf Compare December 9, 2024 10:33
@MrAnno
Copy link
Member

MrAnno commented Dec 9, 2024

@HofiOne This fixes the restart mechanism without reverting the handshake simplification.

I reviewed the changes and everything seems to be fine, but the text-server/buffered-server implementation is complex and it's relatively easy to miss small details during review.

Would it be possible to run your internal tests on this PR to double-check for regressions (if you have plans to backport this change, of course)?

@HofiOne
Copy link
Contributor

HofiOne commented Dec 10, 2024

@HofiOne This fixes the restart mechanism without reverting the handshake simplification.

I reviewed the changes and everything seems to be fine, but the text-server/buffered-server implementation is complex and it's relatively easy to miss small details during review.

Would it be possible to run your internal tests on this PR to double-check for regressions (if you have plans to backport this change, of course)?

@MrAnno The issue came up during the backport attempt of the simplification, and I'd like to continue it, but

  • currently, I cannot work on these unfortunately
  • this is the first really problematic point of the now divergent code bases
  • e.g. the text-server is completely removed as I can remember during the proxy refactoring, which got a different implementation in AxoSyslog as well, meanwhile, our wildcard-file fixes also affected these code parts

I cannot see now how to resolve this best. I'll surely get back to this, but as far as I can see, it will not be in the near future.

Thanks for the signal!

@MrAnno
Copy link
Member

MrAnno commented Dec 10, 2024

Okay, we move forward then. Thanks.

@MrAnno MrAnno merged commit 2032241 into axoflow:main Dec 10, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants