fallocate is interrupted by signal at startup #1368
Comments
@MatiasElo Do you have any comments on this?
Hmm, this is the first time I've seen this failure. Does this happen constantly or was it a random occurrence? Also, what was the return code of
The error reproduces easily in a k8s environment, with about a 10% recurrence rate. I think the fallocate return code is EINTR (Interrupted system call). The size is around 4M.
Thanks for the info. Looks like a good solution would be to add a number of retries if EINTR is received.
Does this change fix the issue you are seeing?
Strange, the issue no longer reproduces after I recompile. I will update later.
Update:
Another, similar issue is that I sometimes hit a SIGSEGV in DPDK code called from odp_pktio_start() at startup. Thanks.
Hmm, I haven't had to trace signals before, so unfortunately I cannot help much. Usually I just isolate the data plane cores and redirect all signals to a set of control cores. One thing which pops out in your log is
fallocate() (and ftruncate()) may fail due to system interrupts, so retry the operation FALLOCATE_RETRIES times.
Fixes: OpenDataPlane#1368
Signed-off-by: Matias Elo <matias.elo@nokia.com>
Reported-and-tested-by: Christian Hong <guochun.hgc@alibaba-inc.com>
fallocate() (and ftruncate()) may fail due to system interrupts, so retry using TEMP_FAILURE_RETRY macro.
Fixes: OpenDataPlane#1368
Signed-off-by: Matias Elo <matias.elo@nokia.com>
Reported-and-tested-by: Christian Hong <guochun.hgc@alibaba-inc.com>
We detected a pool creation failure in our system. The error shows that the fallocate system call was interrupted:
"odp_ishm.c:707:create_file():Huge page memory allocation failed: fd=582, file=/dev/hugepages/0/odp-16-ishm-pool_008_pkt-rx:7-0, err="Interrupted system call""
Would it be better to retry the system call after it returns this error?
Why the signal is raised is not yet known.