Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix assert failed when timeout during call rate_ddir #1648

Merged
merged 1 commit into from
Oct 19, 2023

Conversation

zqs-Oppenauer
Copy link
Contributor

@zqs-Oppenauer zqs-Oppenauer commented Oct 18, 2023

fix assert failed when timeout during call rate_ddir

Adding DDIR_TIMEOUT in enum fio_ddir, and rate_ddir returns it when fio timeouts. set_io_u_file will directly break out of the loop, and fill_io_u won't be called again, which causes assert to fail in rate_ddir, because td->rwmix_ddir is DDIR_INVAL.

Signed-off-by: QingSong Zhu zhuqingsong.0909@bytedance.com

@axboe
Copy link
Owner

axboe commented Oct 18, 2023

This needs a MUCH better description in the actual commit message. There's nothing in there, you left it all on the github pull request message which won't go into the tree.

You also need the Signed-off-by in there, no in the git pull request.

io_u.c Outdated
@@ -952,6 +953,11 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
set_rw_ddir(td, io_u);

if (io_u->ddir == DDIR_INVAL) {
now = utime_since_now(&td->epoch);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be duplicating this timeout check. This fix very much feels like it's just papering around the symptoms rather than fixing the core issue, which is that we can get DDIR_INVAL if we have timed out.

io_u.c Outdated
break;

zbd_put_io_u(td, io_u);

put_file_log(td, f);
td_io_close_file(td, f);
io_u->file = NULL;

if (ret == 2)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is why I think it's papering around it, you introduce some random other return value for this condition.

@axboe
Copy link
Owner

axboe commented Oct 18, 2023

Caveat - haven't attempted to reproduce this, but can't we just do:

diff --git a/io_u.c b/io_u.c
index 07e5bac5ef52..27c4d55fa4b7 100644
--- a/io_u.c
+++ b/io_u.c
@@ -792,7 +792,7 @@ static enum fio_ddir get_rw_ddir(struct thread_data *td)
 		 * https://github.com/axboe/fio/issues/1501#issuecomment-1418327049
 		 */
 		td->rwmix_ddir = ddir;
-	} else
+	} else if (ddir != DDIR_INVAL)
 		td->rwmix_ddir = rate_ddir(td, ddir);
 	return td->rwmix_ddir;
 }

and be done with it?

@axboe
Copy link
Owner

axboe commented Oct 18, 2023

Or probably this is the better way, same kind of thing:

diff --git a/io_u.c b/io_u.c
index 07e5bac5ef52..ea016c90e0e2 100644
--- a/io_u.c
+++ b/io_u.c
@@ -792,7 +792,7 @@ static enum fio_ddir get_rw_ddir(struct thread_data *td)
 		 * https://github.com/axboe/fio/issues/1501#issuecomment-1418327049
 		 */
 		td->rwmix_ddir = ddir;
-	} else
+	} else if (ddir_rw(ddir))
 		td->rwmix_ddir = rate_ddir(td, ddir);
 	return td->rwmix_ddir;
 }

@zqs-Oppenauer
Copy link
Contributor Author

Or probably this is the better way, same kind of thing:

diff --git a/io_u.c b/io_u.c
index 07e5bac5ef52..ea016c90e0e2 100644
--- a/io_u.c
+++ b/io_u.c
@@ -792,7 +792,7 @@ static enum fio_ddir get_rw_ddir(struct thread_data *td)
 		 * https://github.com/axboe/fio/issues/1501#issuecomment-1418327049
 		 */
 		td->rwmix_ddir = ddir;
-	} else
+	} else if (ddir_rw(ddir))
 		td->rwmix_ddir = rate_ddir(td, ddir);
 	return td->rwmix_ddir;
 }

Thank you very much for your reply!
When timeout, td->rwmix_ddir will always be DDIR_INVAL and set_io_u_file loops to call fill_io_u until get_next_file failed, I think it might be more reasonable to exit the set_io_u_file directly after checking the timeout. Besides, when td->rwmix_ddir is DDIR_INVAL and fill_io_u is called again, the array td->io_issues will be accessed as index -1 because the value of DDIR_INVAL is -1, the relevant code is as below, i don't know whether this is a bug?
截屏2023-10-19 00 04 17

Maybe i can add a new value like DDIR_TIMEOUT in enum fio_ddir and rate_ddirreturns DDIR_TIMEOUT when timeout like this?

diff --git a/io_u.c b/io_u.c
index 07e5bac5..cf8c3618 100644
--- a/io_u.c
+++ b/io_u.c
@@ -717,7 +717,7 @@ static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
                 * check if the usec is capable of taking negative values
                 */
                if (now > td->o.timeout) {
-                       ddir = DDIR_INVAL;
+                       ddir = DDIR_TIMEOUT;
                        return ddir;
                }
                usec = td->o.timeout - now;
@@ -726,7 +726,7 @@ static enum fio_ddir rate_ddir(struct thread_data *td, enum fio_ddir ddir)
 
        now = utime_since_now(&td->epoch);
        if ((td->o.timeout && (now > td->o.timeout)) || td->terminate)
-               ddir = DDIR_INVAL;
+               ddir = DDIR_TIMEOUT;
 
        return ddir;
 }
@@ -951,7 +951,7 @@ static int fill_io_u(struct thread_data *td, struct io_u *io_u)
 
        set_rw_ddir(td, io_u);
 
-       if (io_u->ddir == DDIR_INVAL) {
+       if (io_u->ddir == DDIR_INVAL || io_u->ddir == DDIR_TIMEOUT) {
                dprint(FD_IO, "invalid direction received ddir = %d", io_u->ddir);
                return 1;
        }
@@ -1419,6 +1419,11 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
                put_file_log(td, f);
                td_io_close_file(td, f);
                io_u->file = NULL;
+
+               if (io_u->ddir == DDIR_TIMEOUT) {
+                       return 1;
+               }

@zqs-Oppenauer
Copy link
Contributor Author

zqs-Oppenauer commented Oct 18, 2023

This needs a MUCH better description in the actual commit message. There's nothing in there, you left it all on the github pull request message which won't go into the tree.

You also need the Signed-off-by in there, no in the git pull request.

Thank you very much for your reminder. I will add description and Signed-off-by in the commit message.

@axboe
Copy link
Owner

axboe commented Oct 18, 2023

I did ponder whether DDIR_TIMEOUT would be a solution, since it's much cleaner than hacking in odd return values and needing to check it twice. That provides a way to pass this information back inline, rather than out-of-band. I think that proposed patch looks MUCH better than the original one.

@zqs-Oppenauer
Copy link
Contributor Author

I did ponder whether DDIR_TIMEOUT would be a solution, since it's much cleaner than hacking in odd return values and needing to check it twice. That provides a way to pass this information back inline, rather than out-of-band. I think that proposed patch looks MUCH better than the original one.

Thanks very much, I have already updated this commit according to the proposed patch and added the description and Signed-off-by in the commit message.

io_u.c Outdated
@@ -1419,6 +1419,11 @@ static long set_io_u_file(struct thread_data *td, struct io_u *io_u)
put_file_log(td, f);
td_io_close_file(td, f);
io_u->file = NULL;

if (io_u->ddir == DDIR_TIMEOUT) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No braces necessary here.

@axboe
Copy link
Owner

axboe commented Oct 19, 2023

Looks much better, just one minor style issue. Outside of that, please line wrap commit messages at around 72 chars. Long lines are hard to read in git log. With those two things tweaked, this should be ready to go in.

@axboe
Copy link
Owner

axboe commented Oct 19, 2023

Once I triggered CI, it failed. The cases where we have a switch based on ddir, you need to also add DDIR_TIMEOUT to the DDIR_INVAL cases.

Adding DDIR_TIMEOUT in enum fio_ddir, and rate_ddir returns it when fio timeouts.
set_io_u_file will directly break out of the loop, and fill_io_u won't be called,
which causes assert to fail in rate_ddir, because td->rwmix_ddir is DDIR_INVAL.

Signed-off-by: QingSong Zhu zhuqingsong.0909@bytedance.com
@axboe axboe merged commit 2038b72 into axboe:master Oct 19, 2023
10 checks passed
@axboe
Copy link
Owner

axboe commented Oct 19, 2023

Thanks!

@zqs-Oppenauer zqs-Oppenauer deleted the fix_issue_1642 branch October 19, 2023 12:35
@vincentkfu vincentkfu mentioned this pull request Oct 19, 2023
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants