[Data] Fix progress bars being displayed as partially completed in Jupyter notebooks #46289

Merged
raulchen merged 6 commits into ray-project:master on Jul 2, 2024

Conversation

scottjlee
Contributor

@scottjlee scottjlee commented Jun 27, 2024

Why are these changes needed?

Ray Data progress bars have a bug in Jupyter notebooks: the progress bar is left partially complete after the dataset finishes executing:
[Image: Screenshot at Jun 26 18-09-15]

This PR fixes the bug so that progress bars are marked as fully completed after execution finishes. We also add a "success" or "failure" message in the bar after execution terminates:

  • On success: [Image: success]
  • On failure: [Image: failure]
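
The intended end state described above (a bar that reads as fully complete after a successful run, plus a success or failure label) can be illustrated with a minimal sketch. `SketchBar` and its methods are hypothetical names made up for this illustration; this is not Ray Data's `ProgressBar` and not the code changed in this PR.

```python
from dataclasses import dataclass


@dataclass
class SketchBar:
    """Hypothetical stand-in for a progress bar; not Ray Data's ProgressBar."""

    total: int
    n: int = 0
    desc: str = "Running"
    closed: bool = False

    def update(self, k: int = 1) -> None:
        # Clamp so the bar never reports more than `total`.
        self.n = min(self.total, self.n + k)

    def close(self, success: bool = True) -> None:
        # On success, push the bar to 100% before closing so that a
        # frontend renders it as complete rather than partially filled.
        if success:
            self.n = self.total
        self.desc = "Finished" if success else "Failed"
        self.closed = True

    def render(self) -> str:
        pct = 100 * self.n // self.total if self.total else 100
        return f"{self.desc}: {pct}% ({self.n}/{self.total})"


# Simulate a bar that is only partially advanced when execution ends.
bar = SketchBar(total=10)
bar.update(7)
bar.close(success=True)
print(bar.render())
```

On failure the sketch leaves the count where it stopped and only changes the label, so the reader can still see how far execution got.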

The progress bar output for the same code outside a Jupyter notebook, on master:
[Image: master]

The same code with this PR:
[Image: after]

  • After execution completes, the progress bar remains in the output:
    [Image: Screenshot at Jun 26 19-55-24]

Related issue number

Closes #44983

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sjl <sjl@anyscale.com>
@@ -177,7 +177,6 @@ def __init__(self, state: ProgressBarState, pos_offset: int):
 desc=state["desc"] + " " + str(state["pos"]),
 total=state["total"],
 position=pos_offset + state["pos"],
-leave=False,
Contributor Author

this change makes the overall progress bar visible. it matches the parameters used with the default tqdm: https://github.com/ray-project/ray/blob/master/python/ray/data/_internal/progress_bar.py#L59-L62

Contributor Author

however, this has the effect of leaving the progress bar in the output after completion:
[Image: Screenshot at Jun 26 19-55-24]

personally, i don't think it's a bad outcome to leave the progress bar output after completion, as it makes it easy for me to track overall progress. but happy to hear from others, or if there are users who prefer to hide it after completion by default

Member

I think it'd be useful if we persisted progress bars, but can we configure this at the Ray Data level? If we change the configuration here, it'd affect other libraries that use tqdm_ray.

Btw, is this necessary to fix the issue, or is this an orthogonal change?

Contributor Author

@scottjlee scottjlee Jun 27, 2024

without this change, the progress bars would disappear after completion, which is the intended behavior when setting leave=False in tqdm: https://tqdm.github.io/docs/tqdm/#tqdm-objects

agree that it makes sense to have some configuration which can be set separately. i am thinking i can add a leave parameter to tqdm_ray, which can be set from Ray Data's ProgressBar class
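
The idea floated here, a `leave` option that defaults to the existing behavior and is overridden only by Ray Data, might look roughly like the sketch below. `build_bar_kwargs` is a hypothetical helper invented for illustration; the field names `desc`, `pos`, and `total` follow the diff above, and nothing here is the actual `tqdm_ray` API.

```python
from typing import Any, Dict


def build_bar_kwargs(
    state: Dict[str, Any], pos_offset: int, leave: bool = False
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for a tqdm-style bar.

    `leave` defaults to False, mirroring the hard-coded value shown in the
    diff, so callers that do not pass it see no change in behavior.
    """
    return {
        "desc": state["desc"] + " " + str(state["pos"]),
        "total": state["total"],
        "position": pos_offset + state["pos"],
        "leave": leave,
    }


state = {"desc": "Map", "pos": 1, "total": 100}

# Any other library using the helper keeps the old default.
default_kwargs = build_bar_kwargs(state, pos_offset=2)

# Only the Ray Data call site opts in to persistent bars.
data_kwargs = build_bar_kwargs(state, pos_offset=2, leave=True)

print(default_kwargs)
print(data_kwargs)
```

Keeping the default unchanged and opting in at a single call site is what limits the behavior change to Ray Data.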

Member

Sorry, still a bit confused -- if we just made the process.update_bar(state) change and didn't modify leave, would we still be able to close the issue?

Contributor Author

if we did not modify leave, there is another issue where all of the progress bars will go away after completion, due to leave=False being set (this is the current behavior). The fix in this PR causes the bars to update and disappear after the bar completes:
[Image: Screenshot at Jun 27 11-38-13]

so to account for this new issue, we can have leave=True to keep the bars around.

Member

Ah, gotcha. Yeah, that looks janky

Don't have a strong opinion about whether we set leave=True for just Jupyter or for all Ray Data progress bars. @raulchen what's your opinion?

Let's definitely add a configuration for leave in tqdm_ray.tqdm and only configure for Ray Data. Want to avoid changing the behavior for other libraries and users who might use tqdm_ray.

Contributor

personally I actually prefer always leaving the progress bars, because it's convenient for inspecting some stats.
One minor concern is that it might be confusing for some users, making them think the pipeline is still running.
What if we update the progress bar message at the end to something like "Dataset execution finished in .. seconds" or "Dataset execution failed, see above stack trace for error message"?
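
A rough sketch of how such a final message could be produced is shown below. `final_message` and `run_with_status` are hypothetical names used only for illustration, and the two-decimal timing format is an assumption; the message wording is taken from the suggestion above.

```python
import time
from typing import Callable


def final_message(success: bool, elapsed_s: float) -> str:
    """Hypothetical formatter for the bar's closing description."""
    if success:
        return f"Dataset execution finished in {elapsed_s:.2f} seconds"
    return "Dataset execution failed, see above stack trace for error message"


def run_with_status(execute: Callable[[], None]) -> str:
    """Run `execute` and return the message the bar would end with."""
    start = time.perf_counter()
    try:
        execute()
        success = True
    except Exception:
        # Swallowed only to keep the sketch short; a real executor would
        # also surface the exception so the stack trace is visible.
        success = False
    return final_message(success, time.perf_counter() - start)


def failing() -> None:
    raise RuntimeError("boom")


print(run_with_status(lambda: None))
print(run_with_status(failing))
```

Writing the outcome into the bar itself addresses the concern that a persisted bar could be mistaken for a pipeline that is still running.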

Contributor Author

Got it, thanks for the feedback. Added the functionality to show the result for the overall progress bar as below.

  • On success: [Image: success]
  • On failure: [Image: failure]

Contributor Author

Also no longer need to set leave in tqdm_ray or in Ray Data, since we will always be using leave=True, and this is the default value in tqdm.

@scottjlee scottjlee changed the title from "[Data] Fix progress bars being displayed as partially completes in Jupyter notebooks" to "[Data] Fix progress bars being displayed as partially completed in Jupyter notebooks" Jun 27, 2024
Signed-off-by: sjl <sjl@anyscale.com>
@scottjlee scottjlee marked this pull request as ready for review June 27, 2024 05:46
Signed-off-by: sjl <sjl@anyscale.com>
@scottjlee scottjlee added the "go" label (add ONLY when ready to merge, run all tests) Jul 2, 2024
@raulchen raulchen merged commit baab89d into ray-project:master Jul 2, 2024
6 of 7 checks passed
Development

Successfully merging this pull request may close these issues.

[Data] Ray Data progress bars left partially completed after execution completes on Jupyter notebook
4 participants