mlflow logging integration with yolox training #1773

Merged on Jul 11, 2024 (16 commits)

Im-Himanshu
Contributor

This pull request integrates MLflow experiment logging into YOLOX training by adding "mlflow" as an additional option for the -l/--logger argument.
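
As a rough illustration of the option described above, the sketch below registers a `-l`/`--logger` argument that accepts "mlflow". The parser name, the list of choices, and the default value are assumptions made for illustration; they are not copied from the PR's code.

```python
import argparse


def make_parser():
    """Build a hypothetical training CLI parser with a logger selection flag."""
    parser = argparse.ArgumentParser("train (illustrative)")
    parser.add_argument(
        "-l",
        "--logger",
        type=str,
        default="tensorboard",  # assumed default, for illustration only
        choices=["tensorboard", "wandb", "mlflow"],  # assumed set of loggers
        help="Logger used for metrics.",
    )
    return parser


# Selecting the MLflow logger from the command line.
args = make_parser().parse_args(["-l", "mlflow"])
print(args.logger)
```

Restricting the flag with `choices` means an unsupported logger name is rejected at argument-parsing time rather than later, during training.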

It requires an environment file (.env) in the root folder of the project.
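
The contents of the .env file are not shown in this thread. As a sketch only, the snippet below parses simple KEY=VALUE lines with the standard library, as a stand-in for what python-dotenv's `load_dotenv` does. `MLFLOW_TRACKING_URI` and `MLFLOW_EXPERIMENT_NAME` are standard MLflow environment variables, but whether the PR relies on exactly these names is an assumption, and the values are made up.

```python
import os
import tempfile


def read_env_file(path):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values, environ=os.environ):
    """Copy parsed values into the environment without overriding existing keys."""
    for key, value in values.items():
        environ.setdefault(key, value)
    return environ


# Demo with a temporary .env file; the values are examples only.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("# example values only\n")
        handle.write("MLFLOW_TRACKING_URI=databricks\n")
        handle.write('MLFLOW_EXPERIMENT_NAME="/Shared/yolox-demo"\n')
    print(read_env_file(env_path))
```

Using `setdefault` keeps any variable already exported in the shell, so the file only fills in what is missing.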

It also requires the additional dependencies mlflow and python-dotenv; if either is missing, an error is raised when the logger is set to mlflow.

@Im-Himanshu Im-Himanshu changed the title from "mlflow integration with yolox training" to "mlflow logging integration with yolox training" on May 15, 2024
@Im-Himanshu
Contributor Author

Im-Himanshu commented May 15, 2024

Tested the logging on Databricks; the following logs are available for all runs.

Logged params

image

Logged metrics

image

Logged artifacts

image
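
The three kinds of logged data listed above map onto MLflow's tracking calls for parameters, metrics, and artifacts. The following is a minimal sketch of that mapping, not the PR's implementation: the helper names and the example keys are hypothetical, and `mlflow` is imported inside the function so the helpers can be used without it installed.

```python
def to_mlflow_params(config):
    """Stringify config values; MLflow stores parameter values as strings."""
    return {str(key): str(value) for key, value in config.items()}


def to_mlflow_metrics(values):
    """Keep only numeric entries, since MLflow metrics must be numbers."""
    return {
        key: float(value)
        for key, value in values.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


def log_run(config, metrics, artifact_path=None, step=None):
    """Log params, metrics, and an optional artifact file in one MLflow run."""
    import mlflow  # optional dependency, imported only when logging is used

    with mlflow.start_run():
        mlflow.log_params(to_mlflow_params(config))
        mlflow.log_metrics(to_mlflow_metrics(metrics), step=step)
        if artifact_path is not None:
            mlflow.log_artifact(artifact_path)


# The pure helpers can be checked without an MLflow server (hypothetical keys).
print(to_mlflow_params({"max_epoch": 300, "input_size": (640, 640)}))
print(to_mlflow_metrics({"train/loss": 2, "note": "warmup"}))
```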

@Im-Himanshu
Contributor Author

@FateScript Requesting you to please review the pull request.

Member

@FateScript FateScript left a comment

@Im-Himanshu Thanks for your contribution :)

Please check my review suggestions and lint your code so that it passes the GitHub workflow.

yolox/core/trainer.py (outdated, resolved)
yolox/utils/logger.py (outdated, resolved)
yolox/utils/logger.py (outdated, resolved)
yolox/core/trainer.py (outdated, resolved)
yolox/utils/logger.py (outdated, resolved)
yolox/utils/logger.py (outdated, resolved)
yolox/core/trainer.py (outdated, resolved)
yolox/utils/logger.py (outdated, resolved)
@FateScript
Member

Any update? @Im-Himanshu

@Im-Himanshu
Contributor Author

Im-Himanshu commented Jun 17, 2024

Any update? @Im-Himanshu

@FateScript Apologies for the delayed response. I have pushed new commits that implement all the suggestions.
Kindly review.

@Im-Himanshu
Contributor Author

@FateScript Gentle reminder: please check and merge the request.

docs/mlflow_integration.md (outdated, resolved)
docs/mlflow_integration.md (outdated, resolved)
Member

@FateScript FateScript left a comment

@Im-Himanshu Please fix it.

@@ -98,8 +98,11 @@ def setup_logger(save_dir, distributed_rank=0, filename="log.txt", mode="a"):

logger.remove()
save_file = os.path.join(save_dir, filename)
crnt_log_save_file = os.path.join(save_dir, 'train_log_crnt.txt')
Member

Why is this train_log_crnt.txt needed? It seems that your code redirects I/O to it and removes the file if it already exists.

Contributor Author

@Im-Himanshu Im-Himanshu Jul 4, 2024

@FateScript In your earlier review you recommended creating a new logger file.
Moreover, the file is deleted at the start of training so that it holds only the current run's logs, and only that part is uploaded. The current .log file contains the logs of all previous runs, which may be confusing in experiment tracking.
image

Member

My bad, I didn't make it clear. By "logger file" I meant logger.py, not the file where logs are saved.

Your code here just makes a copy of the log file. It would be better to revert the code here. Thanks!

Contributor Author

Removed it and reverted to the old code.

Though I think it would have been better to log only the current experiment's logs to MLflow.
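
One way to upload only the current run's logs without keeping a second log file is to record the size of the shared log file when the run starts, then copy out everything written after that point. This is a sketch of that idea, not code from the PR; the function and file names are hypothetical.

```python
import os
import tempfile


def log_size(path):
    """Return the current size of the log file in bytes, or 0 if it is missing."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def extract_since(log_path, start_offset, out_path):
    """Copy everything written to log_path after start_offset into out_path."""
    with open(log_path, "rb") as src:
        src.seek(start_offset)
        tail = src.read()
    with open(out_path, "wb") as dst:
        dst.write(tail)
    return out_path


# Demo: an append-mode log that already holds a previous run.
with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "train_log.txt")
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write("previous run\n")
    offset = log_size(log_path)  # taken at the start of the new run
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write("current run\n")
    out = extract_since(log_path, offset, os.path.join(tmp, "current_run.txt"))
    with open(out, encoding="utf-8") as handle:
        print(handle.read())
```

The extracted file could then be uploaded as an artifact while the append-mode log stays untouched.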

docs/mlflow_integration.md (outdated, resolved)
docs/mlflow_integration.md (resolved)
yolox/utils/mlflow_logger.py (resolved)
yolox/utils/mlflow_logger.py (resolved)
yolox/utils/mlflow_logger.py (outdated, resolved)
yolox/utils/mlflow_logger.py (outdated, resolved)
yolox/utils/mlflow_logger.py (outdated, resolved)
@FateScript
Member

FateScript commented Jul 4, 2024

@Im-Himanshu Also please don't forget to lint your code.

@Im-Himanshu
Contributor Author

@Im-Himanshu Also please don't forget to lint your code.

Linted the code. The only major lint issue is the import statement, which has to stay inside the class (as is done in the wandb logger) because this is an optional feature.
image
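
The pattern described in this comment, importing an optional dependency inside the class rather than at module level, can be sketched as follows. This is an illustration, not the PR's code: the class name is hypothetical, and `importlib.import_module` with a `backend` parameter stands in for a plain `import mlflow` statement so that the sketch can be exercised without mlflow installed.

```python
import importlib


class OptionalBackendLogger:
    """Logger wrapper that imports its backend only when instantiated."""

    def __init__(self, backend="mlflow"):
        try:
            # Deferred import: importing this module never requires the backend.
            self._backend = importlib.import_module(backend)
        except ImportError as exc:
            raise ImportError(
                f"'{backend}' is required for this logger; install it to use it."
            ) from exc

    @property
    def backend_name(self):
        """Name of the module that was imported for this logger."""
        return self._backend.__name__


# A standard-library module is used here purely to show the mechanism.
print(OptionalBackendLogger("json").backend_name)
```

Because the import happens in `__init__`, users who never select this logger are not forced to install its dependency.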

@Im-Himanshu
Contributor Author

@FateScript @Cloudhax23 I have addressed all the suggestions; please check.

Member

@FateScript FateScript left a comment

yolox/utils/mlflow_logger.py (resolved)
@Im-Himanshu
Contributor Author

@FateScript Removed the additional logger and linted the code again to fix the build errors.

@FateScript
Member

@Im-Himanshu Please run isort on your code (isort -rc); see here for more details.

@Im-Himanshu
Contributor Author

Im-Himanshu commented Jul 8, 2024

@Im-Himanshu please isort your code(isort -rc), see here for more details.

@FateScript In trainer.py and mlflow_logger.py, isort moved import datetime and the other standard-library imports to the end, which was not the case in the original code. For now I have manually moved them back to the top; please check whether this is correct.

I think my isort settings do not match the project's. If there are isort settings I should follow for this project, I can rerun isort; otherwise, the ordering I have done by hand should be correct, based on my intuition.

from yolox.utils import is_main_process



Member

Too many blank lines here.

@FateScript
Member

If you install the correct isort version (4.3.21 here), everything will be OK.
BTW, make sure the format check here passes before you commit.
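
Before running the formatter, it can help to confirm that the locally installed isort matches the version quoted above. The sketch below does this with the standard library's `importlib.metadata`; the helper names are hypothetical and this check is not part of the project's tooling.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_required(package, required):
    """True only if the package is installed at exactly the required version."""
    return installed_version(package) == required


# 4.3.21 is the version mentioned in the comment above.
print(matches_required("isort", "4.3.21"))
```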

@Im-Himanshu
Contributor Author

@FateScript My isort version was not the one required by the project, hence the previous errors.
I have now fixed the reported pylint and isort errors, so it should pass all the checks this time.

yolox/utils/__init__.py (outdated, resolved)
@FateScript
Member

LGTM.

@FateScript FateScript merged commit 58c2dd0 into Megvii-BaseDetection:main Jul 11, 2024
3 checks passed