Fix batch transform issue for tabular predictor with multiple partitions #138
Conversation
if isinstance(test_data, str) and not os.path.isdir(test_data):
    # either a file to a dataframe, or a file to an image
    if is_image_file(test_data):
        logger.warning(
            "Are you sure you want to do batch inference on a single image? You might want to try `deploy()` and `predict_real_time()` instead"
        )
    else:
        test_data = load_pd.load(test_data)
Will it cause problems if images are uploaded here?
This logic has been copied over to lines #L1184 - #L1192.
I figured. Now that I've read it again, I think moving this to _predict() makes sense.
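For illustration, a minimal sketch of what centralizing that check in _predict() could look like (the helper name and placement are hypothetical; is_image_file and load_pd are the utilities already used in the hunk above):

```python
import os

from autogluon.common.loaders import load_pd


def _resolve_test_data(test_data):
    # Hypothetical helper: one shared place to validate/convert `test_data`
    # before batch inference, instead of duplicating the check per call site.
    if isinstance(test_data, str) and not os.path.isdir(test_data):
        # either a path to a dataframe file, or a path to a single image
        if is_image_file(test_data):  # existing module-level utility
            raise ValueError("A single image file is not supported for batch inference")
        test_data = load_pd.load(test_data)
    return test_data
```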
if isinstance(test_data, str) and not os.path.isdir(test_data):
    # either a file to a dataframe, or a file to an image
    if is_image_file(test_data):
        raise ValueError("Image file is not supported for batch inference")
Maybe reword to "Single image file"? And suggest trying .deploy().
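For example, the reworded message being suggested might read roughly like this (wording is only a sketch of the suggestion, not the committed text):

```python
raise ValueError(
    "A single image file is not supported for batch inference. "
    "Consider using `deploy()` and `predict_real_time()` instead."
)
```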
if not persist:
    os.remove(results_path)
Good catch!
# Ensure inference_kwargs is a dictionary right before use
if inference_kwargs is None:
    inference_kwargs = {}
The check seems to be redundant?
No, this code was moved from the previous #L64; it makes sure that if inference_kwargs is explicitly set to None in the payload, we still end up with the dict format.
I think you updated line #65 to make the default value {}, so inference_kwargs should not be None. But this is a good safety net.
The default will be set to {} only if inference_kwargs is not available in the payload; it will be set to None if:

payload = {
    "inference_kwargs": None
}
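A minimal standalone example of the behavior being described, i.e. why a .get() default alone does not cover an explicit None in the payload:

```python
payload = {"inference_kwargs": None}

# The default only applies when the key is missing, not when its value is None.
inference_kwargs = payload.get("inference_kwargs", {})
print(inference_kwargs)  # None

# The explicit check restores the dict invariant before use.
if inference_kwargs is None:
    inference_kwargs = {}
print(inference_kwargs)  # {}
```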
Thanks for the fix!
Description:
This PR fixes the issue where batch transform jobs fail due to column misalignment when the input CSV file is partitioned into multiple records. The problem arises because headers from different partitions are not handled properly, leading to misaligned columns and prediction failures during inference.
Changes:
- Added _read_with_fallback and _align_columns helper functions to handle column alignment (a rough sketch of these helpers follows below).
- Updated transform_fn in tabular_serve.py to use these helper functions.
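The helper bodies are not shown in this description; the following is only a rough sketch of what _read_with_fallback and _align_columns might do, assuming the serving code knows the training column names (e.g. via original_features). The actual implementation in tabular_serve.py may differ:

```python
import io

import pandas as pd


def _read_with_fallback(csv_payload: str, column_names):
    # Sketch: a partition may or may not carry the header row. If the first
    # row matches the known column names, parse it as a header; otherwise
    # parse the partition as header-less data with the known names.
    first_row = pd.read_csv(io.StringIO(csv_payload), header=None, nrows=1)
    if list(first_row.iloc[0].astype(str)) == [str(c) for c in column_names]:
        return pd.read_csv(io.StringIO(csv_payload))
    return pd.read_csv(io.StringIO(csv_payload), header=None, names=list(column_names))


def _align_columns(df: pd.DataFrame, column_names) -> pd.DataFrame:
    # Sketch: select/reorder columns so every partition matches the training schema.
    return df[list(column_names)]
```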
Limitations:
- Support for multimodal and timeseries predictors depends on the implementation of original_features, which can be tracked in issue #4477.

Steps to Reproduce:
The following script can be used to reproduce the issue:
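The reproduction script itself is not included here. Below is a rough sketch of the setup described in this report; the transformer_kwargs names are an assumption about how the MultiRecord strategy and MaxPayloadInMB=1 are passed to the underlying SageMaker transform job, not a verified part of the AutoGluon Cloud API:

```python
from autogluon.cloud import TabularCloudPredictor

# Any tabular dataset with a label column; test.csv should be large enough
# that a 1 MB payload limit forces the transform job to split it into
# multiple partitions.
train_data = "train.csv"
test_data = "test.csv"

predictor = TabularCloudPredictor(cloud_output_path="s3://<your-bucket>/ag-cloud-batch-demo")
predictor.fit(
    predictor_init_args={"label": "label"},
    predictor_fit_args={"train_data": train_data, "time_limit": 600},
)

# Assumed keyword names for forwarding batch transform settings (illustrative only).
predictions = predictor.predict(
    test_data,
    transformer_kwargs={"strategy": "MultiRecord", "max_payload": 1},
)
print(predictions)
```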
Expected Behavior:
The batch transform job should handle multiple partitions correctly, aligning columns across the partitions and ignoring or managing headers if present in individual partitions.
Observed Behavior:
The job fails with the following error logs:
Logs show that the columns are misaligned for certain partitions:
Environment:
- autogluon==1.1.0
- Batch transform with the MultiRecord strategy.
- MaxPayloadInMB=1 is set to ensure multiple partitions.

Additional Information:
The issue seems to be that AutoGluon Cloud is not handling headers properly when dealing with partitioned records in batch transform. In a multi-partition job, not all batches include the header row, which causes the column misalignment.
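As a plain-pandas illustration of that failure mode (independent of SageMaker): when a header-less partition is parsed as if it had a header, its first data row is consumed as column names and the remaining rows are misaligned against the training schema.

```python
import io

import pandas as pd

# Partition 1 (first MultiRecord split) carries the CSV header; partition 2 does not.
partition_1 = "age,income,zipcode\n25,50000,98101\n40,90000,10001\n"
partition_2 = "31,72000,60601\n58,120000,94105\n"

ok = pd.read_csv(io.StringIO(partition_1))
print(list(ok.columns))      # ['age', 'income', 'zipcode']

broken = pd.read_csv(io.StringIO(partition_2))
print(list(broken.columns))  # ['31', '72000', '60601'] -- first data row eaten as the header

fixed = pd.read_csv(io.StringIO(partition_2), header=None, names=list(ok.columns))
print(list(fixed.columns))   # ['age', 'income', 'zipcode']
```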
Note:
This fix currently only works for the tabular predictor. Support for multimodal and timeseries predictors depends on the implementation of original_features, which can be tracked in issue #4477.