
Is there a way to save the preprocessing objects for inference? (OneHotEncoder, Scaler) #76

Closed
kkristacia opened this issue May 21, 2024 · 5 comments · Fixed by #79
Labels: enhancement (New feature or request)

@kkristacia

Hi, thank you for developing this package! I want to be able to load an already saved model and then use it for inference, as in production. How can I make the inference dataset go through the same preprocessing steps, e.g. one-hot encoding of categorical variables and scaling?

@akashsaravanan-georgian (Member) commented May 22, 2024

Hi @kkristacia,
To load the model, run the same steps you used to create it. The only difference is that when calling model = AutoModelWithTabular.from_pretrained(...), make sure to set the first argument, pretrained_model_name_or_path, to the path where you saved your model.
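
Roughly, loading looks like the sketch below (the save directory is a placeholder path, not something from this thread):

```python
from transformers import AutoConfig, AutoTokenizer
from multimodal_transformers.model import AutoModelWithTabular

model_path = './saved_model'  # placeholder: the directory you saved the model to

# Tokenizer and config are restored from the same save directory
tokenizer = AutoTokenizer.from_pretrained(model_path)
config = AutoConfig.from_pretrained(model_path)
model = AutoModelWithTabular.from_pretrained(model_path, config=config)
model.eval()  # switch to inference mode
```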

Similarly, to preprocess the inference dataset, I would recommend running the load_data_from_folder function with the same parameters you used during training. Use the same training data to reconstruct the encoders, and replace the test data with your inference data. I know this isn't optimal, so we'll definitely change it in a future version.
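
A sketch of that workaround, assuming hypothetical column names (text, label, cat_feature, num_feature); pass exactly the arguments you used at training time:

```python
from multimodal_transformers.data import load_data_from_folder

# The folder keeps the original train.csv and val.csv; test.csv is replaced
# with the inference data, so the encoder and scaler are refit on exactly
# the same training data as before.
train_dataset, val_dataset, inference_dataset = load_data_from_folder(
    './data',                      # placeholder folder path
    text_cols=['text'],            # hypothetical column names
    tokenizer=tokenizer,           # the tokenizer loaded with the model
    label_col='label',
    categorical_cols=['cat_feature'],
    numerical_cols=['num_feature'],
    categorical_encode_type='ohe',                   # same settings as training
    numerical_transformer_method='quantile_normal',
)
```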

Please let me know if you run into any other issues and I'll help you solve them! :)

akashsaravanan-georgian added the question (Further information is requested) label on May 22, 2024
@kkristacia (Author)

Hi Akash, thanks for the clarification. Yeah, I was hoping for a way to avoid using the training data during inference. It would definitely be great if a future version added this functionality!

@dsunart commented May 28, 2024

Hi Akash. Just to second this: it would be great if the preprocessing objects were saved so they can be used for inference in production. Loading my whole training dataset into the production environment would take up space unnecessarily. Love the toolkit, and looking forward to a future update!

akashsaravanan-georgian added the enhancement (New feature or request) label and removed the question (Further information is requested) label on May 28, 2024
@akashsaravanan-georgian (Member)

Thanks @dsunart! I'm reopening this issue as a feature request. It should be added as part of our next release!

@akashsaravanan-georgian (Member)

Hey @kkristacia and @dsunart, happy to note that this is now part of the toolkit. You can see this in action in this example.
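
(For anyone who lands here before reading the example: the underlying pattern, shown as a generic sketch below rather than the toolkit's exact API, is to persist the fitted preprocessing objects next to the model, e.g. with joblib, and load them at inference time.)

```python
import joblib
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# At training time: fit on the training data, then persist alongside the model.
# train_df, inference_df, categorical_cols, numerical_cols are assumed names.
ohe = OneHotEncoder(handle_unknown='ignore').fit(train_df[categorical_cols])
scaler = StandardScaler().fit(train_df[numerical_cols])
joblib.dump(ohe, './saved_model/ohe.joblib')
joblib.dump(scaler, './saved_model/scaler.joblib')

# At inference time: load and transform; no training data is needed.
ohe = joblib.load('./saved_model/ohe.joblib')
scaler = joblib.load('./saved_model/scaler.joblib')
cat_feats = ohe.transform(inference_df[categorical_cols])
num_feats = scaler.transform(inference_df[numerical_cols])
```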
