Predict does not work because of data type mismatch for same dataframe. #4124
Comments
Thanks for filing this @Mhsh! We'll take a look at this issue and get back to you soon. |
I had the same problem with categorical data. When I passed a single record for prediction, the same error occurred, and I had to manually convert the data type to 'category' as below. X_train['Rough Type'].head(1).astype('category') |
@Mhsh do you mind showing the stack trace for the categorical data case like for the integer nullable case? Thanks! |
We should be able to handle the Integer/IntegerNullable case with #4077. Seeing @Mhsh's stack trace for the categorical data will be helpful to decide if/how to handle this since the |
I got the error below when I passed a single-row dataframe for prediction. It was not related to nullable types; rather, the categorical data I was passing in the dataframe was not recognised as the Categorical logical type. When I inspected the details, the response was as below. The code above works when I pass the X_test dataframe, which contains records that mirror the X_train data. |
@Mhsh the df.ww.init(schema=X_train.ww.schema)
model.predict(df) |
Thanks @tamargrey. The code is working with the workaround that you provided. |
@tamargrey, I am getting the error below when I try to set the types from
I am following the tutorial here: https://compose.alteryx.com/en/stable/examples/predict_next_purchase.html. The following are my changes:
|
@gautamborad I expect this error is happening because If if you want to update the logical types of only the columns in |
@tamargrey thanks for the quick reply! I think the columns in both
Gives the output:
I hope I am not missing something obvious here. Also,
|
I trained a model on a dataset using EvalML, and the best-fit pipeline is below.
pipeline = RegressionPipeline(
    component_graph={
        'Replace Nullable Types Transformer': ['Replace Nullable Types Transformer', 'X', 'y'],
        'Imputer': ['Imputer', 'Replace Nullable Types Transformer.x', 'Replace Nullable Types Transformer.y'],
        'One Hot Encoder': ['One Hot Encoder', 'Imputer.x', 'Replace Nullable Types Transformer.y'],
        'Random Forest Regressor': ['Random Forest Regressor', 'One Hot Encoder.x', 'Replace Nullable Types Transformer.y'],
    },
    parameters={
        'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None},
        'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'},
        'Random Forest Regressor': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1},
    },
    random_seed=0,
)
I stored the model and load it with the code below, and everything works fine.
Model Store
import pickle
best_pipeline.save(MODEL_NAME+'.pkl')
Model Load
with open('BidPrediction.pkl', "rb") as f:
    model = pickle.load(f)
try:
    df = X_train
    print(model.predict(df))
except Exception as e:
    print(e)
Output:
501 117.129174
90 343.367964
527 153.735972
576 225.164953
200 140.293371
...
277 222.844593
9 1127.711225
359 1385.688900
192 146.658027
559 833.751599
Name: Ideal Bid Price/Cts ($), Length: 485, dtype: float64
ISSUE
Now I am trying to use this model to predict on a single row of the dataframe, and it gives the error below.
try:
    df = X_train.head(1)
    print(model.predict(df))
except Exception as e:
    print(e)
ERROR:
Input X data types are different from the input types the pipeline was fitted on.
When I inspected the error, it appears that the data types the pipeline was fitted on differ from those of the input features. My guess is that it works with the whole dataset because there are null entries in 'Target Days' and 'Actual Days', whereas there are none when a single row is passed.
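The guess above can be illustrated with plain pandas, where the inferred dtype of a column also depends on whether the values at hand contain nulls. This is only an analogy under stated assumptions: the values are made up, and it shows pandas inference rather than the logical-type inference that produces Integer versus IntegerNullable, although the error details are consistent with the same effect.

```python
import pandas as pd

# Made-up values standing in for the 'Target Days' column.
full = pd.Series([10, None, 30], name="Target Days")  # contains a null
single = pd.Series([10], name="Target Days")          # one row, no null

print(full.dtype)    # inferred from data that includes a missing value
print(single.dtype)  # inferred from data without missing values

# Casting to pandas' nullable integer dtype gives one dtype that fits both cases.
nullable_full = full.astype("Int64")
nullable_single = single.astype("Int64")
print(nullable_full.dtype, nullable_single.dtype)
```

The point is that types inferred from a slice can differ from types inferred from the full column, which is why fixing the types explicitly is more robust than relying on inference for single-record predictions.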
ERROR details:
{'input_features_types':
                                Logical Type     Semantic Tag(s)
Column
Rough Type                      Categorical      ['category']
Source                          Categorical      ['category']
Avg Size                        Double           ['numeric']
Manufaturing Rate Per Cts ($)   Double           ['numeric']
Expected Color Variation %      Double           ['numeric']
Expected Polish Variation %     Double           ['numeric']
Profit Margin %                 Double           ['numeric']
Sales Rate Per Cts              Double           ['numeric']
Target Days                     Integer          ['numeric']
Actual Days                     Integer          ['numeric']
Interest Paid/Cts               Double           ['numeric'],
'pipeline_features_types':
                                Logical Type     Semantic Tag(s)
Column
Rough Type                      Categorical      ['category']
Source                          Categorical      ['category']
Avg Size                        Double           ['numeric']
Manufaturing Rate Per Cts ($)   Double           ['numeric']
Expected Color Variation %      Double           ['numeric']
Expected Polish Variation %     Double           ['numeric']
Profit Margin %                 Double           ['numeric']
Sales Rate Per Cts              Double           ['numeric']
Target Days                     IntegerNullable  ['numeric']
Actual Days                     IntegerNullable  ['numeric']
Interest Paid/Cts               Double           ['numeric']}
I am not sure how to convert Integer to IntegerNullable, since I get this error whenever I use a single row of the dataframe for prediction.
Note: I will deploy this model, so mostly single records will come in for prediction.
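For the deployment case described here, one possible approach at the pandas level is to cast each incoming record to the dtypes recorded from the training frame before calling predict. This is a sketch under assumptions: `X_train` and `record` below are hypothetical two-column stand-ins, and matching pandas dtypes alone may not be enough if logical types are inferred again afterwards, in which case the logical types would also need to be set explicitly.

```python
import pandas as pd

# Hypothetical stand-in for the training frame; only two columns are shown.
X_train = pd.DataFrame({
    "Rough Type": pd.Series(["A", "B", "A"], dtype="category"),
    "Target Days": pd.Series([10, None, 30], dtype="Int64"),
})

# A single incoming record, built from raw values as a deployed service might see it.
record = pd.DataFrame([{"Rough Type": "A", "Target Days": 10}])

# Cast every column to the dtype observed at training time.
aligned = record.astype(X_train.dtypes.to_dict())
print(aligned.dtypes)
```

Storing the training dtypes alongside the pickled pipeline would let the serving code apply the same cast without needing the training data itself.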