
[Bug]: ValueError: [...] the array at index 0 has size 894365 and the array at index 1 has size 1117957 #1296

Status: Closed · Opened by DRMPN (Collaborator) May 30, 2024 · 0 comments · Fixed by #1312
Labels: api (Anything related to user-facing interfaces & parameter passing), bug (Something isn't working)

DRMPN commented May 30, 2024

Expected Behavior

Auto preprocessing should work correctly, and the pipeline should be fitted.

Current Behavior

FEDOT fails to fit a pipeline containing the `catboostreg` model when the `use_auto_preprocessing=True` option is set.

```
PS C:\Users\nnikitin-user\Desktop\automl_may> & C:/Users/nnikitin-user/AppData/Local/Programs/Python/Python310/python.exe c:/Users/nnikitin-user/Desktop/automl_may/flood_1.py
2024-05-16 13:16:58,812 - ApiDataProcessor - Preprocessing data
2024-05-16 13:16:58,812 - ApiDataProcessor - Train Data (Original) Memory Usage: 452.05 MB Data Shapes: ((1117957, 53), (1117957, 1))
2024-05-16 13:22:54,236 - ApiDataProcessor - Train Data (Processed) Memory Usage: 1.05 GB Data Shape: ((1117957, 126), (1117957, 1))
2024-05-16 13:22:54,236 - ApiDataProcessor - Data preprocessing runtime = 0:05:55.423210
2024-05-16 13:22:55,149 - AssumptionsHandler - Initial pipeline fitting started
2024-05-16 13:23:21,260 - PipelineNode - Trying to fit pipeline node with operation: catboostreg
2024-05-16 13:23:22,181 - AssumptionsHandler - Initial pipeline fit was failed due to: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957.
Traceback (most recent call last):
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 71, in fit_assumption_and_check_correctness
    pipeline.fit(data_train, n_jobs=eval_n_jobs)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 197, in fit
    train_predicted = self._fit(input_data=copied_input_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 112, in _fit
    train_predicted = self.root_node.fit(input_data=input_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\node.py", line 200, in fit
    self.fitted_operation, operation_predict = self.operation.fit(params=self._parameters,
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\operation.py", line 87, in fit
    self.fitted_operation = self._eval_strategy.fit(train_data=data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\boostings.py", line 33, in fit
    operation_implementation.fit(train_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\models\boostings_implementations.py", line 28, in fit
    input_data = input_data.get_not_encoded_data()
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\data\data.py", line 628, in get_not_encoded_data
    new_features = np.hstack((num_features, cat_features))
  File "<__array_function__ internals>", line 200, in hstack
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\core\shape_base.py", line 370, in hstack
    return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957

Traceback (most recent call last):
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 71, in fit_assumption_and_check_correctness
    pipeline.fit(data_train, n_jobs=eval_n_jobs)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 197, in fit
    train_predicted = self._fit(input_data=copied_input_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\pipeline.py", line 112, in _fit
    train_predicted = self.root_node.fit(input_data=input_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\pipelines\node.py", line 200, in fit
    self.fitted_operation, operation_predict = self.operation.fit(params=self._parameters,
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\operation.py", line 87, in fit
    self.fitted_operation = self._eval_strategy.fit(train_data=data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\boostings.py", line 33, in fit
    operation_implementation.fit(train_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\operations\evaluation\operation_implementations\models\boostings_implementations.py", line 28, in fit
    input_data = input_data.get_not_encoded_data()
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\core\data\data.py", line 628, in get_not_encoded_data
    new_features = np.hstack((num_features, cat_features))
  File "<__array_function__ internals>", line 200, in hstack
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\core\shape_base.py", line 370, in hstack
    return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
  File "<__array_function__ internals>", line 200, in concatenate
ValueError: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\nnikitin-user\Desktop\automl_may\flood_1.py", line 85, in <module>
    auto_model.fit(features=train, target="FloodProbability")
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\main.py", line 181, in fit
    self.current_pipeline, self.best_models, self.history = self.api_composer.obtain_model(self.train_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\api_composer.py", line 63, in obtain_model
    initial_assumption, fitted_assumption = self.propose_and_fit_initial_assumption(train_data)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\api_composer.py", line 107, in propose_and_fit_initial_assumption
    assumption_handler.fit_assumption_and_check_correctness(deepcopy(initial_assumption[0]),
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 86, in fit_assumption_and_check_correctness
    self._raise_evaluating_exception(ex)
  File "C:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\api_utils\assumptions\assumptions_handler.py", line 94, in _raise_evaluating_exception
    raise ValueError(advice_info)
ValueError: Initial pipeline fit was failed due to: all the input array dimensions except for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 894365 and the array at index 1 has size 1117957. Check pipeline structure and the correctness of the data
PS C:\Users\nnikitin-user\Desktop\automl_may>
```
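The underlying NumPy failure can be reproduced in isolation: for 2-D inputs, `np.hstack` concatenates along axis 1, so all arrays must have the same number of rows. A scaled-down sketch (the row counts here are stand-ins for the report's 894365-row numeric block and 1117957-row categorical block):

```python
import numpy as np

# Two feature blocks with mismatched row counts, mirroring the report:
# the numeric block has fewer rows than the categorical block.
num_features = np.zeros((8, 4))   # fewer rows (e.g. only a split portion)
cat_features = np.zeros((11, 2))  # full row count

try:
    # Axis-1 concatenation requires identical sizes along axis 0.
    np.hstack((num_features, cat_features))
except ValueError as e:
    print(f"ValueError: {e}")
```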

Possible Solution

Some features are deleted during auto preprocessing; this is possibly related to the handling of categorical features. Debug at the following breakpoints to locate and fix the problem.
(screenshot: suggested debugger breakpoints)
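A supporting observation (my arithmetic, not taken from the log): the smaller size in the error message is exactly the 80 % train portion of the full dataset, which is consistent with a default 80/20 split being applied to one feature block while the other block is copied whole:

```python
# Hypothesis check: 894365 is the 80% train split of 1117957 rows,
# suggesting the numeric block was split while the categorical block
# kept the full row count.
n_rows = 1_117_957              # rows in the full dataset
train_rows = int(n_rows * 0.8)  # rows in a default 80% train split
print(train_rows)               # -> 894365, the "array at index 0" size
```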

Steps to Reproduce

  1. Download the code and data from https://www.kaggle.com/code/eliyahusanti/fedot-nss-lab-automl-catboost-0-8676
  2. Set the FEDOT parameter `use_auto_preprocessing=True`
  3. Run the code

Context [OPTIONAL]

Participating in a Kaggle competition.

DRMPN added the bug (Something isn't working) and api (Anything related to user-facing interfaces & parameter passing) labels on May 30, 2024
aPovidlo self-assigned this on Jul 22, 2024
aPovidlo added a commit that referenced this issue on Jul 23, 2024: "Fix of copying categorical features in data splitting"