[Bug]: The truth value of a DataFrame is ambiguous. #1298

Closed
DRMPN opened this issue Jun 1, 2024 · 2 comments
Labels
api: Anything related to user-facing interfaces & parameter passing
bug: Something isn't working

Comments

@DRMPN
Collaborator

DRMPN commented Jun 1, 2024

Expected Behavior

Pipeline starts tuning with provided input data.

tuned_pipiline = auto_model.tune(input_data=orig_data, timeout=10, cv_folds=10, n_jobs=4)

Current Behavior

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 tuned_pipiline = auto_model.tune(input_data=train, timeout=10, cv_folds=10, n_jobs=4)

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\fedot\api\main.py:230, in Fedot.tune(self, input_data, metric_name, iterations, timeout, cv_folds, n_jobs, show_progress)
    227     raise ValueError(NOT_FITTED_ERR_MSG)
    229 with fedot_composer_timer.launch_tuning('post'):
--> 230     if not input_data: 
    231         input_data = self.train_data
    232     cv_folds = cv_folds or self.params.get('cv_folds')

File c:\Users\nnikitin-user\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py:1527, in NDFrame.__nonzero__(self)
   1525 @final
   1526 def __nonzero__(self) -> NoReturn:
-> 1527     raise ValueError(
   1528         f"The truth value of a {type(self).__name__} is ambiguous. "
   1529         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1530     )

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Possible Solution

Change line 230 in fedot/api/main.py so that it compares against None explicitly instead of truth-testing the argument:

if input_data is None:
    input_data = self.train_data
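
A minimal, dependency-free sketch of why `if not input_data:` fails for a DataFrame while an explicit `is None` comparison does not. `AmbiguousFrame` and `resolve_input_data` are hypothetical names used only for illustration: the class imitates the way pandas raises when an object is truth-tested, and the function is not part of FEDOT.

```python
from typing import Any, Optional


class AmbiguousFrame:
    """Hypothetical stand-in for pandas.DataFrame: truth-testing raises."""

    def __bool__(self) -> bool:
        raise ValueError("The truth value of a AmbiguousFrame is ambiguous.")


def resolve_input_data(input_data: Optional[Any], train_data: Any) -> Any:
    """Fall back to the stored training data only when nothing was passed."""
    if input_data is None:  # identity check, never calls __bool__
        return train_data
    return input_data


frame = AmbiguousFrame()
stored = "stored train data"  # placeholder for self.train_data

# Truth-testing the frame reproduces the failure mode from the traceback.
try:
    if not frame:
        pass
    truthiness_failed = False
except ValueError:
    truthiness_failed = True

print(truthiness_failed)
print(resolve_input_data(frame, stored) is frame)
print(resolve_input_data(None, stored))
```

This only removes the ambiguous truth test. Whether `tune()` can then work with a DataFrame depends on what the rest of the method expects from `input_data`.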

Steps to Reproduce

Data from https://www.kaggle.com/competitions/playground-series-s4e6

from fedot.api.main import Fedot
import pandas as pd

train = pd.read_csv("/automl-june/playground-series-s4e6/train.csv")
test = pd.read_csv("/automl-june/playground-series-s4e6/test.csv")

train.drop(columns=["id"], inplace=True)
test.drop(columns=["id"], inplace=True)

auto_model = Fedot(
    problem="classification",
    metric=["precision", "accuracy", "roc_auc"],
    preset="best_quality",
    with_tuning=True,
    timeout=60,
    cv_folds=10,
    seed=42,
    n_jobs=1,
    logging_level=10,
    use_pipelines_cache=False,
    use_auto_preprocessing=False,
)

auto_model.fit(features=train, target="Target")

prediction = auto_model.predict(features=test, save_predictions=True)

print(auto_model.return_report().head(10))

print(auto_model.get_metrics(target=train.Target))

tuned_pipiline = auto_model.tune(input_data=train, timeout=10, cv_folds=10, n_jobs=4)

Context [OPTIONAL]

Participating in a Kaggle competition PS4E6.

DRMPN added the bug and api labels on Jun 1, 2024
@Lopa10ko
Collaborator

Note

closed as irrelevant (can be reissued if necessary)

The signature of the tune function indicates that it expects an instance of InputData, but the snippet in Steps to Reproduce passes a pd.DataFrame object.

For this particular run, you can convert the data first:

from fedot.core.data.data import array_to_input_data

...

input_data = array_to_input_data(features_array=train.loc[:, train.columns != 'Target'].values,
                                 target_array=train.Target.values)
tuned_pipiline = auto_model.tune(input_data=input_data, timeout=2, cv_folds=10, n_jobs=4)

This seems to work as expected on the Kaggle data.

@aPovidlo
Collaborator

@Lopa10ko I think it would be better to follow the other API methods here. For example, fit() expects features: FeaturesType, so I think tune() should accept its input the same way.
