edit sort and loop for fitted pipeline #159

Merged
merged 1 commit into from
Nov 5, 2024

Conversation

Collaborator

@perib perib commented Oct 31, 2024

What does this PR do?

Previously, the best fitted pipeline was selected by identifying the single pipeline with the maximum value of the first objective function. If that pipeline failed to fit, TPOT crashed without setting fitted_estimator_.

Two changes

a) Pipelines are now sorted by all objective functions, in order. When multiple pipelines have the same score on the first objective, they are further sorted by the second score, and so on. Previously, a random pipeline with the best score was selected, which may not have been the optimal pipeline given the other scores.

b) There is a very rare but not impossible chance that a pipeline will work correctly in the objective functions but fail on the full dataset. For example, a selector that happens to select only columns with positive values when evaluated on the CV folds might select a different column, one that includes negative values, when trained on the full dataset. If the final estimator is MultinomialNB, the pipeline will execute correctly in the objective function but throw an error on the full dataset, because MultinomialNB cannot accept negative values. This could cause TPOT to crash.

To resolve this, TPOT will now loop through the best pipelines. If a pipeline fails, it will catch the error and try the next best pipeline. This prevents a terminal error from occurring.
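
The fallback loop described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `FakePipeline`, `fit_first_working`, and the decision to raise a `RuntimeError` when every candidate fails are all assumptions made for the example.

```python
class FakePipeline:
    """Hypothetical stand-in for a candidate pipeline (not a TPOT class)."""

    def __init__(self, name, fails_on_full_data=False):
        self.name = name
        self.fails_on_full_data = fails_on_full_data
        self.is_fitted = False

    def fit(self, X, y):
        # Simulate a pipeline that passed CV but breaks on the full dataset.
        if self.fails_on_full_data:
            raise ValueError("negative values passed to the final estimator")
        self.is_fitted = True
        return self


def fit_first_working(ranked_pipelines, X, y):
    """Try candidates best-first; return the first one that fits, plus the errors seen."""
    errors = []
    for pipeline in ranked_pipelines:
        try:
            return pipeline.fit(X, y), errors
        except Exception as exc:
            # Record the failure and fall through to the next best candidate.
            errors.append((pipeline.name, exc))
    # Assumption: if nothing fits, surface a single clear error.
    raise RuntimeError("no candidate pipeline could be fitted on the full dataset")


# Candidates are assumed to be sorted best-first already.
candidates = [
    FakePipeline("best", fails_on_full_data=True),
    FakePipeline("second_best"),
    FakePipeline("third_best"),
]
X, y = [[1.0], [2.0]], [0, 1]
fitted, errors = fit_first_working(candidates, X, y)
print(fitted.name, [name for name, _ in errors])
```

Collecting the caught exceptions instead of discarding them keeps the failures visible for later inspection while still letting the run finish with a usable pipeline.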

Where should the reviewer start?

How should this PR be tested?

Double check that the sort order is correct and it runs without issue on test data.
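
One way to sanity-check the sort order is with a small standalone example. The sketch below is not TPOT code: `rank_pipelines`, the dictionary layout, and the `weights` tuple (positive to maximize an objective, negative to minimize it) are hypothetical names chosen for illustration.

```python
def rank_pipelines(results, weights):
    """Return results sorted best-first by every objective, in order."""

    def sort_key(row):
        # Multiply by the weight so larger is always better, then negate
        # so that Python's ascending sort puts the best pipeline first.
        return tuple(-w * s for w, s in zip(weights, row["scores"]))

    return sorted(results, key=sort_key)


# Hypothetical scores: (accuracy, complexity).
results = [
    {"name": "A", "scores": (0.90, 12)},
    {"name": "B", "scores": (0.90, 5)},
    {"name": "C", "scores": (0.85, 3)},
]
weights = (1.0, -1.0)  # maximize accuracy, minimize complexity

ranked = rank_pipelines(results, weights)
print([row["name"] for row in ranked])  # ['B', 'A', 'C']
```

A and B tie on the first objective, so the second objective decides between them; C has the lowest complexity but still ranks last because the first objective takes priority.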

@nickotto nickotto merged commit 94af584 into EpistasisLab:main Nov 5, 2024
1 check passed