Actualized the set of candidate models for composing #250

nicl-nno · 2021-03-15T15:46:01Z

Some models (e.g. LDA/QDA, SVC) look redundant, since they are rare candidates in the chains built by the optimizer.
Best practices should be derived from state-of-the-art frameworks (TPOT, etc.) to remove ineffective models and add new ones.

The 'old' models can be kept as an alternative variant of repository.json.

nicl-nno · 2021-04-20T14:51:19Z

The following list can be used as an initial assumption:
https://github.com/Ennosigaeon/dswizard/blob/master/ida/appendix.pdf

kasyanovse · 2023-09-28T12:19:48Z

Current list of models (incomplete):

Regression

XGBRegressor
AdaBoostRegressor
GradientBoostingRegressor
DecisionTreeRegressor
ExtraTreesRegressor
RandomForestRegressor
SklearnLinReg
SklearnRidgeReg
SklearnLassoReg
SklearnSVR
SklearnSGD
LGBMRegressor
CatBoostRegressor
LinearRegRANSACImplementation
NonLinearRegRANSACImplementation
LinearRegFSImplementation
NonLinearRegFSImplementation
DecomposerRegImplementation
IsolationForestRegImplementation
FedotKnnRegImplementation

Classification

XGBClassifier
SklearnLogReg
SklearnBernoulliNB
SklearnMultinomialNB
DecisionTreeClassifier
RandomForestClassifier
MLPClassifier
LGBMClassifier
CatBoostClassifier
LDAImplementation
QDAImplementation
FedotSVCImplementation
FedotCNNImplementation
FedotKnnClassImplementation

Clustering

SklearnKmeans

Preprocessing

ScalingImplementation
NormalizationImplementation
ImputationImplementation
PCAImplementation
KernelPCAImplementation
PolyFeaturesImplementation
OneHotEncodingImplementation
LabelEncodingImplementation
FastICAImplementation

Text processing

TfidfVectorizer
CountVectorizer
TextCleanImplementation
PretrainedEmbeddingsImplementation

Time series

ARIMAImplementation
AutoRegImplementation
STLForecastARIMAImplementation
ExpSmoothingImplementation
CGRUImplementation
PolyfitImplementation
GLMImplementation
RepeatLastValueImplementation
NaiveAverageForecastImplementation
LaggedTransformationImplementation
SparseLaggedTransformationImplementation
TsSmoothingImplementation
ExogDataTransformationImplementation
GaussianFilterImplementation
NumericalDerivativeFilterImplementation
CutImplementation

kasyanovse · 2023-09-28T12:33:57Z

I propose reducing the number of models used for composition. The main motivation is to shrink the search space for the optimal pipeline. The elimination criterion: models that are heavy or that duplicate one another.

For regression:
1. SklearnLinReg - SklearnRidgeReg is a good replacement
2. SklearnSVR - too task-dependent
3. CatBoostRegressor - already disabled
4. XGBRegressor - already disabled
5. ExtraTreesRegressor - essentially the same as RandomForestRegressor
6. DecisionTreeRegressor - RandomForestRegressor is more effective
7. GradientBoostingRegressor - LGBM is much faster and implements the same principle
8. AdaBoostRegressor - another slow tree-based boosting algorithm. It could be replaced by LGBM. (But is that really so?)
9. NonLinearRegRANSACImplementation - a slow algorithm.
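
The regression part of the proposal amounts to a set difference over the current candidate list. A minimal sketch, using the operation names exactly as they appear in this thread; the variable names and the `prune` helper are hypothetical, and this is not how the actual repository file is edited:

```python
# Current regression candidates, as listed earlier in this thread (incomplete list).
CURRENT_REGRESSION = [
    "XGBRegressor", "AdaBoostRegressor", "GradientBoostingRegressor",
    "DecisionTreeRegressor", "ExtraTreesRegressor", "RandomForestRegressor",
    "SklearnLinReg", "SklearnRidgeReg", "SklearnLassoReg", "SklearnSVR",
    "SklearnSGD", "LGBMRegressor", "CatBoostRegressor",
    "LinearRegRANSACImplementation", "NonLinearRegRANSACImplementation",
    "LinearRegFSImplementation", "NonLinearRegFSImplementation",
    "DecomposerRegImplementation", "IsolationForestRegImplementation",
    "FedotKnnRegImplementation",
]

# Models proposed for removal, mapped to the reason given in the comment.
PROPOSED_REMOVALS = {
    "SklearnLinReg": "SklearnRidgeReg is a good replacement",
    "SklearnSVR": "too task-dependent",
    "CatBoostRegressor": "already disabled",
    "XGBRegressor": "already disabled",
    "ExtraTreesRegressor": "essentially RandomForestRegressor",
    "DecisionTreeRegressor": "RandomForestRegressor is more effective",
    "GradientBoostingRegressor": "LGBM is faster, same principle",
    "AdaBoostRegressor": "slow boosting (open question)",
    "NonLinearRegRANSACImplementation": "slow",
}


def prune(candidates, removals):
    """Return the candidates that are not proposed for removal, keeping order."""
    unknown = set(removals) - set(candidates)
    if unknown:
        # Guard against typos in the removal list.
        raise ValueError(f"not in the candidate list: {sorted(unknown)}")
    return [name for name in candidates if name not in removals]


REDUCED_REGRESSION = prune(CURRENT_REGRESSION, PROPOSED_REMOVALS)
print(len(CURRENT_REGRESSION), "->", len(REDUCED_REGRESSION))
```

Keeping the reason next to each removed name makes it easier to revisit individual decisions later, such as the open question about AdaBoostRegressor.
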
For time series:
1. ARIMAImplementation - hard to tune, effective only for a limited range of series
2. CGRUImplementation - resource-intensive with low effectiveness
3. SparseLaggedTransformationImplementation - could be made part of LaggedImplementation
4. CutImplementation - a very specific tool that is best used when constructing pipelines manually
5. TsSmoothingImplementation, GaussianFilterImplementation - it would be better to implement a single filter class with a parameter that controls the filter type
6. NumericalDerivativeFilterImplementation - slow and rarely useful
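
Point 5 of the time series list suggests a single filter class whose parameter selects the type of filtering. A rough sketch of that idea in plain Python; the class name `SmoothingFilter`, the `kind` parameter and its two values are hypothetical and do not correspond to existing classes in the project:

```python
import math
from typing import List, Sequence


class SmoothingFilter:
    """Hypothetical single filter operation; `kind` selects the smoothing type."""

    KINDS = ("moving_average", "gaussian")

    def __init__(self, kind: str = "moving_average", window: int = 3, sigma: float = 1.0):
        if kind not in self.KINDS:
            raise ValueError(f"unknown filter kind: {kind}")
        if window < 1 or window % 2 == 0:
            raise ValueError("window must be a positive odd integer")
        if sigma <= 0:
            raise ValueError("sigma must be positive")
        self.kind = kind
        self.window = window
        self.sigma = sigma

    def _kernel(self) -> List[float]:
        """Build normalized weights for the selected filter type."""
        half = self.window // 2
        if self.kind == "moving_average":
            weights = [1.0] * self.window
        else:
            weights = [math.exp(-0.5 * (i / self.sigma) ** 2)
                       for i in range(-half, half + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    def transform(self, series: Sequence[float]) -> List[float]:
        """Smooth the series; border values are repeated to pad the edges."""
        kernel = self._kernel()
        half = self.window // 2
        n = len(series)
        result = []
        for t in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                idx = min(max(t + k - half, 0), n - 1)  # clamp at the borders
                acc += w * series[idx]
            result.append(acc)
        return result


smoothed = SmoothingFilter(kind="gaussian", window=5, sigma=1.0).transform([1.0, 4.0, 2.0, 8.0, 3.0])
```

With this layout the composer sees one filter operation instead of two, and `kind` becomes an ordinary categorical hyperparameter alongside `window` and `sigma`.
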

kasyanovse · 2023-09-28T12:37:30Z

> Best practices should be derived from state-of-the-art frameworks (TPOT, etc.) to remove ineffective models and add new ones.

I looked through the model lists in TPOT and AutoGluon. There are a lot of models there, but it is not clear whether all of them are actually used when searching for a solution.

kasyanovse · 2023-11-21T08:42:29Z

The best option, IMO, is to create coarse-grained operations, each responsible for a class of models: for example, one operation for forests, one for gradient boosting, one for linear models, and so on. These operations would be used at the composition stage, and the specific model would be selected at the tuning stage. This makes it possible to use a large pool of models without unnecessarily inflating the search space for the optimal pipeline structure.
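
A rough illustration of this idea: the composer sees only a handful of family-level operations, while the concrete model is one more categorical choice left to the tuner. The family names, the grouping, and both helper functions are hypothetical; the model names are taken from the regression list in this thread.

```python
from typing import Callable, Dict, List

# Hypothetical family-level operations; the composer would only see the keys.
MODEL_FAMILIES: Dict[str, List[str]] = {
    "forest": ["RandomForestRegressor", "ExtraTreesRegressor"],
    "boosting": ["LGBMRegressor", "XGBRegressor", "CatBoostRegressor",
                 "GradientBoostingRegressor", "AdaBoostRegressor"],
    "linear": ["SklearnRidgeReg", "SklearnLassoReg", "SklearnLinReg", "SklearnSGD"],
}


def labelings(n_operations: int, n_nodes: int) -> int:
    """Number of ways to assign operations to a fixed graph with n_nodes nodes."""
    return n_operations ** n_nodes


def tune_family(family: str, score: Callable[[str], float]) -> str:
    """Toy 'tuner': pick the best concrete model inside one family.

    `score` stands in for a real validation metric (higher is better).
    """
    return max(MODEL_FAMILIES[family], key=score)


n_models = sum(len(models) for models in MODEL_FAMILIES.values())
n_families = len(MODEL_FAMILIES)

# Structure search space for a fixed three-node pipeline.
flat_space = labelings(n_models, 3)       # every concrete model is its own operation
grouped_space = labelings(n_families, 3)  # only family-level operations
print(flat_space, grouped_space)
```

The concrete model is still searched over, but per node and after the structure is fixed, so the cost of that choice grows additively with the number of nodes rather than multiplying into the structure search.
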

kasyanovse · 2023-12-25T14:06:10Z

Linked: #1217

nicl-nno · 2024-10-11T18:36:54Z

Moved to #1339.

kasyanovse self-assigned this Sep 28, 2023

kasyanovse removed their assignment Dec 25, 2023

nicl-nno mentioned this issue Oct 11, 2024

enh: Rework the list of tabular models and operations #1339

Open

nicl-nno closed this as completed Oct 11, 2024
