Hello everyone. I have several questions about this library. I have read the documentation and searched the support forums, but I have not been able to resolve them.

1. Why do you support RandomForestClassifier but not DecisionTreeClassifier, given that RandomForestClassifier is built on DecisionTreeClassifier? (There is no ulterior motive behind this question; it is pure curiosity.)
2. In Supported Algorithms, in the "On CPU" section, there is a subsection called "Other tasks", and I can see that the method train_test_split appears there with the note "Only dense data is supported". What does this mean?
3. In the same "Other tasks" subsection, the method GridSearchCV appears. Does this mean it is not supported? For example, I enabled verbose output with the logging library as described in the documentation (sketch below), and I know it works because I can see the log message that the train_test_split method produces.
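This is roughly how I enable the verbose output (a minimal sketch following what the documentation shows; the 'sklearnex' logger name is the one from the docs):

```python
import logging

# Enable sklearnex's verbose mode: each accelerated call then logs whether it
# ran on the optimized backend or fell back to stock scikit-learn.
logging.getLogger('sklearnex').setLevel(logging.INFO)
```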
However, for the following code:

```python
from sklearn.compose import ColumnTransformer
from sklearn.metrics import (accuracy_score, f1_score, make_scorer,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (MinMaxScaler, OneHotEncoder, RobustScaler,
                                   TargetEncoder)
from sklearn.tree import DecisionTreeClassifier

scoring = {'accuracy': make_scorer(accuracy_score),
           'precision': make_scorer(precision_score, average='macro'),
           'recall': make_scorer(recall_score, average='macro'),
           'f1': make_scorer(f1_score, average='macro')}

# Define preprocessing for numerical variables.
numeric_transformers = [RobustScaler(copy=False), MinMaxScaler(copy=False), 'passthrough']
numeric_transformers_robust = [RobustScaler(copy=False)]

# Define preprocessing for categorical variables.
categorical_transformers = [OneHotEncoder(dtype='int8'),
                            TargetEncoder(target_type='binary', random_state=1998)]

# Create the preprocessor; the actual transformer applied to each column is
# selected through the parameter grid below.
preprocessor = ColumnTransformer(
    transformers=[
        ('feature1', 'passthrough', ['feature1']),
        ('feature2', 'passthrough', ['feature2']),
        ('feature3', 'passthrough', ['feature3']),
        ('feature4', 'passthrough', ['feature4']),
        ('feature5', 'passthrough', ['feature5']),
        ('feature6', 'passthrough', ['feature6']),
        ('feature7', 'passthrough', ['feature7']),
        ('feature8', 'passthrough', ['feature8'])])

# Create the pipeline that combines the preprocessor with the classifier.
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', DecisionTreeClassifier())])

# Define the parameter grid for the grid search.
param_grid = {
    'preprocessor__feature1': numeric_transformers_robust,
    'preprocessor__feature2': numeric_transformers_robust,
    'preprocessor__feature3': numeric_transformers,
    'preprocessor__feature4': numeric_transformers,
    'preprocessor__feature5': numeric_transformers,
    'preprocessor__feature6': numeric_transformers_robust,
    'preprocessor__feature7': numeric_transformers_robust,
    'preprocessor__feature8': categorical_transformers,
    'classifier__criterion': ['entropy'],
    'classifier__splitter': ['best'],
    'classifier__max_depth': [49, 52, 57, 58, 59, 61, 62, 63, 64, 65, 100],
    'classifier__min_samples_split': [3, 6, 7, 8, 9, 11],
    'classifier__min_samples_leaf': [1, 6, 7, 8, 9, 11],
    # 'classifier__max_features': ['auto', 'sqrt', 'log2', None],
    'classifier__random_state': [1998]
}

# Create the GridSearchCV object with KFold cross-validation.
grid_search_kfold = GridSearchCV(clf, param_grid, cv=KFold(n_splits=10),
                                 scoring=scoring,
                                 n_jobs=-1, verbose=1,
                                 refit='f1')

# Fit GridSearchCV to the data.
grid_search_kfold.fit(X_train, y_train)
```

When I run it, nothing from the library appears in the logs; the only output is GridSearchCV's own verbose messages, and only when n_jobs != 1. I am using Jupyter Notebook, and I know that n_jobs is ignored by this library and that d4p.daalinit(n_threads) has to be used instead; I saw this in #1164.
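For reference, a minimal sketch of the thread-count workaround from #1164 (the thread count 8 is just an example value):

```python
import daal4py as d4p

# Pin the number of threads used internally by oneDAL; GridSearchCV's n_jobs
# only controls scikit-learn's own joblib parallelism, not the library's threads.
d4p.daalinit(8)
```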
4. The System requirements and supported configurations section, in the "For CPU" subsection, says that all x86 processors are supported as long as they have the SSE2, SSE4.2, AVX2, or AVX-512 instruction sets. Does this mean that an AMD processor with these instruction sets would work? And is it mandatory for an Intel processor with these instruction sets to have integrated graphics for this library to work? The only related information I could find is in #932. Thanks, and sorry for my "Google Translator" English.
@gbullido
1. The trees have implementation differences in our accelerated variant. Support could possibly be extended to decision trees, although it is of lower priority: simple decision trees are usually not a bottleneck, and the stock scikit-learn implementation is sufficient.
2. Sparse data, for example, is not supported; the current accelerated version supports only dense data.
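A minimal sketch of what this means in practice (assuming the standard patch_sklearn() workflow; the array shapes are arbitrary):

```python
import numpy as np
from scipy import sparse

from sklearnex import patch_sklearn
patch_sklearn()  # swap in accelerated versions of supported scikit-learn functions

from sklearn.model_selection import train_test_split

X_dense = np.random.rand(1000, 10)      # dense ndarray: eligible for the accelerated path
X_sparse = sparse.csr_matrix(X_dense)   # sparse matrix: falls back to stock scikit-learn
y = np.random.randint(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X_dense, y, random_state=0)
```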
3. Yes, it is not supported: there is no point in accelerating this particular call, since GridSearchCV carries no compute load of its own. The stock scikit-learn version is simply used.
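Note that the search running on stock scikit-learn does not prevent the estimators fitted inside it from being accelerated when they are supported. A sketch (the dataset and parameter values are illustrative):

```python
from sklearnex import patch_sklearn
patch_sklearn()

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier  # patched, accelerated estimator
from sklearn.model_selection import GridSearchCV     # stock scikit-learn

X, y = make_classification(n_samples=1000, random_state=0)

# GridSearchCV only orchestrates; every fit it performs uses the accelerated forest.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {'n_estimators': [50, 100]}, cv=3)
search.fit(X, y)
```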
4. Yes, AMD would work. Integrated graphics is required only if you want to offload computation to the GPU. For your use case, any x86 processor will work.
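For reference, a sketch of how GPU offload is opted into (assuming the config_context/target_offload mechanism described in the sklearnex docs and a machine with a supported GPU; on a CPU-only machine you simply never enter this context):

```python
import numpy as np

from sklearnex import config_context, patch_sklearn
patch_sklearn()

from sklearn.cluster import KMeans

X = np.random.rand(1000, 4)

# Offload is explicit: outside this context everything runs on the CPU.
with config_context(target_offload="gpu:0"):
    KMeans(n_clusters=3, random_state=0).fit(X)
```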