-
I agree, the time complexity of oversampling techniques is somewhat unexplored, although some runtime measurements are incorporated. There was an extensive evaluation (shared in the corresponding papers), and based on the average runtimes over 104 datasets a ranking of the oversampling techniques is available. For example, if one is interested in the 10 quickest techniques overall, they can be queried as

```python
import smote_variants as sv

# get the 10 quickest oversamplers
oversamplers = sv.get_all_oversamplers(n_quickest=10)
```

Although this is not a true time complexity analysis, it can still be used to query computationally efficient techniques for further research or application purposes. Nevertheless, a proper time complexity analysis varying the number of majority and minority samples, the number of features, imbalance ratios, class overlap, etc. would be very useful. Regarding the noise filters, they are not intended to be primary or necessary steps in oversampling pipelines, but a similar analysis of them could still be useful.
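For illustration, here is a minimal sketch of how the queried techniques might be used downstream, assuming the classes returned by `get_all_oversamplers` expose the usual `sample(X, y)` interface of smote_variants; the dataset is a synthetic placeholder:

```python
import smote_variants as sv
from sklearn.datasets import make_classification

# synthetic imbalanced dataset as a placeholder
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=42)

# query the 10 quickest oversamplers and apply the first one
oversamplers = sv.get_all_oversamplers(n_quickest=10)
oversampler = oversamplers[0]()          # instantiate with default parameters
X_samp, y_samp = oversampler.sample(X, y)

print(oversampler.__class__.__name__, X_samp.shape, y_samp.shape)
```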
-
For scikit-learn, some have created tools for profiling latency (model fitting time) against error.
The Scitime estimator is useful for estimating the training time of some scikit-learn algorithms, but not all of them.
It would be useful to benchmark and measure the time complexity of the oversamplers and see which ones are fast (or not) as a function of the dataset size and the log-odds of the majority class proportion.
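A rough sketch of such a benchmark loop, assuming smote_variants' `get_all_oversamplers(n_quickest=...)` and `sample(X, y)` interface; the dataset sizes and minority fractions below are arbitrary illustrative choices:

```python
import time
import smote_variants as sv
from sklearn.datasets import make_classification

# grid of dataset sizes and minority-class proportions (illustrative values only)
sizes = [1_000, 5_000, 20_000]
minority_fracs = [0.3, 0.1, 0.02]

for oversampler_cls in sv.get_all_oversamplers(n_quickest=5):
    for n in sizes:
        for frac in minority_fracs:
            # generate a synthetic imbalanced dataset of the given size and ratio
            X, y = make_classification(n_samples=n, n_features=20,
                                       weights=[1 - frac, frac],
                                       random_state=42)
            start = time.perf_counter()
            oversampler_cls().sample(X, y)
            elapsed = time.perf_counter() - start
            print(f"{oversampler_cls.__name__:30s} n={n:6d} "
                  f"minority={frac:.2f} time={elapsed:.3f}s")
```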