For active learning for regression (ALR), there are two problem settings. Supervised ALR is similar to conventional pool-based AL, where the selection proceeds interactively. Unsupervised ALR (sometimes called passive sampling) assumes that no labeled instances are available when the data is selected.
Active learning for Regression: | Supervised | Unsupervised |
---|---|---|
Non-batch mode | QBC/EMCM/RSAL/GSy/iGS | P-ALICE/GSx/iRDM |
Batch mode | EBMALR | - |
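
The supervised, non-batch setting in the table can be sketched as a pool-based loop that queries one label at a time. The snippet below is a minimal illustration, not any paper's reference implementation: it assumes a bootstrap committee of linear least-squares models and uses the variance of the committee's predictions as the QBC score. The names `fit_linear`, `qbc_query`, `supervised_alr`, and `oracle` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with a bias term; returns the weight vector."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict_linear(w, X):
    """Predict with the weights returned by fit_linear."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def qbc_query(X_lab, y_lab, X_pool, n_members=5, rng=None):
    """Return the pool index whose committee predictions disagree the most."""
    rng = np.random.default_rng(rng)
    preds = []
    for _ in range(n_members):
        # Each committee member is trained on a bootstrap resample (assumption).
        idx = rng.integers(0, len(X_lab), size=len(X_lab))
        preds.append(predict_linear(fit_linear(X_lab[idx], y_lab[idx]), X_pool))
    return int(np.argmax(np.var(preds, axis=0)))

def supervised_alr(X, oracle, n_init=5, n_queries=10, seed=0):
    """Pool-based loop: label a random seed set, then query one sample per step."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    y = {i: oracle(X[i]) for i in labeled}
    for _ in range(n_queries):
        pool = [i for i in range(len(X)) if i not in y]
        y_lab = np.array([y[i] for i in labeled])
        j = qbc_query(X[labeled], y_lab, X[pool], rng=rng)
        labeled.append(pool[j])
        y[pool[j]] = oracle(X[pool[j]])  # ask the oracle for the true label
    return labeled
```

In the unsupervised setting the loop disappears: all samples to label are chosen up front from `X` alone, before any call to `oracle`.
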
- Active learning for regression based on query by committee [2007, IDEAL]: QBC. The learner attempts to collect data that reduces variance over both the output predictions and the parameters of the model itself. (95)
- Maximizing Expected Model Change for Active Learning in Regression [2013, ICDE]: Uses the expected model change computed from gradient descent (EMCM). Chooses Gradient Boosting Decision Tree (GBDT) as the learner for nonlinear regression. (60)
- Kernel ridge regression with active learning for wind speed prediction [2013, Applied Energy]: RSAL. Uses a residual regressor to predict the error of each unlabeled sample and selects the one with the largest predicted error to label. (69)
- Pool-Based Sequential Active Learning for Regression [2018, IEEE Transactions on Neural Networks and Learning Systems]: Reduces EBMALR to sequential (non-batch) selection, and takes diversity into account when only a single instance is queried. (12)
- Active learning for regression using greedy sampling [2019, Information Sciences]: The first approach (GSy) selects new samples to increase diversity in the output space (the sample whose predicted value is farthest from the values of the annotated instances). The second (iGS) selects new samples to increase diversity in both the input and output spaces (the predicted value is farthest from the values of the annotated instances, and the selected instance is farthest from the labeled ones). (16)
- Active Nearest Neighbor Regression Through Delaunay Refinement [2022, ICML]: The Active Nearest Neighbor Regressor (ANNR) selects novel query points in a way that takes the geometry of the function graph into account.
- Black-Box Batch Active Learning for Regression [2023]
- Pool-based Active Learning in Approximate Linear Regression [2009, Machine Learning]: Only for linear regression. P-ALICE (Pool-based Active Learning using the Importance-weighted least-squares learning based on Conditional Expectation of the generalization error). Estimates the label uncertainty and uses it as the weights while selecting the M samples, then builds a weighted linear regression model from them. The base learner is an additive regression model, and its parameters are learned by importance-weighted least-squares minimization. (66)
- Active learning for regression using greedy sampling [2019, Information Sciences]: The unsupervised approach (GSx) selects samples in the original feature space. (16)
- Pool-Based Unsupervised Active Learning for Regression Using Iterative Representativeness-Diversity Maximization (iRDM) [2020, arXiv]: Unsupervised ALR selects all instances to label once and for all, without knowing any true label information at the beginning. Includes a very good comparison of previous works. (0)
- Regression tree‑based active learning [2023, DMKD]
- Offline EEG-based driver drowsiness estimation using enhanced batch-mode active learning (EBMAL) for regression [2016, IEEE International Conference on Systems, Man, and Cybernetics (SMC)]: EBMALR. Considers informativeness, representativeness and diversity. Diversity is achieved by applying k-means after a pre-selection with a conventional AL strategy. QBC and EMCM (Expected Model Change Maximization) are used as the base AL strategies. (23)
- A Framework and Benchmark for Deep Batch Active Learning for Regression [2022]
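
The greedy sampling entries above (GSx and iGS) are simple enough to sketch in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: `gsx` starts from the sample closest to the pool centroid, and `igs_query` scores each pool sample by the minimum, over labeled samples, of the product of input distance and predicted-output distance. Both function names and the argument `y_pool_pred` (predictions from whatever regressor is currently trained) are hypothetical.

```python
import numpy as np

def gsx(X, n_select):
    """Unsupervised greedy sampling in input space; uses no labels."""
    # Assumption: start from the sample closest to the pool centroid.
    first = int(np.argmin(np.linalg.norm(X - X.mean(axis=0), axis=1)))
    selected = [first]
    # min_dist[n] = distance from sample n to its nearest selected sample
    min_dist = np.linalg.norm(X - X[first], axis=1)
    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))  # farthest from everything chosen so far
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected

def igs_query(X_lab, y_lab, X_pool, y_pool_pred):
    """Supervised step: pick the pool sample that is far in both input and output space."""
    # Pairwise distances between every pool sample and every labeled sample.
    dx = np.linalg.norm(X_pool[:, None, :] - X_lab[None, :, :], axis=2)
    dy = np.abs(y_pool_pred[:, None] - y_lab[None, :])
    # Assumption: combine the two distances by product, then take the min over labeled samples.
    return int(np.argmax((dx * dy).min(axis=1)))
```

Dropping `dx` from `igs_query` gives a GSy-style score that looks only at the output space, while `gsx` can be run once up front to pick an initial set before any label is available.
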