fernando edited this page Aug 16, 2014 · 4 revisions

Under-sampling methods.

UnderSampler

UnderSampler is an object that under-samples the majority class(es) at random with replacement.

Parameters:

  • ratio : Controls the number of new samples to draw. The number of new samples is given by int(ratio * num_minority_samples).
  • random_state : Seed for random number generation.

Methods:

  • fit : Finds the target statistics to determine the minority class and the number of samples in each class.
  • transform : Returns the resampled version of the original data set (X, y) passed to fit.
  • fit_transform : Automatically performs both fit and transform.
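
The behaviour described above can be illustrated with a minimal NumPy sketch. This is not the library's implementation: the function name random_under_sample is hypothetical, and the sketch assumes that int(ratio * num_minority_samples) samples are drawn from each non-minority class while all minority samples are kept.

```python
import numpy as np


def random_under_sample(X, y, ratio=1.0, random_state=None):
    """Hypothetical helper: random under-sampling with replacement."""
    rng = np.random.RandomState(random_state)
    X, y = np.asarray(X), np.asarray(y)

    # "fit" step: class counts and the minority class.
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_draw = int(ratio * counts.min())

    # "transform" step: keep every minority sample (assumption) and draw
    # n_draw indices with replacement from each remaining class.
    keep = [np.flatnonzero(y == minority)]
    for label in classes:
        if label == minority:
            continue
        idx = np.flatnonzero(y == label)
        keep.append(rng.choice(idx, size=n_draw, replace=True))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


# 9 majority samples (class 0) and 3 minority samples (class 1).
X = np.arange(24, dtype=float).reshape(12, 2)
y = np.array([0] * 9 + [1] * 3)
X_res, y_res = random_under_sample(X, y, ratio=2.0, random_state=0)
```

With ratio=2.0 and 3 minority samples, 6 majority samples are drawn, so the resampled set has 9 rows. Because sampling is with replacement, some majority rows may appear more than once.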

TomekLinks

TomekLinks is an object that identifies all Tomek links between the majority and minority classes and eliminates the element of each link that belongs to the majority class.

Parameters:

Methods:

  • fit : Finds the target statistics to determine the minority class and the number of samples in each class.
  • transform : Returns the resampled version of the original data set (X, y) passed to fit.
  • fit_transform : Automatically performs both fit and transform.
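
A Tomek link is a pair of samples from different classes that are each other's nearest neighbour. The sketch below shows one way to find such pairs and drop the majority-class member. It is an illustration only: the function name remove_tomek_links is hypothetical, Euclidean distance is an assumption, and the brute-force distance matrix is only suitable for small data sets.

```python
import numpy as np


def remove_tomek_links(X, y):
    """Hypothetical helper: drop the majority member of every Tomek link."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]

    # Pairwise squared Euclidean distances (assumed metric).
    diff = X[:, None, :] - X[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    nn = dist.argmin(axis=1)               # nearest neighbour of each sample
    mutual = nn[nn] == np.arange(len(y))   # i and nn[i] point at each other

    # Majority sample whose mutual nearest neighbour is a minority sample.
    drop = mutual & (y != minority) & (y[nn] == minority)
    return X[~drop], y[~drop]


# The samples at 5.0 (class 0) and 5.4 (class 1) form a Tomek link.
X = np.array([[0.0], [1.0], [2.5], [5.0], [5.4], [9.0]])
y = np.array([0, 0, 0, 0, 1, 1])
X_res, y_res = remove_tomek_links(X, y)
```

Only the majority sample at 5.0 is removed here. The minority sample at 5.4 is kept, and the pair (0.0, 1.0) is ignored because both members belong to the same class.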

ClusterCentroids

ClusterCentroids is an object that under-samples the majority class by replacing clusters of samples with the cluster centroids found by a KMeans algorithm.

(Experimental) A KMeans algorithm is fitted to the data, with the number of clusters N decided by the level of under-sampling. The majority samples are then completely replaced by the set of cluster centroids from KMeans.

Parameters:

  • kargs : Dictionary to pass any parameters to the scikit-learn KMeans object.
  • ratio : Controls the number of new samples to draw. The number of new samples is given by int(ratio * num_minority_samples).
  • random_state : Seed for random number generation.

Methods:

  • fit : Finds the target statistics to determine the minority class and the number of samples in each class.
  • transform : Returns the resampled version of the original data set (X, y) passed to fit.
  • fit_transform : Automatically performs both fit and transform.
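
The centroid replacement can be sketched with scikit-learn's KMeans. This is a rough illustration rather than the library's code: the function name cluster_centroid_under_sample is hypothetical, and it assumes that N = int(ratio * num_minority_samples) and that KMeans is fitted to the samples of each non-minority class separately.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_centroid_under_sample(X, y, ratio=1.0, random_state=None,
                                  **kmeans_kwargs):
    """Hypothetical helper: replace majority samples by KMeans centroids."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_clusters = int(ratio * counts.min())  # assumed meaning of N

    # Minority samples are kept unchanged.
    X_parts = [X[y == minority]]
    y_parts = [y[y == minority]]

    for label in classes:
        if label == minority:
            continue
        # Extra keyword arguments are forwarded to KMeans.
        km = KMeans(n_clusters=n_clusters, random_state=random_state,
                    **kmeans_kwargs)
        km.fit(X[y == label])
        # The centroids stand in for all samples of this class.
        X_parts.append(km.cluster_centers_)
        y_parts.append(np.full(n_clusters, label, dtype=y.dtype))

    return np.vstack(X_parts), np.concatenate(y_parts)


# 20 majority samples around the origin, 4 minority samples around (5, 5).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(size=(20, 2)), rng.normal(loc=5.0, size=(4, 2))])
y = np.array([0] * 20 + [1] * 4)
X_res, y_res = cluster_centroid_under_sample(X, y, ratio=1.0,
                                             random_state=0, n_init=10)
```

Note that the returned majority rows are synthetic centroids, not rows of the original data set, which is one reason to treat this method as experimental.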