jhn-nt/cross-learn

cross-learn

Extensive scoring of crossvalidation loops.

cross-learn is a collection of sklearn wrappers that aim to simplify the validation of statistical learning models.
In particular, these libraries address how the groups parameter is handled by scikit-learn, which has been bugging me for a while.
The main features I focused on are:

  • Cleanliness of code.
  • Flexibility.
  • Automation and completeness of model scoring.
  • Simplification of nested crossvalidation procedures.

The code is functionally split into three separate modules: crossvalidators, evaluation and transformers.

evaluation module

Contains the crossvalidate_classification and crossvalidate_regression methods: all-in-one wrappers that return crossvalidation and nested crossvalidation scores for any sklearn-like model or pipeline. Most importantly, they allow for intra-fold dependencies during crossvalidation (i.e. nested crossvalidation with GroupKFold or similar).

Functionally, these methods act as simple scoring tracers to ease readability of evaluation metrics.
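
To give an idea of what nested crossvalidation with groups involves, here is a minimal sketch written with plain scikit-learn only. It is not the cross-learn API: the helper name nested_group_cv, its parameters and the toy data are assumptions made up for illustration. The point it shows is that the groups array has to be sliced and passed again to the inner search on every outer fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold


def nested_group_cv(model, param_grid, X, y, groups, n_outer=5, n_inner=3):
    """Hypothetical helper: nested crossvalidation where both loops respect `groups`."""
    outer_scores = []
    outer_cv = GroupKFold(n_splits=n_outer)
    for train_idx, test_idx in outer_cv.split(X, y, groups):
        # The inner search only sees the outer training portion and its groups.
        search = GridSearchCV(model, param_grid, cv=GroupKFold(n_splits=n_inner))
        search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
        # No group may end up on both sides of the outer split.
        assert not set(groups[train_idx]) & set(groups[test_idx])
        # Score the refitted best model on the held-out groups (default scorer).
        outer_scores.append(search.score(X[test_idx], y[test_idx]))
    return np.asarray(outer_scores)


# Toy data: 200 samples belonging to 20 groups of 10 samples each.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
groups = np.repeat(np.arange(20), 10)

scores = nested_group_cv(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}, X, y, groups
)
print(scores.mean(), scores.std())
```

Writing this loop by hand for every project is the kind of repetition the evaluation module is meant to remove.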

transformers module

Revisions of some vanilla sklearn transformers with some new functionality:

  • DropColin: Unsupervised filtering of linearly correlated features.
  • DropColinCV: Crossvalidated extension of DropColin.
  • DropByMissingRate: Filters out features whose missing rate exceeds a predefined threshold.
  • DropByMissingRateCV: Crossvalidated extension of DropByMissingRate.
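
As a rough illustration of the two filtering ideas listed above, the sketch below builds two small scikit-learn compatible transformers from scratch. They are hypothetical stand-ins, not the actual DropColin or DropByMissingRate classes: the class names, the threshold and max_corr parameters and the greedy correlation rule are all assumptions.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class MissingRateFilter(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: drop columns whose NaN fraction exceeds `threshold`."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Boolean mask of the columns whose missing rate is acceptable.
        self.keep_ = np.isnan(X).mean(axis=0) <= self.threshold
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float)[:, self.keep_]


class CollinearityFilter(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: greedily drop columns highly correlated with a kept one.

    Assumes the input has no missing values.
    """

    def __init__(self, max_corr=0.9):
        self.max_corr = max_corr

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        corr = np.abs(np.corrcoef(X, rowvar=False))
        kept = []
        for j in range(X.shape[1]):
            # Keep column j only if it is not too correlated with any kept column.
            if all(corr[j, k] <= self.max_corr for k in kept):
                kept.append(j)
        self.keep_ = np.asarray(kept)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float)[:, self.keep_]


# Toy data: column 1 is a linear copy of column 0, column 3 is entirely missing.
rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
X = np.column_stack([a, 2 * a + 1, b, np.full(100, np.nan)])

X_clean = MissingRateFilter(threshold=0.5).fit_transform(X)          # shape (100, 3)
X_reduced = CollinearityFilter(max_corr=0.9).fit_transform(X_clean)  # shape (100, 2)
```

Because both classes follow the fit/transform convention, they can be placed inside a sklearn Pipeline so that the filtering is learned on the training folds only.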

Installation Notes:

Run:

pip install "git+https://github.com/jhn-nt/cross-learn.git"

Notes

These are libraries I have been developing over the years on personal projects. After noticing I was rewriting the same routines for the same problems time after time, I decided to write them one last time for good.
Hopefully they will be of good use for others as well.

The code is fully scikit-learn compatible and will likely see major revisions as I come up with new ideas. I have been mostly focusing on polish and ease of use, with a great focus on typing.

Most of all, writing these libraries has been a fantastic exercise in learning to build cleaner and more reusable code.

Very open to any feedback.

Cheers!
