cross-learn is an ensemble of scikit-learn wrappers aiming to simplify the validation of statistical learning models.
In particular, these libraries address how the `groups` parameter is handled by scikit-learn, which has been bugging me for a while.
The main features I focused on are:
- Cleanliness of code.
- Flexibility.
- Automation and completeness of model scoring.
- Simplification of nested cross-validation procedures.
The code is functionally split into three separate modules: crossvalidators, evaluation and transformers.
The crossvalidators module contains the crossvalidate_classification and crossvalidate_regression methods: all-in-one wrappers to obtain cross-validation and nested cross-validation scores with any sklearn-like model or pipeline. Most importantly, they allow for intra-fold dependencies during cross-validation (i.e. nested cross-validation with GroupKFold or similar).
Functionally, these methods act as simple scoring tracers to ease readability of evaluation metrics.
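For context, this is the vanilla scikit-learn pattern that the wrappers streamline (cross-learn's own signatures are not shown here; the group labels below are hypothetical toy data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

# Toy data: 100 samples belonging to 10 hypothetical groups
X, y = make_classification(n_samples=100, random_state=0)
groups = np.repeat(np.arange(10), 10)

# Grouped cross-validation in plain scikit-learn: `groups` must be
# threaded through by hand, and repeating this inside a nested
# model-selection loop is where it gets awkward.
scores = cross_val_score(
    LogisticRegression(max_iter=1000),
    X, y,
    cv=GroupKFold(n_splits=5),
    groups=groups,
)
print(len(scores))  # one score per outer fold
```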
The transformers module revises some vanilla sklearn transformers with new functionality:
- DropColin: Unsupervised filtering of linearly correlated features.
- DropColinCV: Cross-validated extension of DropColin.
- DropByMissingRate: Filters out features missing more than a predefined threshold.
- DropByMissingRateCV: Cross-validated extension of DropByMissingRate.
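To give an idea of the DropColin behavior, here is a minimal sketch of an unsupervised correlation filter written against the standard scikit-learn estimator API. This is an illustrative re-implementation of the concept, not cross-learn's actual code, and the class and parameter names are made up:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class CorrelationFilter(BaseEstimator, TransformerMixin):
    """Drops features whose absolute Pearson correlation with an
    already-kept feature exceeds `threshold`.

    Illustrative sketch only; DropColin's real interface may differ.
    """

    def __init__(self, threshold=0.95):
        self.threshold = threshold

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        corr = np.abs(np.corrcoef(X, rowvar=False))
        keep = []
        for j in range(X.shape[1]):
            # Keep column j only if it is not too correlated
            # with any column we have already decided to keep.
            if all(corr[j, k] <= self.threshold for k in keep):
                keep.append(j)
        self.keep_ = np.array(keep)
        return self

    def transform(self, X):
        return np.asarray(X)[:, self.keep_]

# Usage: the duplicated column gets filtered out
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X_dup = np.hstack([X, X[:, :1]])  # 4th column duplicates the 1st
X_filtered = CorrelationFilter(threshold=0.95).fit_transform(X_dup)
```

Being a standard `TransformerMixin`, a filter like this composes directly with sklearn pipelines, which is presumably what makes the CV extensions (DropColinCV, DropByMissingRateCV) convenient to tune.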
Run:
pip install "git+https://github.com/jhn-nt/cross-learn.git"
These are libraries I have been developing over the years on personal projects.
After noticing I was rewriting the same routines for the same problems time after time, I decided to write them one last time, for good.
Hopefully they will be of good use to others as well.
The code is fully scikit-learn compatible and will likely see major revisions as I come up with new ideas. I have mostly been focusing on polish and ease of use, with a great focus on typing.
Most of all, writing these libraries has been a fantastic exercise in learning to build cleaner and more reusable code.
Very open to any feedback
Cheers!