Merge pull request #40 from mathurinm/point2skglm
DOC update readme & doc landing page to point towards skglm
Showing 5 changed files with 301 additions and 242 deletions.
@@ -1,136 +1,11 @@
===========
Group Lasso
===========
⚠️⚠️ **Disclaimer** ⚠️⚠️: This package is no longer maintained.

.. image:: https://pepy.tech/badge/group-lasso
    :target: https://pepy.tech/project/group-lasso
    :alt: PyPI Downloads

If you are looking for efficient and scikit-learn-like models with group structure such as **Group Lasso** and **Group Logistic Regression**, have a look at `skglm <https://github.com/scikit-learn-contrib/skglm>`_.

.. image:: https://github.com/yngvem/group-lasso/workflows/Unit%20tests/badge.svg
    :target: https://github.com/yngvem/group-lasso

..
    .. image:: https://coveralls.io/repos/github/yngvem/group-lasso/badge.svg
        :target: https://coveralls.io/github/yngvem/group-lasso

``skglm`` provides efficient and scikit-learn-compatible models with group structure such as `Group Lasso <https://contrib.scikit-learn.org/skglm/generated/skglm.GroupLasso.html#skglm.GroupLasso>`_ and Group Logistic Regression.
It extends the features of ``scikit-learn`` for Generalized Linear Models by implementing a wealth of missing models.
Check out the `documentation <https://contrib.scikit-learn.org/skglm/api.html>`_ for the full list of supported models, and the `Gallery of examples <https://contrib.scikit-learn.org/skglm/auto_examples/index.html>`_ to see its speed and efficiency when tackling large-scale problems.
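To give a flavour of the replacement, here is a minimal sketch of fitting ``skglm``'s Group Lasso on synthetic data. It assumes ``skglm`` is installed (``pip install skglm``) and that, as its documentation describes, an integer ``groups`` argument means contiguous groups of that size; treat the linked API reference as authoritative for the exact signature.

.. code-block:: python

    import numpy as np
    from skglm import GroupLasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 30))
    w = np.zeros(30)
    w[:3] = 1.0                        # only the first group carries signal
    y = X @ w

    # Ten contiguous groups of three features each.
    model = GroupLasso(groups=3, alpha=0.05)
    model.fit(X, y)
    print(model.coef_.reshape(10, 3))  # most rows (groups) should be all zeros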
.. image:: https://readthedocs.org/projects/group-lasso/badge/?version=latest
    :target: https://group-lasso.readthedocs.io/en/latest/?badge=latest

.. image:: https://img.shields.io/pypi/l/group-lasso.svg
    :target: https://github.com/yngvem/group-lasso/blob/master/LICENSE

.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
    :target: https://github.com/python/black

.. image:: https://www.codefactor.io/repository/github/yngvem/group-lasso/badge
    :target: https://www.codefactor.io/repository/github/yngvem/group-lasso
    :alt: CodeFactor
The group lasso [1]_ regulariser is a well-known method to achieve structured
sparsity in machine learning and statistics. The idea is to create
non-overlapping groups of covariates and recover regression weights in which
only a sparse set of these covariate groups have non-zero components.

There are several reasons why this might be a good idea. Say, for example,
that we have a set of sensors, each of which generates five
measurements. We don't want to maintain an unnecessary number of sensors.
If we try normal LASSO regression, we will get sparse components.
However, these sparse components might not correspond to a sparse set of
sensors, since each sensor generates five measurements. If we instead use group
LASSO with measurements grouped by the sensor they were measured by, we
will get a sparse set of sensors, as in the sketch below.
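As a concrete illustration of the sensor scenario, here is a sketch using this library's ``GroupLasso`` class; the ``groups`` array (one group label per column) and the ``sparsity_mask_`` attribute follow the readthedocs documentation, but treat the exact names as assumptions to verify there.

.. code-block:: python

    import numpy as np
    from group_lasso import GroupLasso

    n_sensors, n_measurements = 20, 5
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, n_sensors * n_measurements))
    # Columns 0-4 belong to sensor 0, columns 5-9 to sensor 1, and so on.
    groups = np.repeat(np.arange(n_sensors), n_measurements)
    w = np.zeros(X.shape[1])
    w[:2 * n_measurements] = rng.standard_normal(2 * n_measurements)
    y = X @ w                          # only the first two sensors matter

    # Reshape y to a column vector if your version expects 2D targets.
    gl = GroupLasso(groups=groups, group_reg=0.1, l1_reg=0.0)
    gl.fit(X, y)
    kept = np.unique(groups[gl.sparsity_mask_])
    print(kept)                        # ideally just sensors 0 and 1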
An extension of the group lasso regulariser is the sparse group lasso
regulariser [2]_, which imposes both group-wise sparsity and coefficient-wise
sparsity. This is done by combining the group lasso penalty with the
traditional lasso penalty. This library implements an efficient
sparse group lasso solver that is fully scikit-learn API compliant.
------------------
About this project
------------------
This project is developed by Yngve Mardal Moe and released under an MIT
license.

------------------
Installation guide
------------------
Group-lasso requires Python 3.5+, numpy and scikit-learn.
To install group-lasso via ``pip``, simply run the command::

    pip install group-lasso

Alternatively, you can manually pull this repository and run the
``setup.py`` file::

    git clone https://github.com/yngvem/group-lasso.git
    cd group-lasso
    python setup.py install
-------------
Documentation
-------------

You can read the full documentation on
`readthedocs <https://group-lasso.readthedocs.io/en/latest/maths.html>`_.

--------
Examples
--------

There are several examples that show usage of the library
`here <https://group-lasso.readthedocs.io/en/latest/auto_examples/index.html>`_.

------------
Further work
------------

1. Fully test with sparse arrays and make examples
2. Make it easier to work with categorical data
3. Poisson regression
----------------------
Implementation details
----------------------
The problem is solved using the FISTA optimiser [3]_ with a gradient-based
adaptive restarting scheme [4]_. No line search is currently implemented, but
I hope to look at that later.

Although fast, the FISTA optimiser does not achieve as low loss values as the
significantly slower second-order interior point methods. This might, at
first glance, seem like a problem. However, it does recover the sparsity
patterns of the data, which can be used to train a new model with the given
subset of the features.

Also, even though the FISTA optimiser is not meant for stochastic
optimisation, in my experience it has not suffered a large fall in
performance when the mini-batch was large enough. I have therefore
implemented mini-batch optimisation using FISTA, and have thus been able to fit
models based on data with ~500 columns and 10 000 000 rows on my moderately
priced laptop.

Finally, we note that since FISTA uses Nesterov acceleration, it is not a
descent algorithm. We can therefore not expect the loss to decrease
monotonically.
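For intuition, the following is an illustrative transcription of FISTA with the gradient-based restart rule of [4]_, applied to a plain lasso objective. It is a sketch of the algorithmic idea, not the solver shipped in this package.

.. code-block:: python

    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of t * ||.||_1."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def fista_lasso(X, y, reg, n_iter=200):
        """Minimise (1/2n)||X b - y||^2 + reg * ||b||_1 with restarted FISTA."""
        n = X.shape[0]
        L = np.linalg.norm(X, 2) ** 2 / n   # Lipschitz constant of the gradient
        beta = np.zeros(X.shape[1])
        momentum = beta.copy()
        t = 1.0
        for _ in range(n_iter):
            grad = X.T @ (X @ momentum - y) / n
            beta_next = soft_threshold(momentum - grad / L, reg / L)
            t_next = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
            step = beta_next - beta
            if grad @ step > 0:             # gradient-based adaptive restart [4]_
                momentum, t_next = beta_next.copy(), 1.0
            else:
                momentum = beta_next + (t - 1) / t_next * step
            beta, t = beta_next, t_next
        return beta

The momentum term is what makes individual iterations non-monotone, which is exactly why the restart test is useful.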
----------
References
----------

.. [1] Yuan, M. and Lin, Y. (2006), Model selection and estimation in
    regression with grouped variables. Journal of the Royal Statistical
    Society: Series B (Statistical Methodology), 68: 49-67.
    doi:10.1111/j.1467-9868.2005.00532.x
.. [2] Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2013).
    A sparse-group lasso. Journal of Computational and Graphical
    Statistics, 22(2), 231-245.
.. [3] Beck, A. and Teboulle, M. (2009), A Fast Iterative
    Shrinkage-Thresholding Algorithm for Linear Inverse Problems.
    SIAM Journal on Imaging Sciences, 2(1), 183-202.
    doi:10.1137/080716542
.. [4] O'Donoghue, B. & Candès, E. (2015), Adaptive Restart for
    Accelerated Gradient Schemes. Found Comput Math 15: 715.
    doi:10.1007/s10208-013-9150-
If you are looking for the ``grouplasso`` documentation, view the `old version of the README <./old_README.rst>`_.
@@ -1,122 +1,27 @@
Efficient Group Lasso in Python
===============================

.. Caution::

    This package is no longer maintained and it is advised to use `skglm <https://github.com/scikit-learn-contrib/skglm>`_ instead.

``skglm`` provides efficient and scikit-learn-compatible models with group structure such as `Group Lasso <https://contrib.scikit-learn.org/skglm/generated/skglm.GroupLasso.html#skglm.GroupLasso>`_ and Group Logistic Regression.
It extends the features of ``scikit-learn`` for Generalized Linear Models by implementing a wealth of missing models.
Check out the `documentation <https://contrib.scikit-learn.org/skglm/api.html>`_ for the full list of supported models, and the `Gallery of examples <https://contrib.scikit-learn.org/skglm/auto_examples/index.html>`_ to see its speed and efficiency when tackling large-scale problems.

This library provides efficient computation of sparse group lasso regularised
linear and logistic regression.

What is group lasso?
--------------------

It is often the case that we have a dataset where the covariates form natural
groups. These groups can represent biological function in gene expression
data or maybe sensor location in climate data. We then wish to find a sparse
subset of these covariate groups that describes the relationship in the data.
Let us look at an example to crystallise the usefulness of this further.
Say that we work as data scientists for a large Norwegian food supplier and
wish to make a prediction model for the amount of fruit that will be sold based
on weather data. We have weather data from cities in Norway and need to know
how the fruit should be distributed across different warehouses. From each
city, we have information about temperature, precipitation, wind strength, wind
direction and how cloudy it is. Multiplying the number of cities by the
number of covariates per city, we get 1500 different covariates in total.
It is unlikely that we need all these covariates in our model, so we seek a
sparse set of these to do our predictions with.
Let us now assume that the weather data API that we use charges money by
the number of cities we query, not by the amount of information we get per
city. We therefore wish to create a regression model that predicts fruit
demand based on a sparse set of city observations. One way to achieve such
sparsity is through the framework of group lasso regularisation [1]_.

Follow :ref:`this link <old_doc>` to access the documentation of the unmaintained package.

What is sparse group lasso?
---------------------------
The sparse group lasso regulariser [2]_ is an extension of the group lasso
regulariser that also promotes parameter-wise sparsity. It is the combination
of the group lasso penalty and the normal lasso penalty. If we consider the
example above, then the sparse group lasso penalty will yield a sparse set
of groups and also a sparse set of covariates in each selected group. An
example of where this is useful is if each city query has a set price that
increases based on the number of measurements we want from each city.

A quick mathematical interlude
------------------------------

Let us now briefly describe the mathematical problem solved in group lasso
regularised machine learning problems. Originally, the group lasso algorithm [1]_
was defined as regularised linear regression with the following loss function
.. math::
    \arg \min_{\mathbf{\beta}_g \in \mathbb{R}^{d_g}}
    \frac{1}{n} \left\| \sum_{g \in \mathcal{G}} \mathbf{X}_g \mathbf{\beta}_g - \mathbf{y} \right\|_2^2
    + \lambda_1 \|\mathbf{\beta}\|_1
    + \lambda_2 \sum_{g \in \mathcal{G}} \sqrt{d_g} \|\mathbf{\beta}_g\|_2,
where :math:`\mathbf{X}_g \in \mathbb{R}^{n \times d_g}` is the data matrix
corresponding to the covariates in group :math:`g`, :math:`\mathbf{\beta}_g`
is the vector of regression coefficients corresponding to group :math:`g`,
:math:`\mathbf{y} \in \mathbb{R}^n` is the regression target, :math:`n` is the
number of measurements, :math:`d_g` is the dimensionality of group :math:`g`,
:math:`\lambda_1` is the parameter-wise regularisation penalty,
:math:`\lambda_2` is the group-wise regularisation penalty and
:math:`\mathcal{G}` is the set of all groups.

Notice, in the equation above, that the 2-norm is *not* squared. A consequence
of this is that the regulariser has a "kink" at zero, which encourages
uninformative covariate groups to have zero-valued regression coefficients.
It has later become popular
to use this methodology to regularise other machine learning algorithms, such
as logistic regression. The "only" thing necessary to do this is to exchange
the squared norm term, :math:`\| \sum_{g \in \mathcal{G}} \mathbf{X}_g \mathbf{\beta}_g - \mathbf{y} \|_2^2`,
with a general loss term, :math:`L(\mathbf{\beta}; \mathbf{X}, \mathbf{y})`,
where :math:`\mathbf{\beta}` and :math:`\mathbf{X}` are the concatenations
of all group coefficients and group data matrices, respectively.
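To make the notation concrete, here is a direct numerical transcription of the objective above (the function name and arguments are ours, purely for illustration; this is not this library's API):

.. code-block:: python

    import numpy as np

    def sparse_group_lasso_objective(X, y, beta, groups, lambda1, lambda2):
        """Evaluate (1/n)||X beta - y||_2^2 plus the l1 and group penalties."""
        n = len(y)
        loss = np.sum((X @ beta - y) ** 2) / n       # (1/n)||X beta - y||_2^2
        lasso_pen = lambda1 * np.sum(np.abs(beta))   # lambda_1 ||beta||_1
        group_pen = lambda2 * sum(
            np.sqrt(np.sum(groups == g)) * np.linalg.norm(beta[groups == g])
            for g in np.unique(groups)               # sqrt(d_g) ||beta_g||_2
        )
        return loss + lasso_pen + group_pen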
API design
----------

The ``group-lasso`` python library is modelled after the ``scikit-learn`` API
and should be fully compliant with the ``scikit-learn`` ecosystem.
Consequently, the ``group-lasso`` library depends on ``numpy``, ``scipy``
and ``scikit-learn``.

Currently, the only supported algorithm is group lasso regularised linear
and multiple regression, which is available in the ``group_lasso.GroupLasso``
class. However, I am working on an experimental class with group lasso
regularised logistic regression, which is available in the
``group_lasso.LogisticGroupLasso`` class. Currently, this class only supports
binary classification problems through a sigmoidal transformation, but
I am working on a multiclass classification algorithm with the softmax
transformation.

All classes in this library are implemented as both ``scikit-learn``
transformers and regressors or classifiers (depending on their
use case). The reason for this is that to use lasso-based models for
variable selection, the regularisation coefficient should be quite high,
resulting in sub-par performance on the actual task of interest. Therefore,
it is common to first use a lasso-like algorithm to select the relevant
features before using another algorithm (say ridge regression)
for the task at hand. The ``transform`` method of
``group_lasso.GroupLasso`` therefore removes the columns of the input dataset
corresponding to zero-valued coefficients.
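A sketch of the select-then-refit pattern just described, using ``GroupLasso`` as a transformer in front of a ridge regressor (the hyperparameter values are arbitrary placeholders):

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline
    from group_lasso import GroupLasso

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 40))
    groups = np.repeat(np.arange(8), 5)    # 8 groups of 5 covariates each
    y = X[:, :5] @ rng.standard_normal(5)  # only the first group is informative

    # High group regularisation drops uninformative columns in ``transform``;
    # the ridge model is then fitted on the remaining columns only.
    pipe = make_pipeline(
        GroupLasso(groups=groups, group_reg=0.3, l1_reg=0.0),
        Ridge(alpha=1.0),
    )
    pipe.fit(X, y)
    print(pipe.predict(X[:5]))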
.. It is mandatory to keep the toctree here; although it doesn't show up in the page, without it the entries don't show up in the sidebar.
.. toctree::
    :maxdepth: 2
    :caption: Contents:

    installation
    auto_examples/index
    maths
    api_reference
References
----------
.. [1] Yuan M, Lin Y. Model selection and estimation in regression with
    grouped variables. Journal of the Royal Statistical Society: Series B
    (Statistical Methodology). 2006 Feb;68(1):49-67.
.. [2] Simon, N., Friedman, J., Hastie, T., & Tibshirani, R. (2013).
    A sparse-group lasso. Journal of Computational and Graphical
    Statistics, 22(2), 231-245.
.. toctree::
    :maxdepth: 2
    :hidden:
    :includehidden:
    :caption: Contents:

    installation
    auto_examples/index
    maths
    api_reference