Merge branch 'main' into sparse-kde
GardevoirX committed Aug 7, 2024
2 parents (1f28aae + 3c784c9) · commit 1f4ea8c
Showing 17 changed files with 79 additions and 68 deletions.
4 changes: 2 additions & 2 deletions .codecov.yml
@@ -4,9 +4,9 @@ coverage:
   status:
     project:
       default:
-        target: 90%
+        target: 95%
     patch:
       default:
-        target: 90%
+        target: 95%
 
 comment: false
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -10,10 +10,10 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
 
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
@@ -11,10 +11,10 @@ jobs:
   build:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: setup Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
 
4 changes: 2 additions & 2 deletions .github/workflows/lint.yml
@@ -9,10 +9,10 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
      - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
 
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -15,10 +15,10 @@ jobs:
         python-version: ["3.9", "3.12"]
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
 
23 changes: 14 additions & 9 deletions README.rst
@@ -2,8 +2,8 @@ scikit-matter
 =============
 |tests| |codecov| |pypi| |conda| |docs| |doi|
 
-A collection of scikit-learn compatible utilities that implement methods born out of the
-materials science and chemistry communities.
+A collection of ``scikit-learn`` compatible utilities that implement methods born out of
+the materials science and chemistry communities.
 
 For details, tutorials, and examples, please have a look at our `documentation`_.
 
@@ -72,7 +72,7 @@ Writing code is not the only way to contribute to the project. You can also:
 .. _`examples and tutorials`: https://scikit-matter.readthedocs.io/en/latest/contributing.html#contributing-new-examples
 .. _`new datasets`: https://scikit-matter.readthedocs.io/en/latest/contributing.html#contributing-datasets
 
-.. marker-contributors
+.. marker-citing
 
 Citing scikit-matter
 --------------------
@@ -81,7 +81,11 @@ If you use *scikit-matter* for your work, please cite:
 Goscinski A, Principe VP, Fraux G et al. scikit-matter :
 A Suite of Generalisable Machine Learning Methods Born out of Chemistry
 and Materials Science. Open Res Europe 2023, 3:81.
-`10.12688/openreseurope.15789.2 <https://doi.org/10.12688/openreseurope.15789.2>`_
+`10.12688/openreseurope.15789.2`_
+
+.. _`10.12688/openreseurope.15789.2`: https://doi.org/10.12688/openreseurope.15789.2
+
+.. marker-contributors
 
 Contributors
 ------------
@@ -90,18 +94,17 @@ Thanks goes to all people that make scikit-matter possible:
 .. image:: https://contrib.rocks/image?repo=scikit-learn-contrib/scikit-matter
    :target: https://github.com/scikit-learn-contrib/scikit-matter/graphs/contributors
 
-.. |tests| image:: https://github.com/scikit-learn-contrib/scikit-matter/workflows/Test/badge.svg
+.. |tests| image:: https://github.com/scikit-learn-contrib/scikit-matter/workflows/Tests/badge.svg
    :alt: Github Actions Tests Job Status
-   :target: (https://github.com/scikit-learn-contrib/scikit-matter/\
-   actions?query=workflow%3ATests)
+   :target: action_
 
 .. |codecov| image:: https://codecov.io/gh/scikit-learn-contrib/scikit-matter/branch/main/graph/badge.svg?token=UZJPJG34SM
    :alt: Code coverage
    :target: https://codecov.io/gh/scikit-learn-contrib/scikit-matter/
 
 .. |docs| image:: https://img.shields.io/badge/documentation-latest-sucess
    :alt: Python
-   :target: https://scikit-matter.readthedocs.io
+   :target: documentation_
 
 .. |pypi| image:: https://img.shields.io/pypi/v/skmatter.svg
    :alt: Latest PYPI version
@@ -113,4 +116,6 @@
 
 .. |doi| image:: https://img.shields.io/badge/DOI-10.12688-blue
    :alt: ORE Paper
-   :target: https://doi.org/10.12688/openreseurope.15789.2
+   :target: `10.12688/openreseurope.15789.2`_
+
+.. _`action`: https://github.com/scikit-learn-contrib/scikit-matter/actions?query=branch%3Amain
10 changes: 7 additions & 3 deletions docs/src/index.rst
@@ -74,6 +74,13 @@
    :start-after: marker-issues
    :end-before: marker-contributing
 
+.. include:: ../../README.rst
+   :start-after: marker-citing
+   :end-before: marker-contributors
+
+If you would like to contribute to scikit-matter, check out our :ref:`contributing`
+page!
+
 .. toctree::
    :hidden:
 
@@ -84,6 +91,3 @@
    contributing
    changelog
    bibliography
-
-If you would like to contribute to scikit-matter, check out our :ref:`contributing`
-page!
4 changes: 2 additions & 2 deletions examples/README.rst
@@ -1,11 +1,11 @@
 Examples
 ========
 
-For a thorough tutorial of the methods introduced in `scikit-matter`, we
+For a thorough tutorial of the methods introduced in ``scikit-matter``, we
 suggest you check out the pedagogic notebooks in our companion project
 `kernel-tutorials <https://github.com/lab-cosmo/kernel-tutorials/>`_.
 
-For running the examples locally install `scikit-matter` with the ``examples``
+For running the examples locally install ``scikit-matter`` with the ``examples``
 optional dependencies.
 
 .. code-block:: bash
16 changes: 7 additions & 9 deletions src/skmatter/_selection.py
@@ -592,19 +592,17 @@ def _compute_pi(self, X, y=None):
 the squares of the first :math:`k` components of the right singular vectors
 .. math::
-\\pi_j =
-\\sum_i^k \\left(\\mathbf{U}_\\mathbf{C}\\right)_{ij}^2.
+\pi_j = \sum_i^k \left(\mathbf{U}_\mathbf{C}\right)_{ij}^2.
-where :math:`\\mathbf{C} = \\mathbf{X}^T\\mathbf{X}`.
+where :math:`\mathbf{C} = \mathbf{X}^T\mathbf{X}`.
-For sample selection, the importance score :math:`\\pi` is the sum over the
+For sample selection, the importance score :math:`\pi` is the sum over the
 squares of the first :math:`k` components of the right singular vectors
 .. math::
-\\pi_j =
-\\sum_i^k \\left(\\mathbf{U}_\\mathbf{K}\\right)_{ij}^2.
+\pi_j = \sum_i^k \left(\mathbf{U}_\mathbf{K}\right)_{ij}^2.
-where :math:`\\mathbf{K} = \\mathbf{X}\\mathbf{X}^T`.
+where :math:`\mathbf{K} = \mathbf{X}\mathbf{X}^T`.
 Parameters
 ----------
@@ -615,7 +613,7 @@ def _compute_pi(self, X, y=None):
 Returns
 -------
 pi : numpy.ndarray of (n_to_select_from_)
-:math:`\\pi` importance for the given samples or features
+:math:`\pi` importance for the given samples or features
 """
 svd_kwargs = dict(k=self.k, random_state=self.random_state)
 if self._axis == 0:
@@ -941,7 +939,7 @@ def get_distance(self):
 For sample selection, this is a row-wise Euclidean distance, which can be
 expressed in terms of the Gram matrix
-:math:`\\mathbf{K} = \mathbf{X} \\mathbf{X} ^ T`
+:math:`\mathbf{K} = \mathbf{X} \mathbf{X} ^ T`
 .. math::
 \operatorname{d}_r(i, j) = K_{ii} - 2 K_{ij} + K_{jj}.
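Aside on this hunk: the pi_j score described in the docstring reduces to a few lines of NumPy/SciPy. The sketch below is illustrative only (toy array, scipy's svds chosen for the truncated SVD); it is not the selector's internal _compute_pi code.

    import numpy as np
    from scipy.sparse.linalg import svds

    X = np.array([[0.12, 0.21, 0.02], [-0.09, 0.32, -0.10],
                  [-0.03, -0.53, 0.08], [0.40, 0.11, 0.01]])

    # Feature selection: C = X^T X, and pi_j sums the squared entries of the
    # top-k singular vectors for each feature j.
    C = X.T @ X
    U_C, _, _ = svds(C, k=2)              # truncated SVD with k = 2 components
    pi_features = np.sum(U_C**2, axis=1)

    # Sample selection: the same construction on the Gram matrix K = X X^T.
    K = X @ X.T
    U_K, _, _ = svds(K, k=2)
    pi_samples = np.sum(U_K**2, axis=1)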
4 changes: 2 additions & 2 deletions src/skmatter/datasets/descr/who_dataset.rst
@@ -3,7 +3,7 @@
 WHO dataset
 ###########
 
-`who_dataset.csv` is a compilation of multiple publically-available datasets
+``who_dataset.csv`` is a compilation of multiple publically-available datasets
 through data.worldbank.org. Specifically, the following versioned datasets are used:
 
 - NY.GDP.PCAP.CD (v2_4770383) [1]_
@@ -17,7 +17,7 @@ through data.worldbank.org. Specifically, the following versioned datasets are used:
 - SP.DYN.LE00.IN (v2_4770556) [9]_
 - SP.POP.TOTL (v2_4770385) [10]_
 
-where the corresponding file names are `API_{dataset}_DS2_excel_en_{version}.xls`.
+where the corresponding file names are ``API_{dataset}_DS2_excel_en_{version}.xls``.
 
 This dataset, intended only for demonstration, contains 2020 country-year pairings and
 the corresponding values above.
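For anyone wanting to inspect the data described here: the sketch below assumes the dataset is exposed through a load_who_dataset helper in skmatter.datasets that returns a dictionary-like bunch with the table under "data"; both the loader name and that layout are assumptions, not something this diff shows.

    from skmatter.datasets import load_who_dataset  # assumed loader name

    who = load_who_dataset()
    df = who["data"]   # assumed layout: country-year rows, indicator columns as listed above
    print(df.shape)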
28 changes: 14 additions & 14 deletions src/skmatter/decomposition/_kernel_pcovr.py
@@ -39,7 +39,7 @@ class KernelPCovR(_BasePCA, LinearModel):
 Parameters
 ----------
 mixing : float, default=0.5
-mixing parameter, as described in PCovR as :math:`{\\alpha}`
+mixing parameter, as described in PCovR as :math:`{\alpha}`
 n_components : int, float or str, default=None
 Number of components to keep.
 if n_components is not set all components are kept::
@@ -64,7 +64,7 @@ class KernelPCovR(_BasePCA, LinearModel):
 run randomized SVD by the method of Halko et al.
 regressor : {instance of `sklearn.kernel_ridge.KernelRidge`, `precomputed`, None}, default=None
 The regressor to use for computing
-the property predictions :math:`\\hat{\\mathbf{Y}}`.
+the property predictions :math:`\hat{\mathbf{Y}}`.
 A pre-fitted regressor may be provided.
 If the regressor is not `None`, its kernel parameters
 (`kernel`, `gamma`, `degree`, `coef0`, and `kernel_params`)
@@ -112,17 +112,17 @@ class KernelPCovR(_BasePCA, LinearModel):
 pseudo-inverse of the latent-space projection, which
 can be used to contruct projectors from latent-space
 pkt_: numpy.ndarray of size :math:`({n_{samples}, n_{components}})`
-the projector, or weights, from the input kernel :math:`\\mathbf{K}`
-to the latent-space projection :math:`\\mathbf{T}`
+the projector, or weights, from the input kernel :math:`\mathbf{K}`
+to the latent-space projection :math:`\mathbf{T}`
 pky_: numpy.ndarray of size :math:`({n_{samples}, n_{properties}})`
-the projector, or weights, from the input kernel :math:`\\mathbf{K}`
-to the properties :math:`\\mathbf{Y}`
+the projector, or weights, from the input kernel :math:`\mathbf{K}`
+to the properties :math:`\mathbf{Y}`
 pty_: numpy.ndarray of size :math:`({n_{components}, n_{properties}})`
 the projector, or weights, from the latent-space projection
-:math:`\\mathbf{T}` to the properties :math:`\\mathbf{Y}`
+:math:`\mathbf{T}` to the properties :math:`\mathbf{Y}`
 ptx_: numpy.ndarray of size :math:`({n_{components}, n_{features}})`
 the projector, or weights, from the latent-space projection
-:math:`\\mathbf{T}` to the feature matrix :math:`\\mathbf{X}`
+:math:`\mathbf{T}` to the feature matrix :math:`\mathbf{X}`
 X_fit_: numpy.ndarray of shape (n_samples, n_features)
 The data used to fit the model. This attribute is used to build kernels
 from new data.
@@ -159,8 +159,8 @@ class KernelPCovR(_BasePCA, LinearModel):
 [-0.18992219, 0.82064368],
 [ 1.11923584, -1.04798016],
 [-1.5635827 , 1.11078662]])
->>> print(round(kpcovr.score(X, Y), 5))
--0.52039
+>>> round(kpcovr.score(X, Y), 5)
+np.float64(-0.52039)
 """ # NoQa: E501
 
 def __init__(
@@ -246,15 +246,15 @@ def fit(self, X, Y, W=None):
 It is suggested that :math:`\mathbf{X}` be centered by its column-
 means and scaled. If features are related, the matrix should be scaled
-to have unit variance, otherwise :math:`\\mathbf{X}` should be
+to have unit variance, otherwise :math:`\mathbf{X}` should be
 scaled so that each feature has a variance of 1 / n_features.
 Y : numpy.ndarray, shape (n_samples, n_properties)
 Training data, where n_samples is the number of samples and
 n_properties is the number of properties
-It is suggested that :math:`\\mathbf{X}` be centered by its column-
+It is suggested that :math:`\mathbf{X}` be centered by its column-
 means and scaled. If features are related, the matrix should be scaled
-to have unit variance, otherwise :math:`\\mathbf{Y}` should be
+to have unit variance, otherwise :math:`\mathbf{Y}` should be
 scaled so that each feature has a variance of 1 / n_features.
 W : numpy.ndarray, shape (n_samples, n_properties)
 Regression weights, optional when regressor=`precomputed`. If not
@@ -420,7 +420,7 @@ def inverse_transform(self, T):
 r"""Transform input data back to its original space.
 .. math::
-\mathbf{\\hat{X}} = \mathbf{T} \mathbf{P}_{TX}
+\mathbf{\hat{X}} = \mathbf{T} \mathbf{P}_{TX}
 = \mathbf{K} \mathbf{P}_{KT} \mathbf{P}_{TX}
 Similar to KPCA, the original features are not always recoverable,
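To make the reconstruction relation in the last hunk (X_hat = T P_TX = K P_KT P_TX) concrete, here is a minimal usage sketch. The toy data and parameter values are illustrative; the kernel parameters are passed both to the model and to its regressor, mirroring the requirement stated in the docstring context above.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from skmatter.decomposition import KernelPCovR

    X = np.array([[-1.0, 1.0], [-2.0, -1.0], [-3.0, -2.0], [1.0, 1.0]])
    Y = np.array([[0.0, -5.0], [-1.0, 1.0], [1.0, -5.0], [-3.0, 2.0]])

    kpcovr = KernelPCovR(
        mixing=0.5,
        n_components=2,
        regressor=KernelRidge(kernel="rbf", gamma=1.0),
        kernel="rbf",
        gamma=1.0,
    )
    kpcovr.fit(X, Y)
    T = kpcovr.transform(X)              # latent projection T = K P_KT
    X_hat = kpcovr.inverse_transform(T)  # approximate reconstruction T P_TX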
16 changes: 10 additions & 6 deletions src/skmatter/feature_selection/_base.py
@@ -249,9 +249,9 @@ class CUR(_CUR):
 >>> Xr = selector.transform(X)
 >>> print(Xr.shape)
 (3, 2)
->>> np.round(selector.pi_, 2) # importance scole
-array([0. , 0. , 0.05])
->>> selector.selected_idx_ # importance scole
+>>> np.round(selector.pi_) # importance score
+array([0., 0., 0.])
+>>> selector.selected_idx_
 array([1, 0])
 """
 
@@ -332,6 +332,10 @@ class PCovCUR(_PCovCUR):
 Counter tracking the number of selections that have been made
 X_selected_ : numpy.ndarray,
 Matrix containing the selected features, for use in fitting
+pi_ : numpy.ndarray (n_features),
+the importance score see :func:`_compute_pi`
+selected_idx_ : numpy.ndarray
+indices of selected features
 Examples
 --------
@@ -351,9 +355,9 @@ class PCovCUR(_PCovCUR):
 >>> Xr = selector.transform(X)
 >>> print(Xr.shape)
 (3, 2)
->>> np.round(selector.pi_, 2) # importance scole
-array([0. , 0. , 0.05])
->>> selector.selected_idx_ # importance scole
+>>> np.round(selector.pi_) # importance score
+array([0., 0., 0.])
+>>> selector.selected_idx_
 array([1, 0])
 """
 
2 changes: 1 addition & 1 deletion src/skmatter/preprocessing/_data.py
@@ -225,7 +225,7 @@ class KernelNormalizer(KernelCenterer):
 where :math:`\phi` is a function mapping x to a Hilbert space.
 KernelNormalizer centers (i.e., normalize to have zero mean) the data without
 explicitly computing :math:`\phi(x)`.
-It is equivalent to centering and scaling :math:`\\phi(x)` with
+It is equivalent to centering and scaling :math:`\phi(x)` with
 sklearn.preprocessing.StandardScaler(with_std=False).
 Parameters
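For context on the sentence fixed in this hunk, a small sketch of the equivalence the docstring states, using a linear kernel so that phi(x) = x. The toy data and the zero-column-sum check are illustrative, and it assumes the default KernelNormalizer() settings center the kernel.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from skmatter.preprocessing import KernelNormalizer

    X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, -1.0], [2.0, 4.0]])
    K = X @ X.T  # linear kernel, so phi(x) = x

    # Kernel-side: center (and scale) K directly, without touching the features.
    K_norm = KernelNormalizer().fit_transform(K)

    # Feature-side: center phi(x) = x explicitly, then rebuild the kernel.
    X_centered = StandardScaler(with_std=False).fit_transform(X)
    K_centered = X_centered @ X_centered.T

    # Both are centered kernels, so their rows and columns sum to (numerically) zero.
    print(np.allclose(K_norm.sum(axis=0), 0.0), np.allclose(K_centered.sum(axis=0), 0.0))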
4 changes: 2 additions & 2 deletions src/skmatter/sample_selection/_base.py
@@ -147,7 +147,7 @@ class PCovFPS(_PCovFPS):
 ----------
 mixing: float, default=0.5
 The PCovR mixing parameter, as described in PCovR as
-:math:`{\\alpha}`
+:math:`{\alpha}`
 initialize: int or 'random', default=0
 Index of the first selection. If 'random', picks a random value when fit starts.
 n_to_select : int or float, default=None
@@ -350,7 +350,7 @@ class PCovCUR(_PCovCUR):
 Parameters
 ----------
 mixing: float, default=0.5
-The PCovR mixing parameter, as described in PCovR as :math:`{\\alpha}`. Stored
+The PCovR mixing parameter, as described in PCovR as :math:`{\alpha}`. Stored
 in :py:attr:`self.mixing`.
 recompute_every : int
 number of steps after which to recompute the pi score defaults to 1, if 0 no
4 changes: 2 additions & 2 deletions src/skmatter/sample_selection/_voronoi_fps.py
@@ -111,14 +111,14 @@ def score(self, X=None, y=None):
 def get_distance(self):
 r"""Traditional FPS employs a column-wise Euclidean distance for feature
 selection, which can be expressed using the covariance matrix
-:math:`\\mathbf{C} = \\mathbf{X} ^ T \\mathbf{X}`.
+:math:`\mathbf{C} = \mathbf{X} ^ T \mathbf{X}`.
 .. math::
 \operatorname{d}_c(i, j) = C_{ii} - 2 C_{ij} + C_{jj}.
 For sample selection, this is a row-wise Euclidean distance, which can be
 expressed in terms of the Gram matrix
-:math:`\\mathbf{K} = \\mathbf{X} \\mathbf{X} ^ T`
+:math:`\mathbf{K} = \mathbf{X} \mathbf{X} ^ T`
 .. math::
 \operatorname{d}_r(i, j) = K_{ii} - 2 K_{ij} + K_{jj}.
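A quick NumPy illustration of the two distances in the docstring touched above (toy array; not the selector's internal code): d_c is built from the covariance matrix for feature selection, d_r from the Gram matrix for sample selection.

    import numpy as np

    X = np.array([[1.0, 2.0, 0.5], [0.0, 1.0, -1.0], [2.0, -1.0, 0.0]])

    # Column-wise (feature) distances from the covariance matrix C = X^T X.
    C = X.T @ X
    diag_c = np.diag(C)
    d_c = diag_c[:, None] - 2 * C + diag_c[None, :]  # d_c(i, j) = C_ii - 2 C_ij + C_jj

    # Row-wise (sample) distances from the Gram matrix K = X X^T.
    K = X @ X.T
    diag_k = np.diag(K)
    d_r = diag_k[:, None] - 2 * K + diag_k[None, :]  # d_r(i, j) = K_ii - 2 K_ij + K_jj

    # d_r is exactly the matrix of squared Euclidean distances between rows of X.
    assert np.allclose(d_r, np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))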
(Two of the 17 changed files had not loaded in this capture and are not shown above.)
