Merge branch 'main' into sparse-kde
GardevoirX committed Aug 7, 2024
2 parents (1f28aae + 3c784c9) · commit 1f4ea8c
Showing 17 changed files with 79 additions and 68 deletions.
4 changes: 2 additions & 2 deletions .codecov.yml
@@ -4,9 +4,9 @@ coverage:
   status:
     project:
       default:
-        target: 90%
+        target: 95%
     patch:
       default:
-        target: 90%
+        target: 95%
 
 comment: false
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -10,10 +10,10 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
 
4 changes: 2 additions & 2 deletions .github/workflows/docs.yml
@@ -11,10 +11,10 @@ jobs:
   build:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: setup Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
 
4 changes: 2 additions & 2 deletions .github/workflows/lint.yml
@@ -9,10 +9,10 @@ jobs:
     runs-on: ubuntu-latest
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
      - name: Set up Python
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: "3.12"
 
4 changes: 2 additions & 2 deletions .github/workflows/tests.yml
@@ -15,10 +15,10 @@ jobs:
         python-version: ["3.9", "3.12"]
 
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
 
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v4
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
 
23 changes: 14 additions & 9 deletions README.rst
@@ -2,8 +2,8 @@ scikit-matter
 =============
 |tests| |codecov| |pypi| |conda| |docs| |doi|
 
-A collection of scikit-learn compatible utilities that implement methods born out of the
-materials science and chemistry communities.
+A collection of ``scikit-learn`` compatible utilities that implement methods born out of
+the materials science and chemistry communities.
 
 For details, tutorials, and examples, please have a look at our `documentation`_.
 
@@ -72,7 +72,7 @@ Writing code is not the only way to contribute to the project. You can also:
 .. _`examples and tutorials`: https://scikit-matter.readthedocs.io/en/latest/contributing.html#contributing-new-examples
 .. _`new datasets`: https://scikit-matter.readthedocs.io/en/latest/contributing.html#contributing-datasets
 
-.. marker-contributors
+.. marker-citing
 
 Citing scikit-matter
 --------------------
@@ -81,7 +81,11 @@ If you use *scikit-matter* for your work, please cite:
 Goscinski A, Principe VP, Fraux G et al. scikit-matter :
 A Suite of Generalisable Machine Learning Methods Born out of Chemistry
 and Materials Science. Open Res Europe 2023, 3:81.
-`10.12688/openreseurope.15789.2 <https://doi.org/10.12688/openreseurope.15789.2>`_
+`10.12688/openreseurope.15789.2`_
+
+.. _`10.12688/openreseurope.15789.2`: https://doi.org/10.12688/openreseurope.15789.2
+
+.. marker-contributors
 
 Contributors
 ------------
@@ -90,18 +94,17 @@ Thanks goes to all people that make scikit-matter possible:
 .. image:: https://contrib.rocks/image?repo=scikit-learn-contrib/scikit-matter
    :target: https://github.com/scikit-learn-contrib/scikit-matter/graphs/contributors
 
-.. |tests| image:: https://github.com/scikit-learn-contrib/scikit-matter/workflows/Test/badge.svg
+.. |tests| image:: https://github.com/scikit-learn-contrib/scikit-matter/workflows/Tests/badge.svg
    :alt: Github Actions Tests Job Status
-   :target: (https://github.com/scikit-learn-contrib/scikit-matter/\
-   actions?query=workflow%3ATests)
+   :target: action_
 
 .. |codecov| image:: https://codecov.io/gh/scikit-learn-contrib/scikit-matter/branch/main/graph/badge.svg?token=UZJPJG34SM
    :alt: Code coverage
    :target: https://codecov.io/gh/scikit-learn-contrib/scikit-matter/
 
 .. |docs| image:: https://img.shields.io/badge/documentation-latest-sucess
    :alt: Python
-   :target: https://scikit-matter.readthedocs.io
+   :target: documentation_
 
 .. |pypi| image:: https://img.shields.io/pypi/v/skmatter.svg
    :alt: Latest PYPI version
@@ -113,4 +116,6 @@
 
 .. |doi| image:: https://img.shields.io/badge/DOI-10.12688-blue
    :alt: ORE Paper
-   :target: https://doi.org/10.12688/openreseurope.15789.2
+   :target: `10.12688/openreseurope.15789.2`_
+
+.. _`action`: https://github.com/scikit-learn-contrib/scikit-matter/actions?query=branch%3Amain
10 changes: 7 additions & 3 deletions docs/src/index.rst
@@ -74,6 +74,13 @@
    :start-after: marker-issues
    :end-before: marker-contributing
 
+.. include:: ../../README.rst
+   :start-after: marker-citing
+   :end-before: marker-contributors
+
+If you would like to contribute to scikit-matter, check out our :ref:`contributing`
+page!
+
 .. toctree::
    :hidden:
 
@@ -84,6 +91,3 @@
    contributing
    changelog
    bibliography
-
-If you would like to contribute to scikit-matter, check out our :ref:`contributing`
-page!
4 changes: 2 additions & 2 deletions examples/README.rst
@@ -1,11 +1,11 @@
 Examples
 ========
 
-For a thorough tutorial of the methods introduced in `scikit-matter`, we
+For a thorough tutorial of the methods introduced in ``scikit-matter``, we
 suggest you check out the pedagogic notebooks in our companion project
 `kernel-tutorials <https://github.com/lab-cosmo/kernel-tutorials/>`_.
 
-For running the examples locally install `scikit-matter` with the ``examples``
+For running the examples locally install ``scikit-matter`` with the ``examples``
 optional dependencies.
 
 .. code-block:: bash
16 changes: 7 additions & 9 deletions src/skmatter/_selection.py
@@ -592,19 +592,17 @@ def _compute_pi(self, X, y=None):
 the squares of the first :math:`k` components of the right singular vectors
 .. math::
-\\pi_j =
-\\sum_i^k \\left(\\mathbf{U}_\\mathbf{C}\\right)_{ij}^2.
+\pi_j = \sum_i^k \left(\mathbf{U}_\mathbf{C}\right)_{ij}^2.
-where :math:`\\mathbf{C} = \\mathbf{X}^T\\mathbf{X}`.
+where :math:`\mathbf{C} = \mathbf{X}^T\mathbf{X}`.
-For sample selection, the importance score :math:`\\pi` is the sum over the
+For sample selection, the importance score :math:`\pi` is the sum over the
 squares of the first :math:`k` components of the right singular vectors
 .. math::
-\\pi_j =
-\\sum_i^k \\left(\\mathbf{U}_\\mathbf{K}\\right)_{ij}^2.
+\pi_j = \sum_i^k \left(\mathbf{U}_\mathbf{K}\right)_{ij}^2.
-where :math:`\\mathbf{K} = \\mathbf{X}\\mathbf{X}^T`.
+where :math:`\mathbf{K} = \mathbf{X}\mathbf{X}^T`.
 Parameters
 ----------
@@ -615,7 +613,7 @@ def _compute_pi(self, X, y=None):
 Returns
 -------
 pi : numpy.ndarray of (n_to_select_from_)
-:math:`\\pi` importance for the given samples or features
+:math:`\pi` importance for the given samples or features
 """
 svd_kwargs = dict(k=self.k, random_state=self.random_state)
 if self._axis == 0:
@@ -941,7 +939,7 @@ def get_distance(self):
 For sample selection, this is a row-wise Euclidean distance, which can be
 expressed in terms of the Gram matrix
-:math:`\\mathbf{K} = \mathbf{X} \\mathbf{X} ^ T`
+:math:`\mathbf{K} = \mathbf{X} \mathbf{X} ^ T`
 .. math::
 \operatorname{d}_r(i, j) = K_{ii} - 2 K_{ij} + K_{jj}.
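Aside on this hunk: the pi_j score described in the docstring reduces to a few lines of NumPy/SciPy. The sketch below is illustrative only (toy array, scipy's svds chosen for the truncated SVD); it is not the selector's internal _compute_pi code.

    import numpy as np
    from scipy.sparse.linalg import svds

    X = np.array([[0.12, 0.21, 0.02], [-0.09, 0.32, -0.10],
                  [-0.03, -0.53, 0.08], [0.40, 0.11, 0.01]])

    # Feature selection: C = X^T X, and pi_j sums the squared entries of the
    # top-k singular vectors for each feature j.
    C = X.T @ X
    U_C, _, _ = svds(C, k=2)              # truncated SVD with k = 2 components
    pi_features = np.sum(U_C**2, axis=1)

    # Sample selection: the same construction on the Gram matrix K = X X^T.
    K = X @ X.T
    U_K, _, _ = svds(K, k=2)
    pi_samples = np.sum(U_K**2, axis=1)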
4 changes: 2 additions & 2 deletions src/skmatter/datasets/descr/who_dataset.rst
@@ -3,7 +3,7 @@
 WHO dataset
 ###########
 
-`who_dataset.csv` is a compilation of multiple publically-available datasets
+``who_dataset.csv`` is a compilation of multiple publically-available datasets
 through data.worldbank.org. Specifically, the following versioned datasets are used:
 
 - NY.GDP.PCAP.CD (v2_4770383) [1]_
@@ -17,7 +17,7 @@ through data.worldbank.org. Specifically, the following versioned datasets are used:
 - SP.DYN.LE00.IN (v2_4770556) [9]_
 - SP.POP.TOTL (v2_4770385) [10]_
 
-where the corresponding file names are `API_{dataset}_DS2_excel_en_{version}.xls`.
+where the corresponding file names are ``API_{dataset}_DS2_excel_en_{version}.xls``.
 
 This dataset, intended only for demonstration, contains 2020 country-year pairings and
 the corresponding values above.
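For anyone wanting to inspect the data described here: the sketch below assumes the dataset is exposed through a load_who_dataset helper in skmatter.datasets that returns a dictionary-like bunch with the table under "data"; both the loader name and that layout are assumptions, not something this diff shows.

    from skmatter.datasets import load_who_dataset  # assumed loader name

    who = load_who_dataset()
    df = who["data"]   # assumed layout: country-year rows, indicator columns as listed above
    print(df.shape)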
28 changes: 14 additions & 14 deletions src/skmatter/decomposition/_kernel_pcovr.py
@@ -39,7 +39,7 @@ class KernelPCovR(_BasePCA, LinearModel):
 Parameters
 ----------
 mixing : float, default=0.5
-mixing parameter, as described in PCovR as :math:`{\\alpha}`
+mixing parameter, as described in PCovR as :math:`{\alpha}`
 n_components : int, float or str, default=None
 Number of components to keep.
 if n_components is not set all components are kept::
@@ -64,7 +64,7 @@ class KernelPCovR(_BasePCA, LinearModel):
 run randomized SVD by the method of Halko et al.
 regressor : {instance of `sklearn.kernel_ridge.KernelRidge`, `precomputed`, None}, default=None
 The regressor to use for computing
-the property predictions :math:`\\hat{\\mathbf{Y}}`.
+the property predictions :math:`\hat{\mathbf{Y}}`.
 A pre-fitted regressor may be provided.
 If the regressor is not `None`, its kernel parameters
 (`kernel`, `gamma`, `degree`, `coef0`, and `kernel_params`)
@@ -112,17 +112,17 @@ class KernelPCovR(_BasePCA, LinearModel):
 pseudo-inverse of the latent-space projection, which
 can be used to contruct projectors from latent-space
 pkt_: numpy.ndarray of size :math:`({n_{samples}, n_{components}})`
-the projector, or weights, from the input kernel :math:`\\mathbf{K}`
-to the latent-space projection :math:`\\mathbf{T}`
+the projector, or weights, from the input kernel :math:`\mathbf{K}`
+to the latent-space projection :math:`\mathbf{T}`
 pky_: numpy.ndarray of size :math:`({n_{samples}, n_{properties}})`
-the projector, or weights, from the input kernel :math:`\\mathbf{K}`
-to the properties :math:`\\mathbf{Y}`
+the projector, or weights, from the input kernel :math:`\mathbf{K}`
+to the properties :math:`\mathbf{Y}`
 pty_: numpy.ndarray of size :math:`({n_{components}, n_{properties}})`
 the projector, or weights, from the latent-space projection
-:math:`\\mathbf{T}` to the properties :math:`\\mathbf{Y}`
+:math:`\mathbf{T}` to the properties :math:`\mathbf{Y}`
 ptx_: numpy.ndarray of size :math:`({n_{components}, n_{features}})`
 the projector, or weights, from the latent-space projection
-:math:`\\mathbf{T}` to the feature matrix :math:`\\mathbf{X}`
+:math:`\mathbf{T}` to the feature matrix :math:`\mathbf{X}`
 X_fit_: numpy.ndarray of shape (n_samples, n_features)
 The data used to fit the model. This attribute is used to build kernels
 from new data.
@@ -159,8 +159,8 @@ class KernelPCovR(_BasePCA, LinearModel):
 [-0.18992219, 0.82064368],
 [ 1.11923584, -1.04798016],
 [-1.5635827 , 1.11078662]])
->>> print(round(kpcovr.score(X, Y), 5))
--0.52039
+>>> round(kpcovr.score(X, Y), 5)
+np.float64(-0.52039)
 """ # NoQa: E501
 
 def __init__(
@@ -246,15 +246,15 @@ def fit(self, X, Y, W=None):
 It is suggested that :math:`\mathbf{X}` be centered by its column-
 means and scaled. If features are related, the matrix should be scaled
-to have unit variance, otherwise :math:`\\mathbf{X}` should be
+to have unit variance, otherwise :math:`\mathbf{X}` should be
 scaled so that each feature has a variance of 1 / n_features.
 Y : numpy.ndarray, shape (n_samples, n_properties)
 Training data, where n_samples is the number of samples and
 n_properties is the number of properties
-It is suggested that :math:`\\mathbf{X}` be centered by its column-
+It is suggested that :math:`\mathbf{X}` be centered by its column-
 means and scaled. If features are related, the matrix should be scaled
-to have unit variance, otherwise :math:`\\mathbf{Y}` should be
+to have unit variance, otherwise :math:`\mathbf{Y}` should be
 scaled so that each feature has a variance of 1 / n_features.
 W : numpy.ndarray, shape (n_samples, n_properties)
 Regression weights, optional when regressor=`precomputed`. If not
@@ -420,7 +420,7 @@ def inverse_transform(self, T):
 r"""Transform input data back to its original space.
 .. math::
-\mathbf{\\hat{X}} = \mathbf{T} \mathbf{P}_{TX}
+\mathbf{\hat{X}} = \mathbf{T} \mathbf{P}_{TX}
 = \mathbf{K} \mathbf{P}_{KT} \mathbf{P}_{TX}
 Similar to KPCA, the original features are not always recoverable,
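To make the reconstruction relation in the last hunk (X_hat = T P_TX = K P_KT P_TX) concrete, here is a minimal usage sketch. The toy data and parameter values are illustrative; the kernel parameters are passed both to the model and to its regressor, mirroring the requirement stated in the docstring context above.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from skmatter.decomposition import KernelPCovR

    X = np.array([[-1.0, 1.0], [-2.0, -1.0], [-3.0, -2.0], [1.0, 1.0]])
    Y = np.array([[0.0, -5.0], [-1.0, 1.0], [1.0, -5.0], [-3.0, 2.0]])

    kpcovr = KernelPCovR(
        mixing=0.5,
        n_components=2,
        regressor=KernelRidge(kernel="rbf", gamma=1.0),
        kernel="rbf",
        gamma=1.0,
    )
    kpcovr.fit(X, Y)
    T = kpcovr.transform(X)              # latent projection T = K P_KT
    X_hat = kpcovr.inverse_transform(T)  # approximate reconstruction T P_TX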
16 changes: 10 additions & 6 deletions src/skmatter/feature_selection/_base.py
@@ -249,9 +249,9 @@ class CUR(_CUR):
 >>> Xr = selector.transform(X)
 >>> print(Xr.shape)
 (3, 2)
->>> np.round(selector.pi_, 2) # importance scole
-array([0. , 0. , 0.05])
->>> selector.selected_idx_ # importance scole
+>>> np.round(selector.pi_) # importance score
+array([0., 0., 0.])
+>>> selector.selected_idx_
 array([1, 0])
 """
 
@@ -332,6 +332,10 @@ class PCovCUR(_PCovCUR):
 Counter tracking the number of selections that have been made
 X_selected_ : numpy.ndarray,
 Matrix containing the selected features, for use in fitting
+pi_ : numpy.ndarray (n_features),
+the importance score see :func:`_compute_pi`
+selected_idx_ : numpy.ndarray
+indices of selected features
 Examples
 --------
@@ -351,9 +355,9 @@ class PCovCUR(_PCovCUR):
 >>> Xr = selector.transform(X)
 >>> print(Xr.shape)
 (3, 2)
->>> np.round(selector.pi_, 2) # importance scole
-array([0. , 0. , 0.05])
->>> selector.selected_idx_ # importance scole
+>>> np.round(selector.pi_) # importance score
+array([0., 0., 0.])
+>>> selector.selected_idx_
 array([1, 0])
 """
 
2 changes: 1 addition & 1 deletion src/skmatter/preprocessing/_data.py
@@ -225,7 +225,7 @@ class KernelNormalizer(KernelCenterer):
 where :math:`\phi` is a function mapping x to a Hilbert space.
 KernelNormalizer centers (i.e., normalize to have zero mean) the data without
 explicitly computing :math:`\phi(x)`.
-It is equivalent to centering and scaling :math:`\\phi(x)` with
+It is equivalent to centering and scaling :math:`\phi(x)` with
 sklearn.preprocessing.StandardScaler(with_std=False).
 Parameters
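For context on the sentence fixed in this hunk, a small sketch of the equivalence the docstring states, using a linear kernel so that phi(x) = x. The toy data and the zero-column-sum check are illustrative, and it assumes the default KernelNormalizer() settings center the kernel.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from skmatter.preprocessing import KernelNormalizer

    X = np.array([[1.0, 2.0], [3.0, 1.0], [0.0, -1.0], [2.0, 4.0]])
    K = X @ X.T  # linear kernel, so phi(x) = x

    # Kernel-side: center (and scale) K directly, without touching the features.
    K_norm = KernelNormalizer().fit_transform(K)

    # Feature-side: center phi(x) = x explicitly, then rebuild the kernel.
    X_centered = StandardScaler(with_std=False).fit_transform(X)
    K_centered = X_centered @ X_centered.T

    # Both are centered kernels, so their rows and columns sum to (numerically) zero.
    print(np.allclose(K_norm.sum(axis=0), 0.0), np.allclose(K_centered.sum(axis=0), 0.0))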
4 changes: 2 additions & 2 deletions src/skmatter/sample_selection/_base.py
@@ -147,7 +147,7 @@ class PCovFPS(_PCovFPS):
 ----------
 mixing: float, default=0.5
 The PCovR mixing parameter, as described in PCovR as
-:math:`{\\alpha}`
+:math:`{\alpha}`
 initialize: int or 'random', default=0
 Index of the first selection. If 'random', picks a random value when fit starts.
 n_to_select : int or float, default=None
@@ -350,7 +350,7 @@ class PCovCUR(_PCovCUR):
 Parameters
 ----------
 mixing: float, default=0.5
-The PCovR mixing parameter, as described in PCovR as :math:`{\\alpha}`. Stored
+The PCovR mixing parameter, as described in PCovR as :math:`{\alpha}`. Stored
 in :py:attr:`self.mixing`.
 recompute_every : int
 number of steps after which to recompute the pi score defaults to 1, if 0 no
4 changes: 2 additions & 2 deletions src/skmatter/sample_selection/_voronoi_fps.py
@@ -111,14 +111,14 @@ def score(self, X=None, y=None):
 def get_distance(self):
 r"""Traditional FPS employs a column-wise Euclidean distance for feature
 selection, which can be expressed using the covariance matrix
-:math:`\\mathbf{C} = \\mathbf{X} ^ T \\mathbf{X}`.
+:math:`\mathbf{C} = \mathbf{X} ^ T \mathbf{X}`.
 .. math::
 \operatorname{d}_c(i, j) = C_{ii} - 2 C_{ij} + C_{jj}.
 For sample selection, this is a row-wise Euclidean distance, which can be
 expressed in terms of the Gram matrix
-:math:`\\mathbf{K} = \\mathbf{X} \\mathbf{X} ^ T`
+:math:`\mathbf{K} = \mathbf{X} \mathbf{X} ^ T`
 .. math::
 \operatorname{d}_r(i, j) = K_{ii} - 2 K_{ij} + K_{jj}.
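A quick NumPy illustration of the two distances in the docstring touched above (toy array; not the selector's internal code): d_c is built from the covariance matrix for feature selection, d_r from the Gram matrix for sample selection.

    import numpy as np

    X = np.array([[1.0, 2.0, 0.5], [0.0, 1.0, -1.0], [2.0, -1.0, 0.0]])

    # Column-wise (feature) distances from the covariance matrix C = X^T X.
    C = X.T @ X
    diag_c = np.diag(C)
    d_c = diag_c[:, None] - 2 * C + diag_c[None, :]  # d_c(i, j) = C_ii - 2 C_ij + C_jj

    # Row-wise (sample) distances from the Gram matrix K = X X^T.
    K = X @ X.T
    diag_k = np.diag(K)
    d_r = diag_k[:, None] - 2 * K + diag_k[None, :]  # d_r(i, j) = K_ii - 2 K_ij + K_jj

    # d_r is exactly the matrix of squared Euclidean distances between rows of X.
    assert np.allclose(d_r, np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))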
(Two of the 17 changed files had not loaded in this capture and are not shown above.)
