title: xspline: A Python Package for Flexible Spline Modeling
tags: Python; Splines; Derivatives; Integrals; Flexible Extrapolation; Design matrix
authors:
  • Peng Zheng (ORCID 0000-0003-3313-215X), affiliation 1
  • Kelsey Maass (ORCID 0000-0002-9534-8901), affiliation 1
  • Aleksandr Aravkin (ORCID 0000-0002-1875-1801), affiliations 1, 2
affiliations:
  1. Department of Health Metrics Sciences, University of Washington
  2. Department of Applied Mathematics, University of Washington
date: 02.22.2024
bibliography: paper.bib

Summary

Splines are a fundamental tool for describing and estimating nonlinear relationships [@de1978practical]. They allow nonlinear functions to be represented as linear combinations of spline basis elements. Researchers in physical, biological, and health sciences rely on spline models in conjunction with statistical software packages to fit and describe a vast range of nonlinear relationships.

A wide range of tools and packages exist to support modeling with splines, including scipy, splines, splipy, pySpline, and splinter.

Several important gaps remain in Python packages for spline modeling. xspline is not a comprehensive tool that generalizes existing software. Instead, it provides key functionality that undergirds flexible interpolation and fitting, closing gaps in the available tools. xspline is currently widely used in global health applications [@murray2020global], undergirding the majority of spline modeling at the Institute for Health Metrics and Evaluation (IHME).

Statement of Need

Current spline packages offer broad functionality in spline fitting, including:

  • Manipulating and estimating curves (scipy, splines), surfaces and volumes (splipy, pySpline)
  • Numerical derivatives (splipy, splines, scipy, pySpline, splinter)
  • Interpolation (splipy, splines, scipy, pySpline, splinter)
  • Spline derivatives, antiderivatives and numerical integrals (scipy)
  • Extrapolation (scipy, limited)

From this list, it is apparent that scipy offers the most comprehensive features related to derivatives, integrals, and extrapolation. However, key limitations remain. First, while scipy provides derivative and antiderivative spline objects, it still evaluates definite integrals numerically. In addition, while the first and last segments of the B-spline in scipy can be extrapolated, there is no option for the user to extrapolate with a simpler functional form, e.g. a quadratic polynomial given a cubic spline.

This functionality is essential to risk modeling. For example, data reported by all studies focusing on risk-outcome pairs are ratios of definite integrals across different exposure intervals. Prior packages do not offer a direct way to fit spline functions to these nonlinear data, because they do not provide definite integrals of splines as spline objects. Spline derivatives are also needed to impose shape constraints on risk curves of interest. Finally, extrapolation is often required into regions with little to no data, while maintaining high-fidelity fits in regions with dense data. Theoretically, it is straightforward to extrapolate with any polynomial of degree less than or equal to the degree of the end segments (for example, matching the slope for first order, slope and curvature for second order, etc.). However, this functionality is not available in other packages.
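To make the role of definite integrals concrete, the following is a minimal sketch and not xspline code. It assumes a piecewise-linear spline written in a hat-function basis; the helper names `hat`, `hat_integral`, `integral_row` and `integral_ratio` are hypothetical. Each definite integral of the spline is a linear function of the coefficients, so an observed ratio of two integrals is an explicit, though nonlinear, function of those coefficients.

```python
def hat(j, knots, x):
    """Piecewise-linear basis function: 1 at knots[j], 0 at every other knot."""
    if j > 0 and knots[j - 1] <= x <= knots[j]:
        return (x - knots[j - 1]) / (knots[j] - knots[j - 1])
    if j < len(knots) - 1 and knots[j] <= x <= knots[j + 1]:
        return (knots[j + 1] - x) / (knots[j + 1] - knots[j])
    return 0.0


def hat_integral(j, knots, a, b):
    """Exact integral of hat j over [a, b].

    The hat is linear between consecutive knots, so the midpoint rule
    applied piece by piece introduces no approximation error.
    """
    total = 0.0
    for left, right in zip(knots[:-1], knots[1:]):
        lo, hi = max(a, left), min(b, right)
        if lo < hi:
            total += hat(j, knots, 0.5 * (lo + hi)) * (hi - lo)
    return total


def integral_row(knots, a, b):
    """One design-matrix row: the integral of every basis function over [a, b]."""
    return [hat_integral(j, knots, a, b) for j in range(len(knots))]


def integral_ratio(coefs, knots, alt, ref):
    """Ratio of the spline's definite integrals over two exposure intervals."""
    num = sum(c * w for c, w in zip(coefs, integral_row(knots, *alt)))
    den = sum(c * w for c, w in zip(coefs, integral_row(knots, *ref)))
    return num / den


# Illustrative values only: the spline interpolates f(x) = 2x + 1 at the knots.
knots = [0.0, 1.0, 2.0, 4.0]
coefs = [2.0 * t + 1.0 for t in knots]
ratio = integral_ratio(coefs, knots, alt=(0.5, 3.0), ref=(0.0, 1.0))
```

Because the rows returned by `integral_row` do not depend on the coefficients, they can be computed once and reused inside an optimizer that fits the coefficients to observed ratios.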

Core idea and structure of xspline

The main idea of xspline is to provide a Python class that allows users to work more easily with basis splines, their derivatives and integrals, and extrapolation options.

The computation of splines is based on basis splines (B-splines); see [@de1978practical] for a canonical reference. Starting from the recursive definition of B-splines given there, we derived recursive relationships to compute both derivatives and definite integrals.
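For readers unfamiliar with these recursions, the sketch below implements the textbook Cox-de Boor recursion for B-spline values and the standard degree-lowering formula for first derivatives. It illustrates the kind of relationship described above and is not taken from the xspline source; the function names and the knot vector are assumptions chosen for the example.

```python
def bspline_basis(i, k, t, x):
    """Value at x of the i-th B-spline basis function of degree k on knots t."""
    if k == 0:
        # Half-open intervals, except that the last non-empty interval
        # is closed on the right so the basis is defined at t[-1].
        if t[i] <= x < t[i + 1]:
            return 1.0
        if x == t[-1] and t[i] < t[i + 1] and t[i + 1] == t[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    # Terms with a zero-width knot span are dropped (the 0/0 := 0 convention).
    if t[i + k] > t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
                 * bspline_basis(i + 1, k - 1, t, x))
    return left + right


def bspline_basis_der(i, k, t, x):
    """First derivative of the same basis function via two degree k-1 bases."""
    if k == 0:
        return 0.0
    left = right = 0.0
    if t[i + k] > t[i]:
        left = k / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k + 1] > t[i + 1]:
        right = k / (t[i + k + 1] - t[i + 1]) * bspline_basis(i + 1, k - 1, t, x)
    return left - right


# Clamped cubic knot vector (illustrative): len(t) - degree - 1 = 6 basis functions.
degree = 3
t = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0, 3.0]
num_basis = len(t) - degree - 1
values = [bspline_basis(i, degree, t, 1.3) for i in range(num_basis)]
slopes = [bspline_basis_der(i, degree, t, 1.3) for i in range(num_basis)]
```

The basis values sum to one at any point inside the knot range, so the derivatives sum to zero there; both properties make convenient sanity checks.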

To support the spline basis computation, we also created modules that provide a convenient interface with indicator and polynomial functions, and their derivatives and definite integrals of any order. All of these useful functions are bundled into a main interface class called XFunction, which allows the user to call the function with a specified order, where positive order represents derivatives and negative order represents definite integrals.
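The signed-order convention can be illustrated with a plain polynomial. The sketch below is not the XFunction API: `poly_fun` and its helpers are hypothetical names, and the lower integration limit `lb` is an assumption made here so that the definite integral is well defined.

```python
def horner(coefs, x):
    """Evaluate sum_i coefs[i] * x**i."""
    result = 0.0
    for c in reversed(coefs):
        result = result * x + c
    return result


def differentiate(coefs):
    """Coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coefs)][1:] or [0.0]


def integrate_from(coefs, lb):
    """Coefficients of F with F' = p and F(lb) = 0."""
    anti = [0.0] + [c / (i + 1) for i, c in enumerate(coefs)]
    anti[0] = -horner(anti, lb)
    return anti


def poly_fun(coefs, x, order=0, lb=0.0):
    """Evaluate a polynomial under the signed-order convention.

    order > 0: derivative of that order; order == 0: the function value;
    order < 0: |order|-fold definite integral taken from lb up to x.
    """
    coefs = list(coefs)
    for _ in range(abs(order)):
        coefs = differentiate(coefs) if order > 0 else integrate_from(coefs, lb)
    return horner(coefs, x)


# p(x) = 1 + 2x + 3x^2, evaluated at x = 2 for several orders.
p = [1.0, 2.0, 3.0]
value = poly_fun(p, 2.0)                # p(2)
slope = poly_fun(p, 2.0, order=1)       # p'(2)
area = poly_fun(p, 2.0, order=-1)       # integral of p from 0 to 2
```

A single integer argument keeps the calling code the same whether a model needs values, shape constraints on derivatives, or integrals over exposure intervals.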

We also allow users to specify how they want to extrapolate by matching the smoothness at the end knots. This is achieved by a method of XFunction called append, which splices two instances of XFunction together.
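As a rough illustration of smoothness matching, the sketch below extends a polynomial end segment beyond its end knot with a lower-degree polynomial that agrees with it in value and in the first few derivatives at that knot, then joins the two pieces. It does not use or reproduce the append method; `taylor_extension` and `splice` are hypothetical names, and the end segment is assumed to be given by its monomial coefficients.

```python
from math import factorial


def eval_poly(coefs, x):
    """Evaluate sum_i coefs[i] * x**i."""
    result = 0.0
    for c in reversed(coefs):
        result = result * x + c
    return result


def derivative_coefs(coefs):
    """Coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coefs)][1:] or [0.0]


def taylor_extension(coefs, knot, degree):
    """Coefficients a_j of sum_j a_j * (x - knot)**j.

    The result matches the segment's value and its first `degree`
    derivatives at the knot (a truncated Taylor expansion).
    """
    taylor = []
    current = list(coefs)
    for j in range(degree + 1):
        taylor.append(eval_poly(current, knot) / factorial(j))
        current = derivative_coefs(current)
    return taylor


def splice(coefs, knot, degree):
    """Join the segment (x <= knot) with its extension (x > knot)."""
    taylor = taylor_extension(coefs, knot, degree)

    def spliced(x):
        if x <= knot:
            return eval_poly(coefs, x)
        return eval_poly(taylor, x - knot)

    return spliced


# Cubic end segment p(x) = x^3 with end knot at x = 1.
cubic = [0.0, 0.0, 0.0, 1.0]
linear_tail = splice(cubic, knot=1.0, degree=1)     # matches value and slope
quadratic_tail = splice(cubic, knot=1.0, degree=2)  # also matches curvature
```

Choosing a lower `degree` trades smoothness at the knot for a tamer tail: the linear tail reaches 4 at x = 2 and the quadratic tail reaches 7, whereas the cubic itself would give 8.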

With all of the above features, we created an easy-to-use spline package for statistical model building, which has been widely used in global health statistical analysis; see the references below. For more examples, please check here.

More information about the structure of the library can be found in the documentation, while the mathematical use cases are extensively discussed in [@zheng2021trimmed] and [@zheng2022burden] in the context of fitting nonlinear dose-response relationships.

Ongoing Research and Dissemination

The xspline package is widely used in spline modeling at IHME. In particular, the new functionality described above enabled a new set of dose-response analyses recently published by the institute, including analyses of chewing tobacco [@gil2024health], education [@balaj2024effects], second-hand smoke [@flor2024health], intimate partner violence [@spencer2023health], smoking [@dai2022health], blood pressure [@razo2022effects], vegetable consumption [@stanaway2022health], and red meat consumption [@lescinsky2022health]. The results of all of these analyses are now publicly available at https://vizhub.healthdata.org/burden-of-proof/.

References