English | 简体中文
pip install nopapy
import numpy as np
import nopapy as npp
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
ypred = npp.NWEstimate(x, y, 3.5) # supposed to be 7
print(ypred) # 6.910633194984344
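The estimate is close to, but not exactly, 7 because kernel smoothing averages over all samples rather than interpolating. The printed value can be reproduced by evaluating the Nadaraya-Watson formula directly; a minimal numpy sketch, assuming a Gaussian kernel with bandwidth h = 1 (an assumption about the defaults, though it matches the output above):

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
x0, h = 3.5, 1

# Nadaraya-Watson: a kernel-weighted average of y; the Gaussian
# normalizing constant cancels between numerator and denominator
w = np.exp(-(((x - x0) / h) ** 2) / 2)
print(w @ y / w.sum())  # 6.910633194984344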
You can experiment with various nonparametric methods...
- Kernel: Provides multiple built-in kernel functions and supports custom kernels.
- Estimate: Includes various non-parametric estimation methods, with configurable kernel function, smoothing bandwidth, and order.
- Regression: Provides multiple smoothing methods for batch prediction.
- Scikit_like: Supports creating regression objects with preset parameters, in the style of scikit-learn and PyTorch (see the sketch below).
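As an illustration of that last point, here is a minimal scikit-learn-style wrapper built on top of the functional API. This wrapper is only a sketch of the workflow; it is not nopapy's actual Scikit_like class, whose names and signatures may differ:

from nopapy import GaussianKernel, LPRegression

class LPRegressorSketch:
    # Illustrative only: nopapy's real Scikit_like objects may differ.
    def __init__(self, h=1, k=GaussianKernel, p=1):
        self.h, self.k, self.p = h, k, p

    def fit(self, x, y):
        self.x_, self.y_ = x, y
        return self

    def predict(self, xs0):
        return LPRegression(self.x_, self.y_, xs0, self.h, k=self.k, p=self.p)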
First, import the necessary libraries, namely numpy and nopapy. From nopapy we only need GaussianKernel and LPEstimate; if you also want to plot, import matplotlib as well.
To reproduce the results of this example, we recommend fixing the random seed and setting the matplotlib style.
import numpy as np
from nopapy import GaussianKernel, LPEstimate
import matplotlib.pyplot as plt
np.random.seed(0)
plt.style.use('seaborn') # do not delete this, plz:) (on matplotlib >= 3.6, use 'seaborn-v0_8')
plt.style.use('tableau-colorblind10')
The second step is to construct the target function and generate noisy samples:
def f(x):
    return 2 * np.sin(x) + x ** 0.8
sample_number = 50
x = np.sort(np.random.rand(sample_number) * 5)
y = f(x) + np.random.normal(loc=0, scale=0.3, size=(sample_number,))
xs0 = np.arange(0, 5.01, 0.01)
Now we can start fitting. Here, we use GaussianKernel and LPEstimate, and set the order of LPEstimate to 2. In fact, you can freely combine any kernel function with any estimation method; the only difference is that methods other than LPEstimate do not take the order parameter p.
kernel = GaussianKernel; h = 1; method = LPEstimate; p = 2
y_pred = [method(x, y, x0, h, k=kernel, p=p) for x0 in xs0]
The final step is to plot the result, which includes the sample points, the true function, and the fitted curve:
plt.scatter(x, y, alpha=0.5, label='sample', c='darkorange')
plt.plot(xs0, f(xs0), '--', label='truth')
plt.plot(xs0, y_pred, label='pred')
plt.title('h={}, Kernel={}'.format(h, kernel.__name__))
plt.legend(loc='best', frameon=True, framealpha=1, shadow=True)
plt.show()
# plt.savefig('./LPR.svg', dpi=2000)
The plotted result should look like the following figure. You can adjust the bandwidth, change the kernel function, change the regression method, and even vary the order to see how the fit changes.
Additionally, in the above example we passed every possible parameter, but many of them have default values; the only required ones are x, y, and x0.
For instance:
y_pred = [method(x, y, x0) for x0 in xs0]
The Regression methods introduced next have the same property: you can omit optional parameters just as in the example above.
The other steps are almost the same, but the Regression method is more powerful in that it can predict a series of target points simultaneously.
import numpy as np
from nopapy import GaussianKernel, LPRegression
import matplotlib.pyplot as plt
np.random.seed(0)
plt.style.use('seaborn') # do not delete this, plz:) (on matplotlib >= 3.6, use 'seaborn-v0_8')
plt.style.use('tableau-colorblind10')
def f(x):
    return 2 * np.sin(x) + x ** 0.8
sample_number = 50
x = np.sort(np.random.rand(sample_number) * 5)
y = f(x) + np.random.normal(loc=0, scale=0.3, size=(sample_number,))
xs0 = np.arange(0, 5.01, 0.01)
kernel = GaussianKernel; h = 1; method = LPRegression; p = 2
y_pred = method(x, y, xs0, h, k=kernel, p=p)
plt.scatter(x, y, alpha=0.5, label='sample', c='darkorange')
plt.plot(xs0, f(xs0), '--', label='truth')
plt.plot(xs0, y_pred, label='pred')
plt.title('h={}, Kernel={}'.format(h, kernel.__name__))
plt.legend(loc='best', frameon=True, framealpha=1, shadow=True)
plt.show()
# plt.savefig('./LPR.svg', dpi=2000)
You can also use Regression, just like Estimate, to predict a single point; this causes no correctness issues, but we strongly discourage it because it may hurt performance. If you do it anyway, you will receive a warning:
UserWarning: LPRegression() expects xs0 to be np.ndarray or list.
Your code won't make any errors, but we still recommend that you use LPEstimate() instead.
warnings.warn(warn_msg)
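For illustration, a sketch that triggers this warning by passing a scalar target point, reusing x, y, h, kernel, and p from the example above:

from nopapy import LPEstimate

y_single = LPRegression(x, y, 2.5, h, k=kernel, p=p)  # works, but warns
y_better = LPEstimate(x, y, 2.5, h, k=kernel, p=p)    # preferred for a single point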
You may want to use non-parametric regression for extrapolation, but that is not a typical use case for it. Our code does not prohibit this behavior, because we cannot know whether it is really what you want. For example:
As the figure shows, except for GaussianKernel, all other kernels are invalid outside the sample range because they indirectly use the indicator function $I(|x| \leq 1)$: once the target point lies farther than the bandwidth from every sample, all kernel weights are zero and the estimate is undefined.
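A minimal sketch of this behavior, reusing the variables from the example above (the samples lie in [0, 5]):

xs_out = np.arange(5.0, 7.0, 0.01)  # target points beyond the sample range
y_out = LPRegression(x, y, xs_out, h, k=GaussianKernel, p=p)  # stays finite
# A compact-support kernel such as EpanechnikovKernel assigns zero weight
# to every sample once the target is more than h away, so its estimate
# breaks down out there.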
We provide multiple built-in kernel functions that can be used directly in the corresponding estimation and regression methods. They can also be used on their own, which may be helpful for your research. You can use the code npp.kernel.kernel.all to view all kernel functions. Their usage is flexible: you can pass either a scalar or an array, for example.
import numpy as np
import matplotlib.pyplot as plt
from nopapy import GaussianKernel

x = np.arange(-3, 3, 0.01)
y = GaussianKernel(x)
plt.plot(x, y)
plt.show()
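Kernels also accept scalars; assuming the standard Gaussian normalization, the value at zero is the standard normal density:

print(GaussianKernel(0.0))  # 1 / sqrt(2 * pi), approximately 0.3989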
We also support custom kernel functions. A kernel is a function with special mathematical properties; generally, it must satisfy the following:
- Non-negativity: $K(x) \geq 0$
- Symmetry: $E(X) = \int x K(x)\,dx = 0$
- Probability density: $\int K(x)\,dx = 1$
- Positive variance: $D(X) = E(X^2) = \int x^2 K(x)\,dx > 0$
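These properties can be checked numerically; a minimal sketch using np.trapz on a wide grid (the Gaussian tails are negligible beyond $\pm 10$):

import numpy as np
from nopapy import GaussianKernel

grid = np.linspace(-10, 10, 20001)
K = GaussianKernel(grid)

print(np.all(K >= 0))                 # non-negativity: True
print(np.trapz(grid * K, grid))       # mean, approximately 0 (symmetry)
print(np.trapz(K, grid))              # integral, approximately 1
print(np.trapz(grid ** 2 * K, grid))  # variance, approximately 1 > 0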
For example, the built-in Epanechnikov kernel is defined as follows, where I(x) is the indicator function of $|x| \leq 1$ and EPANECHNIKOV_COEFFICIENT is $3/4$:

def EpanechnikovKernel(x):
    return EPANECHNIKOV_COEFFICIENT * (1 - np.power(x, 2)) * I(x)
PS: GaussianKernel is an exception thanks to its elegant mathematical properties: its support is the whole real line, so it needs no indicator function.
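To define your own kernel, write a function satisfying the properties above and pass it through the k parameter; a sketch using the triangular kernel $K(x) = (1 - |x|) \cdot I(|x| \leq 1)$:

import numpy as np
from nopapy import LPEstimate

def TriangularKernel(x):
    # (1 - |x|) on [-1, 1], zero elsewhere; integrates to 1, variance 1/6
    return (1 - np.abs(x)) * (np.abs(x) <= 1)

y0 = LPEstimate(x, y, 2.5, 1, k=TriangularKernel, p=2)  # x, y as above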
We also provide the function is_kernel to check whether a function qualifies as a kernel:
from nopapy import GaussianKernel, is_kernel

ret = is_kernel(GaussianKernel)
print(ret) # True
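Presumably the check verifies the properties listed above, so a function that violates them should fail; this expectation is an assumption, not verified against the source:

print(is_kernel(lambda x: x))  # negative on (-inf, 0); expected: False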