Commit

fix README
hoxo-m committed May 4, 2016
1 parent 5a77fe1 commit e854e3d
Showing 13 changed files with 83 additions and 46 deletions.
6 changes: 3 additions & 3 deletions R/KLIEP.R
@@ -1,7 +1,7 @@
#' Estimate Density Ratio p_nu(x)/p_de(y) by KLIEP (Kullback-Leibler Importance Estimation Procedure)
#' Estimate Density Ratio p(x)/q(y) by KLIEP (Kullback-Leibler Importance Estimation Procedure)
#'
#' @param x numeric vector or matrix as data from a numerator distribution p_nu(x).
#' @param y numeric vector or matrix as data from a denominator distribution p_de(y).
#' @param x numeric vector or matrix as data from a numerator distribution p(x).
#' @param y numeric vector or matrix as data from a denominator distribution q(y).
#' @param sigma positive numeric vector as a search range of Gaussian kernel bandwidth.
#' @param kernel_num positive integer as number of kernels.
#' @param fold positive integer as a numer of the folds of cross validation.
6 changes: 3 additions & 3 deletions R/densratio.R
@@ -1,7 +1,7 @@
#' Estimate Density Ratio p_nu(x)/p_de(y)
#' Estimate Density Ratio p(x)/q(y)
#'
#' @param x numeric vector or matrix as data from a numerator distribution p_nu(x).
#' @param y numeric vector or matrix as data from a denominator distribution p_de(y).
#' @param x numeric vector or matrix as data from a numerator distribution p(x).
#' @param y numeric vector or matrix as data from a denominator distribution q(y).
#' @param method "uLSIF"(default) or "KLIEP".
#' @param sigma positive numeric vector as a search range of Gaussian kernel bandwidth.
#' @param lambda positive numeric vector as a search range of regularization parameter for uLSIF.
6 changes: 3 additions & 3 deletions R/uLSIF.R
@@ -1,7 +1,7 @@
#' Estimate Density Ratio p_nu(x)/p_de(y) by uLSIF (unconstrained Least-Square Importance Fitting)
#' Estimate Density Ratio p(x)/q(y) by uLSIF (unconstrained Least-Square Importance Fitting)
#'
#' @param x numeric vector or matrix as data from a numerator distribution p_nu(x).
#' @param y numeric vector or matrix as data from a denominator distribution p_de(y).
#' @param x numeric vector or matrix as data from a numerator distribution p(x).
#' @param y numeric vector or matrix as data from a denominator distribution q(y).
#' @param sigma positive numeric vector as a search range of Gaussian kernel bandwidth.
#' @param lambda positive numeric vector as a search range of regularization parameter.
#' @param kernel_num positive integer as number of kernels.
14 changes: 11 additions & 3 deletions README.Rmd
@@ -16,7 +16,7 @@ library(mvtnorm)

## 1. Overview

**Density ratio estimation** is described as follows: for given two data samples `x` and `y` from unknown distributions `p_nu(x)` and `p_de(y)` respectively, estimate `w(x) = p_nu(x) / p_de(x)`, where `x` and `y` are d-dimensional real numbers.
**Density ratio estimation** is described as follows: for given two data samples `x` and `y` from unknown distributions `p(x)` and `q(y)` respectively, estimate `w(x) = p(x) / q(x)`, where `x` and `y` are d-dimensional real numbers.

The estimated density ratio function `w(x)` can be used in many applications such as the inlier-based outlier detection [1] and covariate shift adaptation [2].
Other useful applications about density ratio estimation were summarized by Sugiyama et al. (2012) [3].
@@ -76,10 +76,18 @@ For data samples `x` and `y`,
```{r eval=FALSE}
library(densratio)
x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)
result <- densratio(x, y)
```

In this case, `result$compute_density_ratio()` can compute estimated density ratio.
In this case, `result$compute_density_ratio()` can compute estimated density ratio.

```{r}
w_hat <- result$compute_density_ratio(y)
plot(y, w_hat)
```

### 3.2. Methods

@@ -117,7 +125,7 @@ result
- **Kernel type** is fixed by Gaussian RBF.
- The **number of kernels** is the number of kernels in the linear model. You can change by setting `kernel_num` parameter. In default, `kernel_num = 100`.
- **Bandwidth(sigma)** is the Gaussian kernel bandwidth. In default, `sigma = "auto"`, the algorithms automatically select the optimal value by cross validation. If you set `sigma` a number, that will be used. If you set a numeric vector, the algorithms select the optimal value in them by cross validation.
- **Centers** are centers of Gaussian kernels in the linear model. These are selected at random from the data sample `x` underlying a numerator distribution `p_nu(x)`. You can find the whole values in `result$kernel_info$centers`.
- **Centers** are centers of Gaussian kernels in the linear model. These are selected at random from the data sample `x` underlying a numerator distribution `p(x)`. You can find the whole values in `result$kernel_info$centers`.
- **Kernel weights** are alpha parameters in the linear model. It is optimaized by the algorithms. You can find the whole values in `result$alpha`.
- **The funtion to estimate density ratio** is named `compute_density_ratio()`.

25 changes: 17 additions & 8 deletions README.md
@@ -8,7 +8,7 @@ Koji MAKIYAMA (@hoxo_m)

## 1. Overview

**Density ratio estimation** is described as follows: for given two data samples `x` and `y` from unknown distributions `p_nu(x)` and `p_de(y)` respectively, estimate `w(x) = p_nu(x) / p_de(x)`, where `x` and `y` are d-dimensional real numbers.
**Density ratio estimation** is described as follows: for given two data samples `x` and `y` from unknown distributions `p(x)` and `q(y)` respectively, estimate `w(x) = p(x) / q(x)`, where `x` and `y` are d-dimensional real numbers.

The estimated density ratio function `w(x)` can be used in many applications such as the inlier-based outlier detection [1] and covariate shift adaptation [2].
Other useful applications about density ratio estimation were summarized by Sugiyama et al. (2012) [3].
@@ -42,8 +42,6 @@ result
## Kernel Weights(alpha):
## num [1:100] 0.4044 0.0479 0.1736 0.125 0.0597 ...
##
## Regularization Parameter(lambda): 0.1
##
## The Function to Estimate Density Ratio:
## compute_density_ratio()
```
@@ -60,7 +58,7 @@ plot(estimated_density_ratio, xlim=c(-1, 3), lwd=2, col="green", add=TRUE)
legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, lty=1, lwd=2, pch=NA)
```

![](README_files/figure-html/unnamed-chunk-1-1.png)
![](README_files/figure-html/unnamed-chunk-1-1.png)<!-- -->

## 2. How to Install

@@ -95,10 +93,21 @@ For data samples `x` and `y`,
```r
library(densratio)

x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)

result <- densratio(x, y)
```

In this case, `result$compute_density_ratio()` can compute estimated density ratio.
In this case, `result$compute_density_ratio()` can compute estimated density ratio.


```r
w_hat <- result$compute_density_ratio(y)
plot(y, w_hat)
```

![](README_files/figure-html/unnamed-chunk-5-1.png)<!-- -->
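
Because both example distributions in this hunk are known normals, the true ratio `w(x) = p(x) / q(x)` has a closed form, so the plotted estimate can be sanity-checked against it. A minimal sketch using only base R; the names `w_true` and `w_at_mean` are chosen here for illustration and are not part of this commit or of the package API:

```r
# Illustration only: closed-form density ratio for the example above,
# with p = N(1, sd = 1/8) as numerator and q = N(1, sd = 1/2) as denominator.
w_true <- function(x) {
  dnorm(x, mean = 1, sd = 1/8) / dnorm(x, mean = 1, sd = 1/2)
}

set.seed(1)  # assumption: a fixed seed, only to make the sketch reproducible
y <- rnorm(200, mean = 1, sd = 1/2)

# At the shared mean the exponentials cancel, leaving (1/2) / (1/8) = 4.
w_at_mean <- w_true(1)

# If `w_hat` from the chunk above is available, the two can be overlaid:
# plot(y, w_hat); points(y, w_true(y), col = "red")
```

The estimate should roughly follow this curve near the mean. How closely it does depends on the sample and on the selected bandwidth, so the comparison is a rough check, not a guarantee.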

### 3.2. Methods

@@ -144,7 +153,7 @@ As the result, you can obtain `compute_density_ratio()`.
## Kernel Weights(alpha):
## num [1:100] 0.4044 0.0479 0.1736 0.125 0.0597 ...
##
## Regularization Parameter(lambda): 0.1
## Regularization Parameter(lambda):
##
## The Function to Estimate Density Ratio:
## compute_density_ratio()
@@ -153,7 +162,7 @@ As the result, you can obtain `compute_density_ratio()`.
- **Kernel type** is fixed by Gaussian RBF.
- The **number of kernels** is the number of kernels in the linear model. You can change by setting `kernel_num` parameter. In default, `kernel_num = 100`.
- **Bandwidth(sigma)** is the Gaussian kernel bandwidth. In default, `sigma = "auto"`, the algorithms automatically select the optimal value by cross validation. If you set `sigma` a number, that will be used. If you set a numeric vector, the algorithms select the optimal value in them by cross validation.
- **Centers** are centers of Gaussian kernels in the linear model. These are selected at random from the data sample `x` underlying a numerator distribution `p_nu(x)`. You can find the whole values in `result$kernel_info$centers`.
- **Centers** are centers of Gaussian kernels in the linear model. These are selected at random from the data sample `x` underlying a numerator distribution `p(x)`. You can find the whole values in `result$kernel_info$centers`.
- **Kernel weights** are alpha parameters in the linear model. It is optimaized by the algorithms. You can find the whole values in `result$alpha`.
- **The funtion to estimate density ratio** is named `compute_density_ratio()`.

@@ -218,7 +227,7 @@ contour(range, range, z_true, main = "True Density Ratio")
contour(range, range, z_hat, main = "Estimated Density Ratio")
```

![](README_files/figure-html/unnamed-chunk-7-1.png)
![](README_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

The dimensions of `x` and `y` must be same.

Binary file added README_files/figure-html/unnamed-chunk-5-1.png
Binary file added README_files/figure-html/unnamed-chunk-8-1.png
8 changes: 4 additions & 4 deletions man/KLIEP.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions man/densratio.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions man/uLSIF.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 7 additions & 0 deletions vignettes/densratio.R
@@ -29,8 +29,15 @@ legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, l
## ----eval=FALSE----------------------------------------------------------
# library(densratio)
#
# x <- rnorm(200, mean = 1, sd = 1/8)
# y <- rnorm(200, mean = 1, sd = 1/2)
#
# result <- densratio(x, y)

## ----fig.width=5, fig.height=4-------------------------------------------
w_hat <- result$compute_density_ratio(y)
plot(y, w_hat)

## ----echo=FALSE----------------------------------------------------------
result

14 changes: 11 additions & 3 deletions vignettes/densratio.Rmd
@@ -16,10 +16,10 @@ library(mvtnorm)

## 1. Overview

**Density ratio estimation** is described as follows: for given two data samples $x$ and $y$ from unknown distributions $p_{nu}(x)$ and $p_{de}(y)$ respectively, estimate
**Density ratio estimation** is described as follows: for given two data samples $x$ and $y$ from unknown distributions $p(x)$ and $q(y)$ respectively, estimate

$$
w(x) = \frac{p_{nu}(x)}{p_{de}(x)}
w(x) = \frac{p(x)}{q(x)}
$$

where $x$ and $y$ are $d$-dimensional real numbers.
@@ -82,11 +82,19 @@ For data samples `x` and `y`,
```{r eval=FALSE}
library(densratio)
x <- rnorm(200, mean = 1, sd = 1/8)
y <- rnorm(200, mean = 1, sd = 1/2)
result <- densratio(x, y)
```

In this case, `result$compute_density_ratio()` is the function to compute estimated density ratio.

```{r fig.width=5, fig.height=4}
w_hat <- result$compute_density_ratio(y)
plot(y, w_hat)
```

### 3.2. Methods

`densratio()` has `method` parameter that you can pass `"uLSIF"` or `"KLIEP"`.
@@ -131,7 +139,7 @@ result
- **Kernel type** is fixed by Gaussian RBF.
- The **number of kernels** is the number of kernels in the linear model. You can change by setting `kernel_num` parameter. In default, `kernel_num = 100`.
- **Bandwidth(sigma)** is the Gaussian kernel bandwidth. In default, `sigma = "auto"`, the algorithms automatically select the optimal value by cross validation. If you set `sigma` a single number, it will be used. If you set a numeric vector, the algorithms select the optimal value in them by cross validation.
- **Centers** are centers of Gaussian kernels in the linear model. These are selected at random from the data sample `x` underlying a numerator distribution `p_nu(x)`. You can find the whole values in `result$kernel_info$centers`.
- **Centers** are centers of Gaussian kernels in the linear model. These are selected at random from the data sample `x` underlying a numerator distribution $p(x)$. You can find the whole values in `result$kernel_info$centers`.
- **Kernel weights** are the alpha parameters in the linear model. It was optimaized by the algorithms. You can find the whole values in `result$alpha`.
- **The funtion to estimate density ratio** is named `compute_density_ratio()`.
