Commit
update
flystar233 committed Aug 27, 2024
1 parent c76366a commit 4895f62
Showing 5 changed files with 598 additions and 11 deletions.
5 changes: 3 additions & 2 deletions DESCRIPTION
@@ -10,7 +10,7 @@ Authors@R:
role = c("aut", "cre"),
email = "flystar233@gmail.com")
Maintainer: Tengfei Xu <flystar233@gmail.com>
- Description: Outqrf package provides a method to find the outlier in custom data by quantile random forests("Quantile Regression Forests", Journal of Machine Learning Research, 7(Jun), 983-999, 2006.). It directly calls the ranger function of the ranger package to perform data fitting and prediction.In outqrf, we also implement the evaluation of outlier prediction results.
+ Description: This package provides a method to find the outlier in custom data by quantile random forests("Quantile Regression Forests", Journal of Machine Learning Research, 7(Jun), 983-999, 2006.). It directly calls the ranger function of the ranger package to perform data fitting and prediction.In outqrf, we also implement the evaluation of outlier prediction results.
LazyData: true
License: MIT + file LICENSE
Depends:
@@ -24,8 +24,9 @@ Imports:
dplyr,
missRanger,
ggpubr,
+ ggplot2,
tidyr
- URL: https://github.com/flystar233/outqrf, flystar233.github.io/outqrf
+ URL: https://github.com/flystar233/outqrf
BugReports: https://github.com/flystar233/outqrf/issues
Suggests:
renv,
2 changes: 1 addition & 1 deletion NAMESPACE
@@ -1,5 +1,5 @@
# Generated by roxygen2: do not edit by hand

importFrom("stats", "predict", "rnorm", "sd")
S3method(plot,outqrf)
export(evaluateOutliers)
export(find_index)
13 changes: 5 additions & 8 deletions R/plot.r
@@ -10,9 +10,6 @@
#' qrf <- outqrf(irisWithOutliers)
#' plot(qrf)
plot.outqrf<- function(qrf) {
- library(ggpubr)
- library(dplyr)
- library(tidyr)
result_df <- data.frame()
data <- qrf$Data
for (i in seq_along(qrf$outMatrixs)) {
@@ -24,15 +21,15 @@ plot.outqrf<- function(qrf) {
}
}
names(result_df) = names(qrf$outMatrixs)
- result_df <- mutate(result_df,tag = "predicted")
+ result_df <- dplyr::mutate(result_df,tag = "predicted")
numeric_features <- names(data)[sapply(data,is.numeric)]
data <- data[numeric_features]
- data <- mutate(data,tag = "observed")
+ data <- dplyr::mutate(data,tag = "observed")
plot_in <-rbind(result_df,data)
- plot_in_longer<- plot_in|>pivot_longer(!tag,names_to ="features",values_to ="value" )
- p<- ggpaired(plot_in_longer, x="tag", y="value",
+ plot_in_longer<- plot_in|>tidyr::pivot_longer(!tag,names_to ="features",values_to ="value" )
+ p<- ggpubr::ggpaired(plot_in_longer, x="tag", y="value",
fill="tag", palette = "jco",
line.color = "grey", line.size =0.8, width = 0.4,short.panel.labs = FALSE)+
- stat_compare_means(label = "p.format", paired = TRUE)+theme(legend.position = "none")+facet_wrap(~features, scales = "free")
+ ggpubr::stat_compare_means(label = "p.format", paired = TRUE)+ggplot2::theme(legend.position = "none")+ggplot2::facet_wrap(~features, scales = "free")
return(p)
}
83 changes: 83 additions & 0 deletions inst/doc/outqrf.Rmd
@@ -0,0 +1,83 @@
---
title: "Using outqrf"
author: "lucian xu"
date: "`r Sys.Date()`"
link-citations: true
vignette: >
%\VignetteIndexEntry{Using 'outqrf'}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
output: html_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
warning = FALSE,
message = FALSE,
fig.width = 6,
fig.height = 4,
fig.align = "center"
)
```

## Overview

outqrf is an R package for outlier detection. Each numeric variable is regressed onto all other variables using a quantile random forest (QRF), with ranger performing the fitting and prediction. The rank of each observed value among its predicted quantiles is then computed. If that rank exceeds the threshold, the observed value is considered an outlier.

Because the same predicted value can appear at several quantiles in the prediction, the position of the observed value among the quantiles can be ambiguous. Therefore, using a method similar to the outForest package, the observed value is also compared with the 50% quantile value to determine its final quantile position.
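
The ranking step described above can be illustrated with a small base-R sketch. This is not the package's implementation: `quantile_rank()`, `is_outlier()`, the tie-breaking rule, and the `threshold` value of 0.025 are all hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of outqrf): position of an observed value
# among the predicted quantiles of a single observation, as a fraction in [0, 1].
quantile_rank <- function(observed, predicted_quantiles) {
  n <- length(predicted_quantiles)
  lower <- sum(predicted_quantiles < observed)   # quantiles strictly below
  upper <- sum(predicted_quantiles <= observed)  # quantiles at or below
  if (upper == lower) {
    return(lower / n)  # no tie: the position is unambiguous
  }
  # Tie: the observed value equals one or more predicted quantiles.
  # ASSUMED tie-break: compare with the 50% quantile and take the upper end
  # of the tied run above the median, the lower end otherwise.
  med <- stats::median(predicted_quantiles)
  if (observed > med) upper / n else lower / n
}

# Hypothetical rule: flag ranks in either tail (the threshold is an assumption).
is_outlier <- function(rank, threshold = 0.025) {
  rank <= threshold | rank >= 1 - threshold
}

# Toy predicted quantiles with a run of identical values.
q <- c(1, 2, 2, 2, 5, 6, 7, 8, 9, 10)
ranks <- sapply(c(2, 5.5, 100), quantile_rank, predicted_quantiles = q)
flags <- is_outlier(ranks)
```

In this toy example the value 100 lies above every predicted quantile, so it is the only one of the three that the hypothetical rule flags.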

## Installation

```{r install, eval=FALSE}
# Development version
devtools::install_github("flystar233/outqrf")
```

## Usage

```{r usage, echo=TRUE}
library(outqrf)
#Generate data with outliers in numeric columns
irisWithOutliers <- generateOutliers(iris, p = 0.05,seed =2024)
# Find outliers by quantile random forest regressions
out <- outqrf(irisWithOutliers,quantiles_type=400)
out$outliers
```

## Evaluation on iris (Small Dataset)

```{r Evaluation1, echo=TRUE}
library(outqrf)
irisWithOutliers <- generateOutliers(iris, p = 0.05,seed =2024)
qrf <- outqrf(irisWithOutliers,quantiles_type=400)
evaluateOutliers(iris,irisWithOutliers,qrf$outliers)
```



```{r Evaluation1_1, eval=FALSE}
plot(qrf)
```

```{r Evaluation1_2, echo=FALSE}
library(outqrf)
irisWithOutliers <- generateOutliers(iris, p = 0.05,seed =2024)
qrf <- outqrf(irisWithOutliers,quantiles_type=400)
plot(qrf)
```

## Evaluation on diamonds (Big Dataset)

```{r Evaluation2, echo=TRUE}
library(outqrf)
library(tidyverse)
data <- diamonds|>select(price,carat,cut,color,clarity)
data2 <- outqrf::generateOutliers(data, p = 0.001,seed =2024)
# 108
qrf <- outqrf(data2,num.threads=8,quantiles_type=400)
#The process can be slow because it needs to predict the value at 400|1000 quantiles for each observation.
evaluateOutliers(data,data2,qrf$outliers)
```
506 changes: 506 additions & 0 deletions inst/doc/outqrf.html

Large diffs are not rendered by default.
