Skip to content

An R package for building forecasting models using data from National Vulnerability Database (NVD).

Notifications You must be signed in to change notification settings

realerikrani/nvdr

Repository files navigation

nvdr

license Travis-CI Build Status

It is assumed that R is downloaded and installed on a UNIX platform, Windows or MacOS. In the case of Ubuntu or other Linux systems, sometimes other programs such as libcurl4-openssl-dev and libxml2-dev need to be installed as well (then the error messages mention missing "curl" or "xml2-config"). Eventually, the installation instructions can be followed from the R console.

Installation

Install nvdr from GitHub.

install.packages("remotes")
remotes::install_github("realerikrani/nvdr")

Documentation and User Guide

https://realerikrani.github.io/nvdr/

Examples

library(nvdr)
## Create a new CWE object.
c <- CWE$new()
## Set the data from CVE-2011 to CVE-2016 that come along with the package ...
c$setBaseData()
## ... or set data from selected XML Version 2.0 files downloaded from https://nvd.nist.gov/vuln/data-feeds#CVE_FEED .
c$setBaseData(c("nvdcve-2.0-2013.xml","nvdcve-2.0-2015.xml"))
## Get the year and month that mark the beginning of the interested time period.
c$getStartYear()
c$getStartMonth(as_number = T)
## Get the year and month that mark the end of the interested time period. See how to change the end and start at the end of the README.
c$getEndYear()
c$getEndMonth(as_number = T)
## Set monthly average CVSS time series for interesting CWEs ...
c$setTimeSeriesData(c$getInterestingMonthlyData())
## ... or set monthly average CVSS time series for specific CWEs.
c$setTimeSeriesData(c$getMonthlyData(c("CWE-119","CWE-78")))
## Create new forecasting object.
cf <- CVSSForecaster$new()
## Use the created CWE object with its time period and time series data.
cf$setCWEs(c)
## Create models and forecasts (creates the forecast models for the selected CWEs' time series and measures the accuracy).
cf$setBenchmark()
cf$setETS()
cf$setARIMA()
## Get forecast plots for specific type of models.
cf$getPlots(cf$getARIMA())
cf$getPlots(cf$getETS())
## Get all created ARIMA models
cf$getARIMA()
## Get ARIMA model by CWE that is among the selected CWEs.
cf$getARIMA("CWE-119")
## Merge a potentially good ARIMA model's training and test data into new training data to  forecast the unknown future of 9 months.
cf$getARIMA("CWE-119")$useModel(9)
## Save the unknown future results and plot them with ggplot2
u <- cf$getARIMA("CWE-119")$useModel(9)
ggplot2::autoplot(u)
## Get ARIMA forecast accuracy measures.
cf$getAssessments(cf$getARIMA())
## Get assessments of the most accurate models that have been created so far for each CWE
cf$getBestAssessments()
## Get all forecast accuracy measures so far
cf$getAllAssessments()
## Save a list of current models with best forecast accuracy.
cf$setBestModelList()
## Plot the best models list
cf$getPlots(cf$getBestModelList())
## Merge the best models' (for each CWE based on the best forecast accuracy) training and test data into new training data for forecasts of the unknown future of 9 months.
list_of_unseen_future_forecasts <- cf$useBest(9)
## Plot the used best models' new forecasts
cf$plotUseBest(list_of_unseen_future_forecasts, row_no = 5, col_no = 2)
## Assuming that some time has gone past and a new CWE object `c2`
## has been created containing time series test data, one can find out the forecast accuracy and add the obtained actual values to plots
cf$assessUseBest(list_of_unseen_future_forecasts, c2$getTimeSeriesData())
cf$plotUseBest(list_of_unseen_future_forecasts, row_no = 5, col_no = 2, actual = c2$getTimeSeriesData())

Look at the included binary data.

head(nvd)

License

This package is licensed under GPL-3.

Acknowledgments

The forecasts of nvdr rely on the methods provided by R package forecast.

@Manual{,
  title = {{forecast}: Forecasting functions for time series and linear models},
  author = {Rob Hyndman and Christoph Bergmeir and Gabriel Caceres and Mitchell O'Hara-Wild and Slava Razbash and Earo Wang},
  year = {2017},
  note = {R package version 8.3},
  url = {http://pkg.robjhyndman.com/forecast},
}

Furthermore, residuals are checked in nvdr by using the ideas from https://github.com/robjhyndman/forecast/blob/c87f33/R/checkresiduals.R. The exact way of lag calculations from that file are used in nvdr.

Notes

The package includes preprocessed data obtained from XML Version 2.0 data from https://nvd.nist.gov/vuln/data-feeds#CVE_FEED as of 7 October 2017 covering vulnerability entries from CVE-2011 to CVE-2016. Users have the possibility to extract data on their own without creating any objects as well with the package's function get_nvd_entries(for example, get_nvd_entries(c("nvdcve-2.0-2013.xml","nvdcve-2.0-2015.xml"))).