This R package provides functions to crawl stock market data from the platform http://www.ariva.de. Although Yahoo (and maybe several other providers) offers an API to request financial (stock) data, those platforms lack some information, especially for German stock market data. After some research I found that ariva provides the most information for international stocks and that its website structure allows easy crawling. This package will grow iteratively with additional functionality. Note, however, that the methods depend strictly on the website structure of ariva and on the structure of the GET request for the CSV file!
Here is a list of features which will be added over time:
- Get price information:
  - Allow choosing the stock market, e.g. Xetra, Frankfurt etc.
- Get stock information:
  - List the main indices.
  - List the stocks of a stock index: ask about e.g. DAX or NASDAQ and retrieve its current stocks.
  - Further stock information by ISIN: Yahoo ticker, sector, location etc. Also dividends, general meetings etc.
  - Build an internal text file with (own) standardized stock names for easier requests, so that the ISIN can be replaced with the stock name. To do so, crawl all possible stocks (ISIN, Yahoo ticker and name) and store them in a txt file. The file should be updatable.
- Get fundamental data:
  - Also crawl fundamental data like PE ratio or equity ratio etc.
- Quick plot functions:
  - Provide pre-defined plots for price and fundamental data.
- Preprocess price and fundamental data:
  - Add functions to process the data, e.g. calculate returns.
When installing this package you should use at least R version 3.3.0 (2016-05-03). For the library dependencies see the section below. You can easily install this R package by using the install_github() function from the devtools package:
library(devtools)
install_github("wagnertimo/smarketcrawlR")
Before using this R package, please check that the following R packages are installed. Normally these dependencies are installed automatically together with the package. If not, you have to install them manually.
httr
xml2
XML
lubridate
dplyr
ggplot2
stringr
logging
tidyr
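A quick way to check the requirements above is sketched below. It only uses base R and the utils package; the variable names deps and missing_pkgs are illustrative and not part of smarketcrawlR.

```r
# Warn if the running R version is older than the recommended minimum.
if (getRversion() < "3.3.0") {
  warning("R >= 3.3.0 is recommended for this package.")
}

# Dependencies as listed in this README.
deps <- c("httr", "xml2", "XML", "lubridate", "dplyr",
          "ggplot2", "stringr", "logging", "tidyr")

# Packages from the list that are not found in the local library paths.
missing_pkgs <- setdiff(deps, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("All dependencies are installed.")
}
```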
To get started with this R package it is convenient to first call the getStockMarketIndexList() function. It returns a data.frame with all the stock indexes (or indices) from http://www.ariva.de/aktien/indizes.
Currently the function only returns the name of each stock index, a link to its main page at ariva.de and its ISIN. Later on the goal is to also add its stocks.
In the end you will get a list of the most important stock indexes around the world with their current stocks (name and ISIN). From there you can use the other functions, e.g. getDailyOHLC(), since you then have a list of stocks with names and the required unique identifier, the ISIN. There will also be a function to convert a stock (given its ISIN) into a Yahoo ticker.
stockIndex <- getStockMarketIndexList()
head(stockIndex)
# Output:
# Name Link ISIN
# 1 DAX http://www.ariva.de/dax-30 DE0008469008
# 2 TecDAX http://www.ariva.de/tecdax DE0007203275
# 3 MDAX http://www.ariva.de/mdax DE0008467416
# 4 SDAX http://www.ariva.de/sdax DE0009653386
# 5 HDAX http://www.ariva.de/hdax-index/kursliste DE0008469016
# 6 CDAX http://www.ariva.de/cdax-index/kursliste DE0008469602
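Since the result is a plain data.frame, an ISIN can be looked up by index name with ordinary subsetting. The sketch below does not call the package; it rebuilds the first rows of the output above by hand so that it runs offline, and it assumes the columns are character vectors.

```r
# Mock of the first rows shown above (not fetched from ariva.de).
stockIndex <- data.frame(
  Name = c("DAX", "TecDAX", "MDAX"),
  Link = c("http://www.ariva.de/dax-30",
           "http://www.ariva.de/tecdax",
           "http://www.ariva.de/mdax"),
  ISIN = c("DE0008469008", "DE0007203275", "DE0008467416"),
  stringsAsFactors = FALSE
)

# Look up the ISIN of a single index by its name.
dax_isin <- stockIndex$ISIN[stockIndex$Name == "DAX"]
dax_isin
```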
getDailyOHLC(searchDates, isin, input_df): This function retrieves daily price data (Open, High, Low, Close, Volume, Turnover) from Xetra (status quo; other markets will be added in the future). The price data is adjusted for splits and subscription rights (dividends remain priced in; options to toggle these adjustments will be added in the future). There are different ways to retrieve the data or specify its input parameters:
- The first option is to simply pass a date string (the date format is dd.mm.YYYY) and an ISIN string to get the data of that stock on that date. The date can be a single date, a single month, a single year or a period of days, months or years. So e.g. if you want price data from September 2014 through (and including) November 2014, the date string is "09.2014-11.2014". You get the data for the whole year 2014 with "2014".
- The second option extends the first one: you can pass several date strings and several stocks (ISINs) as vectors. This is useful if you need to compare different time periods for several stocks. All the data is returned in one data.frame, grouped by stock and sorted by date.
- The third option specifies an input data.frame. The data.frame needs dates and isin as column names. Each row is an individual date with an ISIN (e.g. "22.09.2014" "DE0007037145"). For this option you need to pass the argument by name: getDailyOHLC(input_df = df).
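To make the date string formats described above concrete, here is a small sketch that expands such a string into a start and end date. The helpers parseSearchDate and boundaryDate are hypothetical and written only for illustration; they are not the package's parser, and how getDailyOHLC() handles these strings internally is not shown in this README.

```r
# Hypothetical helper: turn one token ("dd.mm.YYYY", "mm.YYYY" or "YYYY")
# into the first (side = "start") or last (side = "end") day it covers.
boundaryDate <- function(token, side) {
  fields <- as.integer(strsplit(token, ".", fixed = TRUE)[[1]])
  n <- length(fields)
  year <- fields[n]
  if (n == 3) {
    # Full date: day and month are given explicitly.
    return(as.Date(sprintf("%04d-%02d-%02d", year, fields[2], fields[1])))
  }
  # Month given ("mm.YYYY") or whole year ("YYYY").
  month <- if (n == 2) fields[1] else if (side == "start") 1L else 12L
  first_day <- as.Date(sprintf("%04d-%02d-01", year, month))
  if (side == "start") return(first_day)
  # Last day of the month = first day of the next month minus one day.
  seq(first_day, by = "month", length.out = 2)[2] - 1
}

# Hypothetical helper: split an optional "from-to" period and resolve both ends.
parseSearchDate <- function(x) {
  parts <- strsplit(x, "-", fixed = TRUE)[[1]]
  list(start = boundaryDate(parts[1], "start"),
       end   = boundaryDate(parts[length(parts)], "end"))
}

parseSearchDate("09.2014-11.2014")
parseSearchDate("2014")
parseSearchDate("22.09.2014")
```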
# Activate the package in the workspace
library(smarketcrawlR)
# You have to set logging to TRUE or FALSE if you want logs printed out and written in a file (Good for Debugging)
# No default yet. Will break if not set.
setLogging(TRUE)
# 1. Option: Time Period string and ISIN strings as input params
# Get the price data for RWE (DE0007037145) of week 22.09.2014-26.09.2014
prices <- getDailyOHLC("22.09.2014-26.09.2014", "DE0007037145")
head(prices)
# Output:
# Date Open High Low Close Volume Turnover ISIN
# 1 2014-09-22 23.955 24.355 23.955 24.005 18425 445699 DE0007037145
# 2 2014-09-23 24.250 24.600 23.790 24.070 127625 3096999 DE0007037145
# 3 2014-09-24 23.960 24.345 23.960 24.345 31045 749568 DE0007037145
# 4 2014-09-25 24.450 24.760 24.065 24.080 86060 2103094 DE0007037145
# 5 2014-09-26 24.220 24.220 23.900 24.195 26344 634047 DE0007037145
# 2. Option: Use an array of different request dates (or periods) and stocks
# Get the prices for RWE AG (DE0007037145) and VW VZ (DE0007664039) for different time periods or dates (see searchDates parameter above)
# Build the input arrays of dates (or time periods) and stocks (with isin)
searchDates <- c("20.09.2014-22.09.2014","20.08.2014","06.05.2014-16.05.2014")
isin <- c("DE0007037145","DE0007664039")
prices <- getDailyOHLC(searchDates = searchDates, isin = isin)
# Alternative (positional arguments): prices <- getDailyOHLC(searchDates, isin)
head(prices)
# Output:
# Date Open High Low Close Volume Turnover ISIN
# 1 2014-05-06 20.360 20.775 20.331 20.640 187223 3867569 DE0007037145
# 2 2014-05-07 20.500 21.005 20.445 20.975 108269 2251327 DE0007037145
# 3 2014-05-08 20.990 21.270 20.900 21.195 68254 1442694 DE0007037145
# 4 2014-05-09 21.215 21.405 21.025 21.075 53477 1133517 DE0007037145
# 5 2014-05-12 21.075 21.290 20.850 21.000 84629 1779225 DE0007037145
# 6 2014-05-13 21.000 21.000 20.555 20.720 124451 2580944 DE0007037145
# 3. Option: Use an input data.frame (input_df)
# Get the prices for RWE AG (DE0007037145) of the 25.02.2008 and 27.02.2008
# Build the input data.frame with dates and isin columns of requested dates and stocks
dates <- c("25.02.2008","27.02.2008")
isin <- c("DE0007037145","DE0007037145")
input_df <- data.frame(dates, isin)
prices <- getDailyOHLC(input_df = input_df)
head(prices)
# Output:
# Date Open High Low Close Volume Turnover ISIN
# 1 2008-02-25 66.8938 67.3948 65.7217 67.2245 136828 9198202 DE0007037145
# 2 2008-02-27 67.1543 67.6252 66.7436 67.3146 122037 8214851 DE0007037145
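When the same dates are needed for several stocks, the input data.frame for the third option can be built with base R instead of typing every row. This is only a sketch: it uses expand.grid() to create every date/ISIN combination, and keeping the columns as character (stringsAsFactors = FALSE) is an assumption about what getDailyOHLC() expects, not something this README states.

```r
dates <- c("25.02.2008", "27.02.2008")
isin  <- c("DE0007037145", "DE0007664039")

# One row per combination of date and ISIN, with the required column names.
input_df <- expand.grid(dates = dates, isin = isin, stringsAsFactors = FALSE)
input_df

# The result could then be passed on by name:
# prices <- getDailyOHLC(input_df = input_df)
```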
To process the financial data further you can calculate the logarithmic returns in percent with the addLogReturns function. You pass a financial data.frame (e.g. the one returned by getDailyOHLC) and specify the values (columns of your input data.frame, e.g. Close and/or Volume) for which log returns are added to your input data. The first observation cannot have a return value since it has no previous value. For computational reasons the first observation is set to 0.
# Get the financial data
prices <- getDailyOHLC("09.2014-11.2014", "DE0007037145")
# Calculate the log returns in percent
log.returns <- addLogReturns(prices, c("Close", "Volume"))
head(log.returns)
# Output:
# Date Open High Low Close Volume Turnover ISIN Close_LogReturn Volume_LogReturn
# 1 2014-09-01 23.155 23.440 23.155 23.300 20685 481957 DE0007037145 0.00 0.00
# 2 2014-09-02 23.270 23.850 23.265 23.480 50256 1183669 DE0007037145 0.77 88.77
# 3 2014-09-03 23.655 24.140 23.525 23.975 53952 1290523 DE0007037145 2.09 7.10
# 4 2014-09-04 24.000 24.435 23.680 24.330 67197 1622824 DE0007037145 1.47 21.95
# 5 2014-09-05 24.290 24.775 24.285 24.775 68326 1682513 DE0007037145 1.81 1.67
# 6 2014-09-08 24.680 24.775 24.580 24.695 52377 1291223 DE0007037145 -0.32 -26.58
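The values in the output above are consistent with the usual definition of a log return in percent, 100 * log(x_t / x_{t-1}), rounded to two decimals. The sketch below reproduces the first Close and Volume values with base R; logReturnPct is a hypothetical helper for illustration and not the implementation of addLogReturns.

```r
# Hypothetical helper: log returns in percent, first observation set to 0.
logReturnPct <- function(x, digits = 2) {
  c(0, round(100 * diff(log(x)), digits))
}

# First three observations from the output above.
close  <- c(23.300, 23.480, 23.975)
volume <- c(20685, 50256, 53952)

data.frame(Close = close,
           Volume = volume,
           Close_LogReturn = logReturnPct(close),
           Volume_LogReturn = logReturnPct(volume))
```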
The function getStockInfoFromIsin() crawls general information about the given stocks (ISINs). You can pass a character vector of ISINs. The returned data.frame gives information about the stock type, business, genus, date since listed, the year of establishment, country of origin, the nominal value, sector, ticker, currency and ISIN.
# Build the input parameter
isin <- c("DE0007037145", "DE000ENAG999")
# Get the information for RWE AG (DE0007037145) and EON (DE000ENAG999)
stockInfos <- getStockInfoFromIsin(isin = isin)
stockInfos
# Output
#
# StockType Business Genus ListedSince Established Country NominalValue Sector Ticker Currency
# 1 Inlandsaktie Energieversorger Vorzugsaktie 1922-12-01 1898 Deutschland <NA> Versorger RWE3 EUR
# 2 Inlandsaktie Energieversorger Namens-Stammaktie 1965-08-09 2000 Deutschland 1,00 Versorger EOAN EUR
# ISIN
# 1 DE0007037145
# 2 DE000ENAG999
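In the output above the nominal value is printed with a German decimal comma ("1,00"). If you need it as a number, a conversion along the following lines may help. This is a sketch on hand-typed values and assumes that NominalValue is returned as a character column, which this README does not state explicitly.

```r
# Values copied from the NominalValue column above (NA for the first stock).
nominal_value <- c(NA, "1,00")

# Replace the decimal comma with a dot, then convert to numeric.
nominal_numeric <- as.numeric(sub(",", ".", nominal_value, fixed = TRUE))
nominal_numeric
```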