Tools for reading, tokenizing, and (eventually) parsing R
code.
You can install sourcetools
from CRAN with:
install.packages("sourcetools")
Or, you can install the development version from GitHub with:
devtools::install_github("kevinushey/sourcetools")
sourcetools
comes with a couple fast functions for reading
files into R
.
Use read()
and read_lines()
to quickly read a file into
R
as character vectors. read_lines()
handles both
Windows style \r\n
line endings, as well as Unix-style
\n
endings. Performance is on par with the readers
provided by the
readr package.
text <- replicate(10000, {
paste(sample(letters, 200, TRUE), collapse = "")
})
file <- tempfile()
cat(text, file = file, sep = "\n")
mb <- microbenchmark::microbenchmark(times = 10,
base::readLines(file),
readr::read_lines(file),
sourcetools::read_lines(file)
)
sm <- summary(mb)
print(sm[c("expr", "mean", "median")], digits = 3)
## expr mean median
## 1 base::readLines(file) 17.29 16.22
## 2 readr::read_lines(file) 30.70 8.11
## 3 sourcetools::read_lines(file) 6.67 6.43
unlink(file)
sourcetools
provides the tokenize_string()
and
tokenize_file()
functions for generating a tokenized
representation of R code. These produce 'raw' tokenized
representations of the code, with each token's value as a
string, and a recorded row, column, and type:
tokenize_string("if (x < 10) 20")
## value row column type
## 1 if 1 1 keyword
## 2 1 3 whitespace
## 3 ( 1 4 bracket
## 4 x 1 5 symbol
## 5 1 6 whitespace
## 6 < 1 7 operator
## 7 1 8 whitespace
## 8 10 1 9 number
## 9 ) 1 11 bracket
## 10 1 12 whitespace
## 11 20 1 13 number