**.
## Installation
-First of all, make sure you installed the package by sourcing it (note that this is a working project, so no 'oficial' package has been created yet).
+You can install the development version from GitHub (note that this is
+a work in progress):
-```r
-source('bibliometRics.R')
+``` r
+devtools::install_github("sbegueria/bibliometRics")
```
-
## Getting some bibliometric data
-So far the unique source of blibiometric information accepted by `bibliometRics` is the Web of Science (WoS) by Thomson Reuters, but other sources such as Scopus or Google Scholar can be added in the future.
-Publications can be selected for a given author (easiest if you know its author ID) or a group (such a research group or a department, for instance).
-The WoS allows to generate a citation report, stating the number of citations received by each item, every year after its publication.
-So once you are good with the selection of publications you want to analyze, you can click on the 'create citation report' button and then select the 'save to text file' option.
-This will generate a text file and download it to your computer.
-You may want to edit the AUTHOR field on the first line of this text file, although this is not strictly necessary for doing the analysis.
-
-An example citation report file from the WoS is the file [sbegueria.txt](sbegueria.txt), which contains data of papers I have authored, as of January 2017.
-
-The core function for reading WoS citation report data is, not surprisingly, `read.wos`.
-Its only argument is the name of the data file you want to read:
-
-
-```r
+So far the only source of bibliometric information accepted by
+`bibliometRics` is the Web of Science (WoS) by Thomson Reuters, but
+other sources such as Scopus or Google Scholar may be added in the
+future. Publications can be selected for a given author (easiest if you
+know their author ID) or a group (such as a research group or a
+department). WoS can generate a citation report stating the number of
+citations received by each item in every year after its publication.
+Once you are happy with the selection of publications you want to
+analyze, click on the ‘create citation report’ button and then select
+the ‘save to text file’ option. This will generate a text file and
+download it to your computer. You may want to edit the AUTHOR field on
+the first line of this text file, although this is not strictly
+necessary for the analysis.
+
+An example citation report file from the WoS is the file
+[sbegueria.txt](sbegueria.txt), which contains data of papers I have
+authored, as of January 2017.
+
+The core function for reading WoS citation report data is, not
+surprisingly, `read.wos`. Its only argument is the name of the data file
+you want to read:
+
+``` r
bib <- read.wos('sbegueria.txt')
str(bib)
```
-```r
+``` r
## List of 3
## $ author : chr "BEGUERIA, SANTIAGO"
## $ reference: chr "sbegueria"
-## $ pubs :'data.frame': 88 obs. of 39 variables:
+## $ pubs :'data.frame': 88 obs. of 39 variables:
## ..$ Title : chr [1:88] "Estimating erosion rates using Cs-137 measurements and WATEM/SEDEM in a Mediterranean cultivated field" "Recent changes and drivers of the atmospheric evaporative demand in the Canary Islands" "Mid and late Holocene forest fires and deforestation in the subalpine belt of the Iberian range, northern Spain" "Use of disdrometer data to evaluate the relationship of rainfall kinetic energy and intensity (KE-I)" ...
## ..$ Authors : chr [1:88] "Quijano, Laura; Begueria, Santiago; Gaspar, Leticia; Navas, Ana" "Vicente-Serrano, Sergio M.; Azorin-Molina, Cesar; Sanchez-Lorenzo, Arturo; El Kenawy, Ahmed; Martin-Hernandez, Natalia; Pena-Ga"| __truncated__ "Garcia-Ruiz, Jose M.; Sanjuan, Yasmina; Gil-Romera, Graciela; Gonzalez-Samperiz, Penelope; Begueria, Santiago; Arnaez, Jose; Co"| __truncated__ "Angulo-Martinez, M.; Begueria, S.; Kysely, J." ...
## ..$ Corporate Authors: logi [1:88] NA NA NA NA NA NA ...
@@ -84,44 +110,53 @@ str(bib)
## ..$ 2017 : int [1:88] 0 0 0 0 0 0 0 0 0 0 ...
```
-The result is a list with the following three elements:
-* `author`, a character object with the name of the author being analysed;
-* `reference`, a character object; and
-* `pubs`, a data frame with the publications in rows, and the data referring to each publications in columns, including the number of citations received, year by year.
-
+The result is a list with the following three elements:
+
+* `author`, a character object with the name of the author being analysed;
+* `reference`, a character object; and
+* `pubs`, a data frame with the publications in rows and the data
+  referring to each publication in columns, including the number of
+  citations received, year by year.
+
## Analyzing bibliometric data
-The main function for analyizing these data is `bibliometric`.
-It takes an object with the bibliometric data resulting from a call to `read.wos` and returns a data.frame with a number of bibliometric indices.
+The main function for analyzing these data is `bibliometric`. It takes
+an object with the bibliometric data resulting from a call to `read.wos`
+and returns a data.frame with a number of bibliometric indices.
-In order to compute some metrics which are based on the number of citations received by each publication, data on the percentile baselines for each scientific discipline is requireed.
-This data can be found on another Thomson Reuters product, the 'Essential Science Indicators' database.
-You need to navigate to the 'Field Baselines' tab, and then select 'Percentiles' to get the field percentile baselines (FPB).
-You can then download the FPB table in csv format.
+In order to compute some metrics which are based on the number of
+citations received by each publication, data on the percentile baselines
+for each scientific discipline is required. This data can be found on
+another Thomson Reuters product, the ‘Essential Science Indicators’
+database. You need to navigate to the ‘Field Baselines’ tab, and then
+select ‘Percentiles’ to get the field percentile baselines (FPB). You
+can then download the FPB table in csv format.
-An example FPB table as of January 2017 can be found in file [BaselinePercentiles.csv](BaselinePercentiles.csv).
-This table can be read with the function 'read.baselines':
+An example FPB table as of January 2017 can be found in file
+[BaselinePercentiles.csv](BaselinePercentiles.csv). This table can be
+read with the function `read.baselines`:
-```r
+``` r
base <- read.baselines('BaselinePercentiles.csv')
```
-The resulting object is a list with as many items as disciplines.
-For each discipline, a data.frame is stored containing the threshold number of citations corresponding to each percentile, depending on the publication year.
-For instance, this is are the baseline thresholds for the discipline 'Geosciences':
+The resulting object is a list with as many items as disciplines. For
+each discipline, a data.frame is stored containing the threshold number
+of citations corresponding to each percentile, depending on the
+publication year. For instance, these are the baseline thresholds for
+the discipline ‘Geosciences’:
-```r
+``` r
base$GEOSCIENCES
```
-We can now use the function `bibliometric`, especifying the, the field baselines, and the discipline (the table for 'all fields' will be used as a default if no discipline is specified).
+We can now use the function `bibliometric`, specifying the bibliometric
+data, the field baselines, and the discipline (the table for ‘all
+fields’ will be used as a default if no discipline is specified).
-```r
+``` r
bibliometric(bib, base, 'GEOSCIENCES')
```
-```r
+``` r
## name ini years pubs lead pubs_year hin hin_year
## 1 BEGUERIA, SANTIAGO 2000 16 88 17 5.5 33 2.06
## gin gin_year cit_tot cit_year cit_art ifact2 ifact5 i10 i25 i50
@@ -130,78 +165,79 @@ bibliometric(bib, base, 'GEOSCIENCES')
## 1 616 32 7 11 4145 187
```
-These are the following:
-
- Label | Meaning
----------------| ----------------------------------------------------
- name | Name of the author, group, department, etc.
- ini | Initial year of the publication record
- span | Time span (years) of the publication record
- pubs | Total number of publications
- lead | Total number of publications, as the lead (first) author
- pubs_year | Mean number of publications per year
- hin | Hirsch's h-index
- hin_year | h-index per year
- gin | Egghe's g-index
- gin_year | g-index per year
- cit_tot | Total number of citations
- cit_year | Mean number of citations per year
- cit_art | Mean number of citations per article
- ifact2 | Impact factor, computed over the last two years
- ifact5 | Impact factor, computed over the last five years
- i10 | Number of publications with 10 or more citations
- i25 | Number of publications with 25 or more citations
- i50 | Number of publications with 50 or more citations
- cit_max | Number of citations of the most citated publications
- pubs09 | Number of publications over the 90th percentile in its discipline
- pubs09_lead | Number of publications over the 90th percentile in its discipline, as lead author
- pubs099 | Number of publications over the 99th percentile in its discipline
- iscore | i-score
- iscore_lead | i-score, as lead author
-
-There are functions for computing some of the indices, such as the Hirsch and the Egghe indices.
-These can be computed for the whole period analyized, or up to a given year.
-
-
-```r
+The columns of the resulting data.frame are the following:
+
+| Label | Meaning |
+| ------------ | --------------------------------------------------------------------------------- |
+| name | Name of the author, group, department, etc. |
+| ini | Initial year of the publication record |
+| span | Time span (years) of the publication record |
+| pubs | Total number of publications |
+| lead | Total number of publications, as the lead (first) author |
+| pubs\_year | Mean number of publications per year |
+| hin | Hirsch’s h-index |
+| hin\_year | h-index per year |
+| gin | Egghe’s g-index |
+| gin\_year | g-index per year |
+| cit\_tot | Total number of citations |
+| cit\_year | Mean number of citations per year |
+| cit\_art | Mean number of citations per article |
+| ifact2 | Impact factor, computed over the last two years |
+| ifact5 | Impact factor, computed over the last five years |
+| i10 | Number of publications with 10 or more citations |
+| i25 | Number of publications with 25 or more citations |
+| i50 | Number of publications with 50 or more citations |
+| cit\_max | Number of citations of the most cited publication |
+| pubs09 | Number of publications over the 90th percentile in its discipline |
+| pubs09\_lead | Number of publications over the 90th percentile in its discipline, as lead author |
+| pubs099 | Number of publications over the 99th percentile in its discipline |
+| iscore | i-score |
+| iscore\_lead | i-score, as lead author |
+
+There are functions for computing some of the indices, such as the
+Hirsch and the Egghe indices. These can be computed for the whole period
+analyzed, or up to a given year.
+
+``` r
hirsch(bib)
```
-```r
+``` r
## [1] 33
```
-```r
+``` r
egghe(bib)
```
-```r
+``` r
## [1] 57
```
-```r
+``` r
hirsch(bib, 2010)
```
-```r
+``` r
## [1] 13
```
-```r
+``` r
egghe(bib, 2010)
```
-```r
+``` r
## [1] 45
```
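
For readers unfamiliar with the two indices, the following minimal sketch computes them from a plain vector of citation counts using the textbook definitions. The helper names `h_index` and `g_index` and the sample counts are made up for illustration; the package's own `hirsch` and `egghe` work on the object returned by `read.wos` and may treat edge cases (such as ties) differently.

```r
# Illustrative helpers (hypothetical names, not part of bibliometRics).

# h-index: largest h such that h publications have at least h citations each.
h_index <- function(cit) {
  cit <- sort(cit, decreasing = TRUE)
  sum(cit >= seq_along(cit))
}

# g-index: largest g such that the g most cited publications together
# have at least g^2 citations (capped here at the number of publications).
g_index <- function(cit) {
  cit <- sort(cit, decreasing = TRUE)
  sum(cumsum(cit) >= seq_along(cit)^2)
}

cit <- c(10, 8, 5, 4, 3, 0)  # made-up citation counts
h_index(cit)  # 4
g_index(cit)  # 5
```

Both helpers rely on the fact that, once citations are sorted in decreasing order, the condition holds for the first few ranks and then fails for all later ones, so counting the `TRUE` values gives the index directly.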
There is also a function for ranking the publications in quantiles:
-```r
+``` r
rank(bib, q=base$GEOSCIENCES)
```
-```r
+``` r
## [1] >q0 >q0 >q0 >q0 >q0.9 >q0.9 >q0.99 >q0
## [9] >q0.8 >q0.8 >q0.9 >q0.9 >q0.99 >q0.99 >q0.99 >q0.999
## [17] >q0 >q0.99 >q0.99 >q0.5 >q0.9 >q0.9 >q0.999 >q0
@@ -216,56 +252,77 @@ rank(bib, q=base$GEOSCIENCES)
## Levels: >q0.9999 >q0.999 >q0.99 >q0.9 >q0.8 >q0.5 >q0
```
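
The idea behind the ranking can be sketched as follows: each publication's citation count is compared against the percentile thresholds for its publication year, and the strictest threshold it reaches determines its class. This is only an illustration under simplifying assumptions; the function name `classify` and the threshold values are hypothetical and are not real Essential Science Indicators figures.

```r
# Hypothetical thresholds for a single publication year, ordered from the
# strictest percentile to the loosest (values are invented for illustration).
thresholds <- c(`>q0.99` = 120, `>q0.9` = 40, `>q0.5` = 8)

# Return the label of the strictest threshold reached, or ">q0" if none is.
classify <- function(cit, thr) {
  hit <- which(cit >= thr)
  if (length(hit) == 0) ">q0" else names(thr)[hit[1]]
}

vapply(c(150, 45, 3), classify, character(1), thr = thresholds)
```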
-```r
+``` r
table(rank(bib, q=base$GEOSCIENCES))
```
-```r
+
+``` r
## >q0.9999 >q0.999 >q0.99 >q0.9 >q0.8 >q0.5 >q0
## 0 3 8 21 20 11 13
```
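
The quantile counts above also appear to drive several columns reported by `bibliometric`. As a rough sketch, assuming each class is weighted by 1/(1 - q), with the lowest class weighted 1 (this follows the weights used in the original `bibliometRics.R` script and may not hold for later versions), the i-score and the `pubs09` and `pubs099` counts can be reproduced by hand:

```r
# Counts per quantile class, copied from the table above.
counts <- c(`>q0.9999` = 0, `>q0.999` = 3, `>q0.99` = 8, `>q0.9` = 21,
            `>q0.8` = 20, `>q0.5` = 11, `>q0` = 13)

# Assumed weights: 1 / (1 - q) for each class boundary q.
q <- c(0.9999, 0.999, 0.99, 0.9, 0.8, 0.5, 0)
weights <- 1 / (1 - q)

iscore  <- round(sum(counts * weights), 2)
pubs09  <- sum(counts[1:4])  # publications above the 90th percentile
pubs099 <- sum(counts[1:3])  # publications above the 99th percentile

c(iscore = iscore, pubs09 = pubs09, pubs099 = pubs099)
```

With these counts the sketch gives an i-score of 4145, with 32 and 11 publications above the 90th and 99th percentiles, in line with the `bibliometric` output shown earlier.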
-
-A specific plotting function makes it easy to resume most of this information in graphic form.
-The plots also inform about the temporal evolution of the bibliometric indices, which may be useful for evaluating the scientific career of the evaluated.
-
-
-```r
-biblioplot(bib, q=base$GEOSCIENCES)
+A dedicated plotting function makes it easy to summarize most of this
+information in graphic form. The plots also show the temporal
+evolution of the bibliometric indices, which may be useful for
+evaluating an author's scientific career.
+
+``` r
+library(bibliometRics)
+wos_file = system.file("sbegueria.txt", package = "bibliometRics")
+bib <- read.wos(wos_file)
+base <- read.baselines()
+biblioplot(bib, quant = base$GEOSCIENCES)
```
-
-
-The first plot reflects the productivity (quantity) of the author, as well as its impact (citations received). It shows the cumulative number of publications, with distinction between the publications as lead author (black bars) and those as co-author (white bars). The plot also showcases the number of citations for all the publications (white circles) and for those as lead author (black circles). There is a fixed ratio of 1:10 between the left (publications) and the right (citations) axis, allowing for easy comparison across authors, groups, etc. This ratio implies assuming a mean citation rate of 10 citations per article publishes (a rate that is, of course, arbitrary).
-
-The second plot focuses on the impact of the publications. It shows the annual evolution of the Hirsch's h-index (white circles) and the Egghe's g-index (black circles), with a fixed ratio of 1:2 between them. Evolution of the h-index is compared with an 1:1 evolution (dashed line), since it is usually assumed that the h-index grows, as an average, at a rate of 1 per year.
-
-The third plot attempts at evaluating the excelence of the publications. It shows the number of publications classified by quantiles, according to the ISI-WoK Scientific Indicators per discipline. For each quantile, the total number of publications is shown (white bars), as well as the publications as lead author (black bars).
-
-A formatted list of all the publications is also produced.
-
-```r
+![A graphic bibliometric
+analysis](man/figures/README-unnamed-chunk-1-1.png)
+
+The first plot reflects the productivity (quantity) of the author, as
+well as their impact (citations received). It shows the cumulative
+number of publications, distinguishing between publications as lead
+author (black bars) and those as co-author (white bars). The plot also
+shows the number of citations for all the publications (white
+circles) and for those as lead author (black circles). There is a fixed
+ratio of 1:10 between the left (publications) and the right (citations)
+axis, allowing for easy comparison across authors, groups, etc. This
+ratio assumes a mean citation rate of 10 citations per published
+article (a rate that is, of course, arbitrary).
+
+The second plot focuses on the impact of the publications. It shows the
+annual evolution of Hirsch’s h-index (white circles) and Egghe’s
+g-index (black circles), with a fixed ratio of 1:2 between them.
+The evolution of the h-index is compared with a 1:1 evolution (dashed
+line), since it is usually assumed that the h-index grows, on
+average, at a rate of 1 per year.
+
+The third plot attempts to evaluate the excellence of the publications.
+It shows the number of publications classified by quantiles, according
+to the ISI-WoK Scientific Indicators per discipline. For each quantile,
+the total number of publications is shown (white bars), as well as the
+publications as lead author (black bars).
+
+A formatted list of all the publications is also
+produced.
+
+``` r
format_pub(cbind(bib$pubs, rank(bib,q=base$GEOSCIENCES))[1,], au=bib$au)
```
-```r
+``` r
## [1] "\\item Quijano, Laura;\\textbf{ Begueria, Santiago}; Gaspar, Leticia; Navas, Ana. Estimating Erosion Rates using Cs-137 Measurements and Watem/Sedem in a Mediterranean Cultivated Field. \\textit{CATENA} 138: 38--51. 2016. (cit: 0; $7$)\n"
```
-
## Automated bibliometric reports
-The package also contains a template `bibliometRics.Rtex` file, useful for creating automated reports.
-You'll need to load the package `knitr` in order to produce the report.
+The package also contains a template `bibliometRics.Rtex` file, useful
+for creating automated reports. You’ll need to load the package `knitr`
+in order to produce the report.
-```r
+``` r
require(knitr)
```
-
-```r
+``` r
infile <- 'sbegueria.txt'
outfile <- 'sbegueria.Rtex'
@@ -277,11 +334,12 @@ knit(outfile)
# Compile the resulting .tex file and create a .pdf from it
system(paste('/Library/TeX/texbin/pdflatex ',
- gsub('.txt','',infile),'.tex',sep=''))
+ gsub('.txt','',infile),'.tex',sep=''))
# Remove unnecessary intermediate files
kk <- list.files('.',paste(gsub('.txt','', infile)))
file.remove(kk[-c(grep('.pdf',kk),grep('.txt',kk))])
```
-An example report generated by the above code chunk can be found in the file [sbegueria.pdf]('./sbegueria.pdf').
+An example report generated by the above code chunk can be found in the
+file [sbegueria.pdf](./sbegueria.pdf).
diff --git a/appveyor.yml b/appveyor.yml
new file mode 100644
index 0000000..87e9fec
--- /dev/null
+++ b/appveyor.yml
@@ -0,0 +1,38 @@
+init:
+ ps: |
+ $ErrorActionPreference = "Stop"
+ Invoke-WebRequest http://raw.github.com/krlmlr/r-appveyor/master/scripts/appveyor-tool.ps1 -OutFile "..\appveyor-tool.ps1"
+ Import-Module '..\appveyor-tool.ps1'
+install:
+ ps: Bootstrap
+cache: C:\RLibrary
+build_script: travis-tool.sh install_deps
+test_script: travis-tool.sh run_tests
+on_failure:
+ - 7z a failure.zip *.Rcheck\*
+ - appveyor PushArtifact failure.zip
+artifacts:
+ - path: '*.Rcheck\**\*.log'
+ name: Logs
+ - path: '*.Rcheck\**\*.out'
+ name: Logs
+ - path: '*.Rcheck\**\*.fail'
+ name: Logs
+ - path: '*.Rcheck\**\*.Rout'
+ name: Logs
+ - path: \*_*.zip
+ name: Bits
+environment:
+ global:
+ WARNINGS_ARE_ERRORS: 1.0
+ USE_RTOOLS: yes
+ R_CHECK_INSTALL_ARGS: --install-args=--build --no-multiarch
+deploy:
+ provider: GitHub
+ description: Windows Binary
+ auth_token:
+ secure: tc2Va9OWLH9H/uKPUKCUmv+q+wQpzRVJd6t8ckXjqzGnOo9eWyiQCmuuUjcuM3b8
+ draft: no
+ prerelease: no
+ 'on':
+ appveyor_repo_tag: true
diff --git a/bibliometRics.R b/bibliometRics.R
deleted file mode 100644
index 94c36d3..0000000
--- a/bibliometRics.R
+++ /dev/null
@@ -1,411 +0,0 @@
-# Functions to read and process Web of Science citation report data.
-
-read.wos <- function(infile) {
-
- # author
- au <- read.table(infile, sep=',', nrows=1, stringsAsFactors=FALSE)
- #au <- paste(gsub('AUTHOR: \\(','',au[1]),gsub('\\)','',au[2]),sep=', ')
- au <- paste(gsub('(AUTHOR)( IDENTIFIERS)*(: \\()','',au[1]), gsub('\\)','',au[2]),sep=', ')
- au <- toupper(au)
-
- # reference
- re <- gsub('.txt','',rev(strsplit(infile,'/')[[1]])[1])
-
- # read publications
- nskip <- ifelse (read.table(infile,sep=',',nrows=1,skip=1)=='null', 5, 4)
- dat <- read.table(infile, sep=',', stringsAsFactors=FALSE, skip=nskip)
- colnames(dat) <- read.table(infile,sep=',',stringsAsFactors=FALSE,skip=nskip-1,nrows=1)
- o <- rev(order(dat[,'Publication Year']))
- dat <- dat[o,]
- start <- min(dat[,'Publication Year'])
- w <- which(colnames(dat)==start)
- dat <- dat[,c(1:21,w:ncol(dat))]
-
- return(list(author=au,reference=re,pubs=dat))
-}
-#bib <- read.isiwok('isiwok_MAM.txt')
-
-
-# read.scopus <- function(infile) {
-
- # # author
- # au <- read.table(infile,sep=',',nrows=1,stringsAsFactors=FALSE)[,2]
- # au <- substr(au,9,nchar(au))
-
- # # read publications
- # dat <- read.table(infile,sep=',',stringsAsFactors=FALSE,skip=7)[,c(1:7,9,12)]
- # colnames(dat) <- c('year','title','authors','issn','journal','volume','issue','cit','cittotal')
-
- # return()
-# }
-# bib <- read.scopus('scopus_SBP_win.csv')
-
-# Read field percentile baselines (FPB) from the Essential Science Indicators database.
-read.baselines <- function(infile) {
- fpb <- read.table(infile, sep=',', skip=1, stringsAsFactors=FALSE, head=TRUE)
- fields <- fpb$RESEARCH.FIELDS[-grep('%|©', fpb$RESEARCH.FIELDS)]
- baselines <- list(NULL)
- for (i in 1:length(fields)) {
- w <- which(fpb$RESEARCH.FIELDS==fields[i])
- baselines[[i]] <- fpb[w+1:6,]
- colnames(baselines[[i]])[1] <- 'percentiles'
- names(baselines)[i] <- fields[i]
- }
- return(baselines)
-}
-
-# Hirsch's h-index
-hirsch <- function(x,y=NULL) {
- x <- x$pubs
- if (is.null(y)) {
- #y <- max(x[,'Publication Year'])
- y <- rev(colnames(x))[1]
- }
- w <- which(x[,'Publication Year']<=y)
- ww <- which(colnames(x)<=y)
- if (length(ww)>1) {
- cit <- rowSums(x[w,ww])
- } else {
- cit <- x[w,ww]
- }
- #cit <- x[,'Total Citations']
- hin <- cbind(cit[order(cit,decreasing=TRUE)],1:length(cit))
- w <- (hin[,1]-hin[,2])>0
- if (sum(w)==0) return(0)
- return(max(hin[w,2]))
-}
-#hirsch(bib)
-#hirsch(bib, 2010)
-
-
-# Egghe's g-index
-egghe <- function(x,y=NULL) {
- x <- x$pubs
- if (is.null(y)) {
- y <- max(x[,'Publication Year'])
- }
- w <- x[,'Publication Year']<=y
- cit <- x[w,'Total Citations']
- gin <- cbind(
- 1:length(cit),
- cumsum(cit[order(cit,decreasing=TRUE)]),
- (1:length(cit))^2)
- w <- (gin[,2]-gin[,3])>0
- return(max(gin[w,1]))
-}
-#egghe(bib)
-
-# citation ranking (quantiles)
-rank <- function(x, w=NULL, q=NULL) {
- x <- x$pubs
- if (!is.null(w)) {
- x <- x[w,]
- }
- r <- data.frame(a=x[,'Publication Year'],b=x[,'Total Citations'],c='')
- r$c <- factor(r$c,levels=c('>q0.9999','>q0.999','>q0.99','>q0.9','>q0.8','>q0.5','>q0'))
- for (i in 1:nrow(r)) {
- w <- grep(r[i,1],names(q))
- if (length(w)==0) next()
- ww <- which(r[i,2]-q[,w]>=0)[1]
- if (is.na(ww)) {
- r[i,3] <- levels(r$c)[7]
- } else {
- r[i,3] <- levels(r$c)[ww]
- }
- }
- return(r[,3])
-}
-#rank(bib, c(9,10), quant)
-#rank(bib, quant)
-
-# impact factor
-ifactor <- function(x, y=NULL, n=2) {
- year <- x[,'Publication Year']
- if (is.null(y)) {
- y <- max(year)-1
- }
- w <- which(year>={y-n} & year=n))
- }
- i10 <- icit(dat)
- i25 <- icit(dat,25)
- i50 <- icit(dat,50)
-
- # h-index
- hin <- hirsch(bib)
-
- # g-index
- gin <- egghe(bib)
-
- # pubs >0.9, >0.99
- r <- rank(bib, q=quant)
- p09 <- sum(as.numeric(r)<=4,na.rm=TRUE)
- p099 <- sum(as.numeric(r)<=3,na.rm=TRUE)
-
- # pubs >0.9, as lead author
- r_lead <- rank(bib, pubs_lead, quant)
- p09_lead <- sum(as.numeric(r_lead)<=4,na.rm=TRUE)
-
- # i-score
- scores <- c(1/(1-0.9999),1/(1-0.999),1/(1-0.99),1/(1-0.9),1/(1-0.8),1/(1-0.5),1/1)
- iscore <- sum(scores[as.numeric(r)],na.rm=TRUE)
-
- # i-score, as lead author
- isc_lead <- sum(scores[as.numeric(r_lead)],na.rm=TRUE)
-
- # output
- out <- as.data.frame(t(c(name=au,
- ini,span,pubs,length(pubs_lead),round(pubs/span,2),
- hin,round(hin/span,2),gin,round(gin/span,2),
- cit_tot,round(cit_tot/span,2),cit_art,ifact2,ifact5,
- i10,i25,i50,cit_max,p09,p09_lead,p099,round(iscore,2),round(isc_lead,2))))
- colnames(out) <- c('name','ini','years','pubs','lead','pubs_year','hin','hin_year',
- 'gin','gin_year','cit_tot','cit_year','cit_art','ifact2','ifact5','i10','i25',
- 'i50','cit_max','pubs09','pubs09_lead','pubs099','iscore','iscore_lead')
- return(out)
-}
-#bibliometric(bib, baseline)
-
-biblioplot <- function(bib, quant) {
-
- par(mfrow=c(1,3))
-
- au <- bib$author
- dat <- bib$pubs
-
- # Publication range
- pubrange <- range(dat[,'Publication Year'])
-
- # Leading authors
- leads <- as.character(lapply(dat[,'Authors'],function(x) {strsplit(x,';')[[1]][1]}))
- w_lead <- grep(strsplit(gsub('Ñ','N',au),',')[[1]][1], leads, ignore.case=TRUE)
-
- # Publications and citations
- # pubs per year
- years <- dat[,'Publication Year']
- years <- factor(years,levels=as.character(min(years):max(years)))
- years_nolead <- years[-w_lead]
- years_lead <- years[w_lead]
- pubs <- rbind(table(years_lead),table(years_nolead))
- rownames(pubs) <- c('pubs_lead','pubs_nolead')
- # cumulative pubs
- pubs_cum <- t(apply(pubs,1,cumsum))
- # citations per year
- cits_lead <- t(as.matrix(colSums(bib$pubs[w_lead,22:ncol(bib$pubs)])))
- cits_nolead <- t(as.matrix(colSums(bib$pubs[-w_lead,22:ncol(bib$pubs)])))
- cits <- rbind(cits_lead,cits_nolead)
- rownames(cits) <- c('cits_lead','cits_nolead')
- w <- colnames(cits)>=pubrange[1] & colnames(cits)<=pubrange[2]
- cits <- cits[,w]
- # cumulative citations
- cits_cum <- t(apply(cits,1,cumsum))
- # plot
- par(mar=c(5, 4, 4, 4)+0.1)
- ylabs <- pretty(c(0,max(max(colSums(pubs_cum)),ceiling(max(colSums(cits_cum))/10))))
- ylims <- c(min(ylabs),max(ylabs))
- pubbar <- barplot(pubs_cum,ylim=ylims,ylab='',yaxt='n')
- abline(h=ylabs,col='lightgray',lty='dotted')
- par(new=TRUE)
- pubbar <- barplot(pubs_cum,ylim=ylims,ylab='',yaxt='n')
- lines(x=pubbar,y=cits_cum['cits_lead',]/10)
- points(x=pubbar,y=cits_cum['cits_lead',]/10,pch=19)
- lines(x=pubbar,y=colSums(cits_cum)/10)
- points(x=pubbar,y=colSums(cits_cum)/10,pch=19,col='lightgray')
- points(x=pubbar,y=colSums(cits_cum)/10,pch=21)
- axis(2)
- axis(4,at=pretty(ylims),labels=pretty(ylims)*10)
- mtext('Pubs',2,line=2.5,cex=0.75)
- mtext('Citations',4,line=2.5,cex=0.75)
-
- # h- and g-index
- years <- c(min(dat[,'Publication Year']):max(dat[,'Publication Year']))
- hh <- NULL
- gg <- NULL
- for (y in years) {
- hh <- c(hh,hirsch(bib,y))
- gg <- c(gg,egghe(bib,y))
- }
- # plot
- ylabs <- pretty(c(0,max(max(hh),ceiling(max(gg)/2))))
- ylims <- c(min(ylabs),max(ylabs))
- plot(x=years,y=hh,type='n',ylab='',xlab='',ylim=ylims)
- abline(h=pretty(hh),col='lightgray',lty='dotted')
- lines(years,0:{length(years)-1},lty='dashed')
- lines(x=years,y=hh)
- points(x=years,y=hh,pch=19)
- mtext('h-index',2,line=2.5,cex=0.75)
- par(new=TRUE)
- lines(x=years,y=gg/2)
- points(x=years,y=gg/2,pch=19,col='lightgray')
- points(x=years,y=gg/2,pch=21)
- axis(4,at=pretty(ylims),labels=pretty(ylims)*2)
- mtext('g-index',4,line=2.5,cex=0.75)
-
- # Excelence
- exc <- rbind(rev(table(rank(bib, w_lead, q=quant))),rev(table(rank(bib, -w_lead, q=quant))))[,-7]
- rownames(exc) <- c('pubs_lead','pubs_nolead')
- # plot
- barplot(exc)
- abline(h=pretty(colSums(exc)),col='lightgray',lty='dotted')
- par(new=TRUE)
- barplot(exc)
- mtext('Counts',2,line=2.5,cex=0.75)
-}
-#biblioplot(bib)
-
-biblioplot2 <- function(bib) {
-
- par(mfrow=c(1,2))
-
- au <- bib$author
- dat <- bib$pubs
-
- # Publication range
- pubrange <- range(dat[,'Publication Year'])
-
- # Leading authors
- leads <- as.character(lapply(dat[,'Authors'],function(x) {strsplit(x,';')[[1]][1]}))
-
- # Publications and citations
- # pubs per year
- years <- dat[,'Publication Year']
- years <- factor(years,levels=as.character(min(years):max(years)))
- pubs <- table(years)
- # cumulative pubs
- pubs_cum <- cumsum(pubs)
- # citations per year
- cits <- t(as.matrix(colSums(bib$pubs[,22:ncol(bib$pubs)])))
- rownames(cits) <- c('cits')
- w <- colnames(cits)>=pubrange[1] & colnames(cits)<=pubrange[2]
- cits <- cits[,w]
- # cumulative citations
- cits_cum <- cumsum(cits)
- # plot
- par(mar=c(5, 4, 4, 4)+0.1)
- ylabs <- pretty(c(0,max(max(pubs_cum),ceiling(max(cits_cum)/10))))
- ylims <- c(min(ylabs),max(ylabs))
- pubbar <- barplot(pubs_cum,ylim=ylims,ylab='',yaxt='n')
- abline(h=ylabs,col='lightgray',lty='dotted')
- par(new=TRUE)
- pubbar <- barplot(pubs_cum,ylim=ylims,ylab='',yaxt='n')
- lines(x=pubbar,y=cits_cum/10)
- points(x=pubbar,y=cits_cum/10,pch=19)
- axis(2)
- axis(4,at=pretty(ylims),labels=pretty(ylims)*10)
- mtext('Pubs',2,line=2.5,cex=0.75)
- mtext('Citations',4,line=2.5,cex=0.75)
- title('Publications and citations')
-
- # h-index
- years <- c(min(dat[,'Publication Year']):max(dat[,'Publication Year']))
- hh <- NULL
- for (y in years) {
- hh <- c(hh,hirsch(bib,y))
- gg <- c(gg,egghe(bib,y))
- }
- # plot
- ylabs <- pretty(c(0,max(max(hh))))
- ylims <- c(min(ylabs),max(ylabs))
- plot(x=years,y=hh,type='n',ylab='',xlab='',ylim=ylims)
- abline(h=pretty(hh),col='lightgray',lty='dotted')
- lines(years,0:{length(years)-1},lty='dashed')
- lines(x=years,y=hh)
- points(x=years,y=hh,pch=19)
- mtext('h-index',2,line=2.5,cex=0.75)
- title('h-index')
-}
-
-
-format_pub <- function(x,au) {
- # format authors
- if (substr(x['Authors'],nchar(x['Authors']),nchar(x['Authors']))!='.') {
- x['Authors'] <- paste(x['Authors'],'.',sep='')
- }
- auths <- strsplit(as.character(x['Authors']),';')[[1]]
- w <- grep(strsplit(gsub('Ñ','N',au),',')[[1]][1],auths,ignore.case=TRUE)
- auths[w]<- paste('\\textbf{',auths[w],'}',sep='')
- auths <- paste0(auths,sep=';',collapse='')
- auths <- substr(auths,1,nchar(auths)-1)
- # format citations rank
- w <- grep('rank',names(x))
-# if (is.na(x[,w]) | x[,w]=='') {
- if (is.na(x[w])) {
- rank <- ''
- } else {
-# rank <- paste('; $',x[,w],'$',sep='')
- rank <- paste('; $',x[w],'$',sep='')
- }
- # format title
- require(tools)
- title <- gsub('&','\\\\&',toTitleCase(tolower(as.character(x['Title']))))
- # format source
- source <- gsub('&','\\\\&',x['Source Title'])
- # format volume and pages
- if (is.na(x['Volume'])) {
- vol <- ''
- } else {
- vol <- paste(' ',x['Volume'],sep='')
- }
- if (is.na(x['Issue']) | x['Issue']=='') {
- issue <- ''
- } else {
- issue <- paste('(',x['Issue'],')',sep='')
- }
- if (is.na(x['Beginning Page']) | x['Beginning Page']=='' |
- is.na(x['Ending Page']) | x['Ending Page']=='') {
- if (is.na(x['Article Number']) | x['Article Number']=='') {
- pages <- ''
- } else {
- pages <- paste(': ', x['Article Number'],sep='')
- }
- } else {
- pages <- paste(': ',x['Beginning Page'],'--',x['Ending Page'],sep='')
- }
- # put everything together
- paste('\\item ',auths,' ',title,'. ',
- '\\textit{',source,'}',vol,issue,pages,'. ',
- x['Publication Year'],'. ',
- '(cit: ',x['Total Citations'],rank,')\n', sep='')
-}
-
diff --git a/bibliometRics.Rproj b/bibliometRics.Rproj
new file mode 100644
index 0000000..64565bc
--- /dev/null
+++ b/bibliometRics.Rproj
@@ -0,0 +1,22 @@
+Version: 1.0
+
+RestoreWorkspace: No
+SaveWorkspace: No
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: knitr
+LaTeX: pdfLaTeX
+
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
+
+BuildType: Package
+PackageUseDevtools: Yes
+PackageInstallArgs: --no-multiarch --with-keep.source
+PackageCheckArgs: --as-cran
+PackageRoxygenize: rd,collate,namespace
diff --git a/biblioplot_example.pdf b/biblioplot_example.pdf
deleted file mode 100644
index 2764e40..0000000
Binary files a/biblioplot_example.pdf and /dev/null differ
diff --git a/codecov.yml b/codecov.yml
new file mode 100644
index 0000000..8f36b6c
--- /dev/null
+++ b/codecov.yml
@@ -0,0 +1,12 @@
+comment: false
+
+coverage:
+ status:
+ project:
+ default:
+ target: auto
+ threshold: 1%
+ patch:
+ default:
+ target: auto
+ threshold: 1%
diff --git a/BaselinePercentiles.csv b/inst/BaselinePercentiles.csv
similarity index 100%
rename from BaselinePercentiles.csv
rename to inst/BaselinePercentiles.csv
diff --git a/bibliometRics.Rtex b/inst/bibliometRics.Rtex
similarity index 100%
rename from bibliometRics.Rtex
rename to inst/bibliometRics.Rtex
diff --git a/sbegueria.txt b/inst/sbegueria.txt
similarity index 100%
rename from sbegueria.txt
rename to inst/sbegueria.txt
diff --git a/man/bibliometric.Rd b/man/bibliometric.Rd
new file mode 100644
index 0000000..9dc1875
--- /dev/null
+++ b/man/bibliometric.Rd
@@ -0,0 +1,35 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/bibliometRics.R
+\name{bibliometric}
+\alias{bibliometric}
+\title{Calculate Bibliometrics}
+\usage{
+bibliometric(bib, base = read.baselines(), discipline = NULL)
+}
+\arguments{
+\item{bib}{A list with an element called \code{pubs}, which is a
+\code{data.frame} with columns \code{'Publication Year'} and
+\code{'Total Citations'}. Usually
+from \code{\link{read.wos}}}
+
+\item{base}{Baseline percentages}
+
+\item{discipline}{Name of Discipline, must be in column names of
+\code{base}}
+}
+\value{
+A \code{data.frame} of metrics
+}
+\description{
+Calculate Bibliometrics
+}
+\examples{
+wos_file = system.file("sbegueria.txt", package = "bibliometRics")
+bib <- read.wos(wos_file)
+bib$author = gsub('[ñÑ]', 'N', bib$author)
+bib$pubs$Authors = gsub('[ñÑ]', 'N', bib$pubs$Authors)
+
+base = read.baselines()
+discipline = 'GEOSCIENCES'
+bibliometric(bib, base, discipline)
+}
diff --git a/man/biblioplot.Rd b/man/biblioplot.Rd
new file mode 100644
index 0000000..e824832
--- /dev/null
+++ b/man/biblioplot.Rd
@@ -0,0 +1,29 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/plot.R
+\name{biblioplot}
+\alias{biblioplot}
+\alias{biblioplot2}
+\title{Plot Bibliometrics}
+\usage{
+biblioplot(bib, quant)
+
+biblioplot2(bib)
+}
+\arguments{
+\item{bib}{A list with an element called \code{pubs}, which is a
+\code{data.frame} with columns \code{'Publication Year'} and
+\code{'Total Citations'}. Usually
+from \code{\link{read.wos}}}
+
+\item{quant}{Quantile, passed to \code{\link{citation_rank}}}
+}
+\description{
+Plot Bibliometrics
+}
+\examples{
+wos_file = system.file("sbegueria.txt", package = "bibliometRics")
+bib <- read.wos(wos_file)
+base = read.baselines()
+biblioplot(bib, quant=base$GEOSCIENCES)
+bib$author = gsub('Ñ', 'N', bib$author)
+}
diff --git a/man/citation_rank.Rd b/man/citation_rank.Rd
new file mode 100644
index 0000000..34548fe
--- /dev/null
+++ b/man/citation_rank.Rd
@@ -0,0 +1,34 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/citation_rank.R
+\name{citation_rank}
+\alias{citation_rank}
+\title{Calculate citation ranking (quantiles)}
+\usage{
+citation_rank(x, w = NULL, quant = NULL)
+}
+\arguments{
+\item{x}{A list with an element called \code{pubs}, which is a
+\code{data.frame} with columns \code{'Publication Year'} and
+\code{'Total Citations'}. Usually
+from \code{\link{read.wos}}}
+
+\item{w}{column indicators of which quantiles to grab}
+
+\item{quant}{Baseline quantiles for one discipline, e.g. \code{read.baselines()$GEOSCIENCES}}
+}
+\value{
+A scalar number
+}
+\description{
+Calculate citation ranking (quantiles)
+}
+\examples{
+wos_file = system.file("sbegueria.txt", package = "bibliometRics")
+bib <- read.wos(wos_file)
+hirsch(bib)
+hirsch(bib, 2010)
+egghe(bib)
+egghe(bib, 2010)
+base = read.baselines()
+citation_rank(bib, quant=base$GEOSCIENCES)
+}
diff --git a/man/figures/README-unnamed-chunk-1-1.png b/man/figures/README-unnamed-chunk-1-1.png
new file mode 100644
index 0000000..62f124d
Binary files /dev/null and b/man/figures/README-unnamed-chunk-1-1.png differ
diff --git a/man/format_pub.Rd b/man/format_pub.Rd
new file mode 100644
index 0000000..512a41e
--- /dev/null
+++ b/man/format_pub.Rd
@@ -0,0 +1,32 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/format_pub.R
+\name{format_pub}
+\alias{format_pub}
+\title{Format Publication data}
+\usage{
+format_pub(x, au)
+}
+\arguments{
+\item{x}{A list with an element called \code{pubs}, which is a
+\code{data.frame} with columns \code{'Publication Year'} and
+\code{'Total Citations'}. Usually
+from \code{\link{read.wos}}}
+
+\item{au}{Author to grab publications}
+}
+\value{
+A character vector
+}
+\description{
+Format Publication data
+}
+\examples{
+wos_file = system.file("sbegueria.txt", package = "bibliometRics")
+bib <- read.wos(wos_file)
+base = read.baselines()
+au = bib$au
+au = gsub('Ñ', 'N', au)
+format_pub(cbind(bib$pubs,
+citation_rank(bib,quant=base$GEOSCIENCES))[1,],
+ au=au)
+}
diff --git a/man/hirsch.Rd b/man/hirsch.Rd
new file mode 100644
index 0000000..47b69cf
--- /dev/null
+++ b/man/hirsch.Rd
@@ -0,0 +1,40 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/hirsch.R
+\name{hirsch}
+\alias{hirsch}
+\alias{egghe}
+\alias{ifactor}
+\title{Calculate Hirsch's h-index or Egghe's g-index}
+\usage{
+hirsch(bib, year = NULL)
+
+egghe(bib, year = NULL)
+
+ifactor(bib, year = NULL, n = 2)
+}
+\arguments{
+\item{bib}{A list with an element called \code{pubs}, which is a
+\code{data.frame} with column \code{'Publication Year'}. Usually
+from \code{\link{read.wos}}}
+
+\item{year}{Maximum year to include for calculation}
+
+\item{n}{Number of years in the impact factor window (default 2)}
+}
+\value{
+A scalar number
+}
+\description{
+Calculate Hirsch's h-index or Egghe's g-index
+}
+\examples{
+wos_file = system.file("sbegueria.txt", package = "bibliometRics")
+bib <- read.wos(wos_file)
+hirsch(bib)
+hirsch(bib, 2010)
+egghe(bib)
+egghe(bib, 2010)
+ifactor(bib) # 2-year impact factor
+ifactor(bib,2013) # 2-year impact factor, year 2013
+ifactor(bib,n=5) # 5-year impact factor
+}
diff --git a/man/read.baselines.Rd b/man/read.baselines.Rd
new file mode 100644
index 0000000..de77ad0
--- /dev/null
+++ b/man/read.baselines.Rd
@@ -0,0 +1,23 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/read.baselines.R
+\name{read.baselines}
+\alias{read.baselines}
+\title{Read field percentile baselines (FPB) from the Essential Science
+Indicators database.}
+\usage{
+read.baselines(infile = system.file("BaselinePercentiles.csv", package =
+ "bibliometRics"))
+}
+\arguments{
+\item{infile}{Path to a CSV file of baseline percentiles for each field}
+}
+\value{
+A \code{list} of percentiles
+}
+\description{
+Read field percentile baselines (FPB) from the Essential Science
+Indicators database.
+}
+\examples{
+res = read.baselines()
+}
diff --git a/man/read.wos.Rd b/man/read.wos.Rd
new file mode 100644
index 0000000..671c170
--- /dev/null
+++ b/man/read.wos.Rd
@@ -0,0 +1,21 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/read.wos.R
+\name{read.wos}
+\alias{read.wos}
+\title{Read and process Web of Science citation report data.}
+\usage{
+read.wos(infile)
+}
+\arguments{
+\item{infile}{Input text file from Web of Science}
+}
+\value{
+A list of information
+}
+\description{
+Read and process Web of Science citation report data.
+}
+\examples{
+wos_file = system.file('sbegueria.txt', package = "bibliometRics")
+bib <- read.wos(wos_file)
+}
diff --git a/sbegueria.pdf b/sbegueria.pdf
deleted file mode 100644
index 635c4c5..0000000
Binary files a/sbegueria.pdf and /dev/null differ
diff --git a/tests/testthat.R b/tests/testthat.R
new file mode 100644
index 0000000..5708d70
--- /dev/null
+++ b/tests/testthat.R
@@ -0,0 +1,4 @@
+library(testthat)
+library(bibliometRics)
+
+test_check("bibliometRics")
diff --git a/tests/testthat/test-biblio.R b/tests/testthat/test-biblio.R
new file mode 100644
index 0000000..c307144
--- /dev/null
+++ b/tests/testthat/test-biblio.R
@@ -0,0 +1,15 @@
+testthat::context("Just a simple test")
+
+
+testthat::test_that("bibliometric works", {
+
+ wos_file = system.file("sbegueria.txt", package = "bibliometRics")
+ testthat::expect_silent({
+ bib <- read.wos(wos_file)
+ })
+ base = read.baselines()
+ discipline = 'GEOSCIENCES'
+ testthat::expect_silent({
+ bibliometric(bib, base, discipline)
+ })
+})
diff --git a/vignettes/.gitignore b/vignettes/.gitignore
new file mode 100644
index 0000000..097b241
--- /dev/null
+++ b/vignettes/.gitignore
@@ -0,0 +1,2 @@
+*.html
+*.R
diff --git a/vignettes/example_report.Rmd b/vignettes/example_report.Rmd
new file mode 100644
index 0000000..31d79e9
--- /dev/null
+++ b/vignettes/example_report.Rmd
@@ -0,0 +1,75 @@
+---
+title: "Example bibliometric report"
+author: "Vignette Author"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+ %\VignetteIndexEntry{Example bibliometric report}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+---
+
+```{r setup, include = FALSE}
+knitr::opts_chunk$set(
+ collapse = TRUE,
+ comment = "#>"
+)
+```
+
+
+
+
+```{r bibdata, include=FALSE}
+library(bibliometRics)
+wos_file = system.file("sbegueria.txt", package = "bibliometRics")
+bib <- read.wos(wos_file)
+quant <- read.baselines()
+x <- bibliometric(bib, quant, 'GEOSCIENCES')
+publist <- apply(
+ cbind(bib$pubs,
+ citation_rank(bib, quant = quant$GEOSCIENCES)),
+ 1, format_pub, au =
+ bib$au)
+bib$ref
+bib$au
+```
+
+
+
+## Indicators
+
+```{r biblioplot, echo=FALSE, message=FALSE, warning=FALSE, fig.width=12, fig.height=4}
+biblioplot(bib, quant = quant$GEOSCIENCES)
+```
+
+
+## Table
+
+\begin{table}[h]
+\centering
+\begin{tabular}{llll}
+\hline
+Publications (total) & Pubs. (as lead author) & Years & Pubs. per year \\
+`r x$pubs` & `r x$lead` & `r x$years` & `r x$pubs_year` \\
+\hline
+Citations (total) & Citations per year & Citations per pub & Citations (highest pub) \\
+`r x$cit_tot` & `r x$cit_year` & `r x$cit_art` & `r x$cit_max` \\
+\hline
+h-index & h-index per year & g-index & g-index per year \\
+`r x$hin` & `r x$hin_year` & `r x$gin` & `r x$gin_year` \\
+\hline
+Pubs $>$q0.9 & Pubs $>$q0.9 (as lead) & i-score & i-score (as lead) \\
+`r x$pubs09` & `r x$pubs09_lead` & `r x$iscore` & `r x$iscore_lead` \\
+\hline
+\end{tabular}
+\end{table}
+
+
+## Publication list
+
+\begin{enumerate}
+`r publist`
+\end{enumerate}
+
+
+\end{document}