A set of functions that politely scrapes Pro Football Reference IDs and other data from PFR index pages

friscojosh/pfrscrapr

pfrscrapr

The pfrscrapr package is a set of functions that politely scrapes Pro Football Reference IDs and other data from PFR index pages.

pfrscrapr depends on RSelenium. See the RSelenium documentation for setup instructions.

Currently the scraping functions expect a Selenium server to be running locally on port 4445. In the future I would like to update the code to spin up a Selenium server on demand.
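Because the scrape needs a server already listening on port 4445, it can help to check for one before calling any function. The following is a minimal sketch in base R, not part of pfrscrapr: `port_is_open()` is a hypothetical helper, and the host and timeout defaults are assumptions.

```r
# Hypothetical helper (not part of pfrscrapr): returns TRUE if something
# accepts TCP connections on host:port, FALSE otherwise.
port_is_open <- function(host = "localhost", port = 4445L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

if (!port_is_open()) {
  message("No server found on localhost:4445; start Selenium before scraping.")
}
```

An open port only shows that something is listening there, not that it is a working Selenium server, so treat this as a quick sanity check.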

Installation

You can install the development version of pfrscrapr from GitHub with:

# install.packages("devtools")
devtools::install_github("friscojosh/pfrscrapr")

Examples

You can scrape the NFL Season By Season Passing page:

library(pfrscrapr)
passing <- get_league_passing_by_year()

You can load all IDs:

pfrscrapr::load_ids()

You can get the PFR ID for a single player by passing their name in all lowercase, with punctuation and suffixes removed:

pfrscrapr::scrape_player_id("joe montana")
#> ℹ Selenium browser loaded
#> ✔ Selenium browser loaded [855ms]
#> 
#> ℹ joe montana scraped with MontJo01 returned
#> ✔ joe montana scraped with MontJo01 returned [17ms]
#> 
#> [1] "MontJo01"
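If your player names come from another data source, the lowercase, no-punctuation, no-suffix form described above can be produced with a small helper. This is a sketch under assumptions: `normalize_player_name()` is a hypothetical function, not part of pfrscrapr, and the suffix list (Jr, Sr, II, III, IV, V) is a guess at the common cases.

```r
# Hypothetical helper (not part of pfrscrapr): lowercase a name and strip
# a trailing generational suffix and any punctuation.
normalize_player_name <- function(name) {
  out <- tolower(name)
  # Drop a trailing suffix such as " jr.", " sr", " iii" (assumed list)
  out <- sub("\\s+(jr|sr|ii|iii|iv|v)\\.?$", "", out)
  # Remove remaining punctuation (apostrophes, periods, commas, hyphens)
  out <- gsub("[[:punct:]]", "", out)
  # Collapse repeated whitespace and trim the ends
  trimws(gsub("\\s+", " ", out))
}

normalize_player_name("Joe Montana")        # "joe montana"
normalize_player_name("Odell Beckham Jr.")  # "odell beckham"
```

Whether hyphens and apostrophes should be dropped or replaced with a space for a given player is not stated in this README, so check the result against a known ID before relying on it.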

Or you can get the IDs of all players whose surname begins with a given letter:

pfrscrapr::scrape_pfr_player_ids_by_letter("A")
#> ℹ Starting PFR scrape of players with surnames beginning with A
#> ✔ Starting PFR scrape of players with surnames beginning with A [15.4s]
#> 
#> ℹ Letter A scrape complete
#> ✔ Letter A scrape complete [17ms]
#> 
#> # A tibble: 878 × 3
#>    name                                    ids        letter
#>    <chr>                                   <chr>      <chr> 
#>  1 Isaako Aaitui (NT) 2013-2013            AaitIs00   A     
#>  2 Joe Abbey (E) 1948-1949                 AbbeJo20   A     
#>  3 Fay Abbott (BB-FB-TB-QB-WB-E) 1921-1929 AbboFa20   A     
#>  4 Vince Abbott (K) 1987-1988              abbotvin01 A     
#>  5 Jared Abbrederis (WR) 2015-2017         AbbrJa00   A     
#>  6 Duke Abbruzzi (HB-DB) 1946-1946         AbbrDu20   A     
#>  7 Mehdi Abdesmad (DE) 2016-2016           AbdeMe00   A     
#>  8 Karim Abdul-Jabbar (RB) 1996-2000       AbduKa00   A     
#>  9 Isa Abdul-Quddus (S) 2011-2016          AbduIs00   A     
#> 10 Ameer Abdullah (RB) 2015-2023           AbduAm00   A     
#> # ℹ 868 more rows
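The `name` column above packs the player name, positions, and career span into one string. If you need them as separate fields, a regular expression can split them. This is a sketch in base R that assumes every row follows the `Name (POS-POS) YYYY-YYYY` pattern shown in the output; `parse_pfr_name()` is a hypothetical helper, not a pfrscrapr function.

```r
# Hypothetical helper (not part of pfrscrapr): split strings such as
# "Fay Abbott (BB-FB-TB-QB-WB-E) 1921-1929" into separate columns.
parse_pfr_name <- function(x) {
  pattern <- "^(.*) \\(([^)]*)\\) ([0-9]{4})-([0-9]{4})$"
  matched <- grepl(pattern, x)
  # Pull one capture group; rows that do not fit the pattern become NA
  pick <- function(group) {
    ifelse(matched, sub(pattern, group, x), NA_character_)
  }
  data.frame(
    player     = pick("\\1"),
    positions  = pick("\\2"),
    first_year = as.integer(pick("\\3")),
    last_year  = as.integer(pick("\\4")),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_pfr_name(c(
  "Isaako Aaitui (NT) 2013-2013",
  "Fay Abbott (BB-FB-TB-QB-WB-E) 1921-1929"
))
# Positions can then be split on the hyphen, one vector per player
strsplit(parsed$positions, "-", fixed = TRUE)
```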

Finally, you can scrape all IDs and other data. This takes a long time, at least three minutes. The total number of requests to PFR is low, however: just 26, one for each letter of the alphabet. The rest of the time is spent running regular expressions. This scrape also returns the positions each player has played, which is useful for matching.

# do not run
# pfrscrapr::get_pfr_ids()
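The one-request-per-letter pattern described above can be sketched as a loop over `LETTERS` with a pause between requests. This is an illustration only, not how `get_pfr_ids()` is implemented: `scrape_all_letters()` and its `delay_secs` argument are hypothetical, and the scraping function is passed in so the loop can be tried with a stub before pointing it at PFR.

```r
# Hypothetical helper (not part of pfrscrapr): call a per-letter scraper
# once per letter, pausing between requests, and stack the results.
scrape_all_letters <- function(scrape_fn, letters = LETTERS, delay_secs = 1) {
  results <- vector("list", length(letters))
  for (i in seq_along(letters)) {
    results[[i]] <- scrape_fn(letters[i])
    # Pause between requests, but not after the last one
    if (i < length(letters)) {
      Sys.sleep(delay_secs)
    }
  }
  do.call(rbind, results)
}

# Dry run with a stub that makes no network requests
stub_scraper <- function(letter) {
  data.frame(name = paste("Player", letter), letter = letter,
             stringsAsFactors = FALSE)
}
all_rows <- scrape_all_letters(stub_scraper, delay_secs = 0)
nrow(all_rows)  # 26
```

With a running Selenium server, `pfrscrapr::scrape_pfr_player_ids_by_letter` could presumably be passed as `scrape_fn`, since it takes a single letter as shown earlier.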
