‘DemografixeR’ estimates gender, age and nationality from a name. The package is an API wrapper that supports all three ‘Demografix’ APIs in one package:
- https://genderize.io - Gender estimation based on a name
- https://agify.io - Age estimation based on a name
- https://nationalize.io - Nationality estimation based on a name
You can find all the necessary documentation about the package here:
You can install the CRAN release version of DemografixeR with this R command:
install.packages("DemografixeR")
You can also install the development version of DemografixeR with these R commands:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("matbmeijer/DemografixeR")
These basic examples show how to estimate nationality, gender and age from a given name, with and without specifying a country. The package takes care of multiple background tasks:
- API pagination
- Duplicated names (one request made per name)
- Missing values
- Workflow integration (e.g. with dplyr or data.table)
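The handling of duplicated names and missing values listed above can be pictured with a minimal offline sketch. It does not show the package's actual internals: `fake_lookup()`, `lookup_once_per_name()` and the `batch_size` of 10 are hypothetical names and values introduced only for illustration, and no API request is made.

```r
# Hypothetical stand-in for an API call: returns one value per unique name.
# It exists only so the sketch runs offline; it is not part of DemografixeR.
fake_lookup <- function(unique_names) {
  known <- c(Ben = "male", Lucie = "female", Paula = "female")
  unname(known[unique_names])
}

# Hypothetical helper: query each distinct name once, then map results back.
lookup_once_per_name <- function(names, lookup = fake_lookup, batch_size = 10) {
  # Keep each non-missing name once, so repeated names cost a single lookup.
  to_query <- unique(names[!is.na(names)])
  if (length(to_query) == 0) return(rep(NA_character_, length(names)))

  # Split the unique names into batches (the batch size is an assumption).
  batch_id <- ceiling(seq_along(to_query) / batch_size)
  batches <- split(to_query, batch_id)
  results <- unlist(lapply(batches, lookup), use.names = FALSE)

  # Map results back to the original positions; NA inputs stay NA.
  results[match(names, to_query)]
}

input <- c("Ben", "Lucie", NA, "Ben", "Paula")
# Yields male, female, NA, male, female (one lookup for the repeated "Ben").
lookup_once_per_name(input)
```

The result has the same length and order as the input, which is what lets a function of this shape be used directly on a data frame column.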
library(DemografixeR)
#Simple example without country_id
names<-c("Ben", "Allister", "Lucie", "Paula")
genderize(name = names)
#> [1] "male" "male" "female" "female"
nationalize(name = names)
#> [1] "AU" "ZA" "CZ" "PT"
agify(name = names)
#> [1] 48 44 24 50
#Simple example with country_id
genderize(name = names, country_id = "US")
#> [1] "male" "male" "female" "female"
agify(name = names, country_id = "US")
#> [1] 67 46 65 70
#Workflow example with dplyr with missing values and multiple different countries
library(dplyr) #Provides the %>% pipe used below
df<-data.frame(names=c("Ana", NA, "Pedro",
                       "Francisco", "Maria", "Elena"),
               country=c(NA, NA, "ES",
                         "DE", "ES", "NL"), stringsAsFactors = FALSE)
df %>% dplyr::mutate(guessed_nationality=nationalize(name = names),
                     guessed_gender=genderize(name = names, country_id = country),
                     guessed_age=agify(name = names, country_id = country)) %>%
  knitr::kable()
names | country | guessed_nationality | guessed_gender | guessed_age |
---|---|---|---|---|
Ana | NA | PT | female | 58 |
NA | NA | NA | NA | NA |
Pedro | ES | PT | male | 69 |
Francisco | DE | CL | male | 58 |
Maria | ES | CY | NA | 59 |
Elena | NL | CC | female | 69 |
#Detailed data.frame example:
genderize(name = names, simplify = FALSE, meta = TRUE) %>% knitr::kable()
| name | type | gender | probability | count | api_rate_limit | api_rate_remaining | api_rate_reset | api_request_timestamp |
---|---|---|---|---|---|---|---|---|---|
2 | Ben | gender | male | 0.95 | 77991 | 1000 | 959 | 46192 | 2020-05-14 11:10:07 |
1 | Allister | gender | male | 0.98 | 129 | 1000 | 959 | 46192 | 2020-05-14 11:10:07 |
3 | Lucie | gender | female | 0.99 | 85580 | 1000 | 959 | 46192 | 2020-05-14 11:10:07 |
4 | Paula | gender | female | 0.98 | 74130 | 1000 | 959 | 46192 | 2020-05-14 11:10:07 |
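One possible use of the detailed output is to keep only estimates you consider reliable. The sketch below rebuilds a small data frame from the values in the table above, so it runs without calling the API. The thresholds `min_probability` and `min_count` and the column `gender_filtered` are hypothetical choices for illustration, and reading `count` as the number of records behind an estimate is an assumption you should check against the API documentation.

```r
# Values copied from the table above; only the columns needed here are kept.
detailed <- data.frame(
  name = c("Ben", "Allister", "Lucie", "Paula"),
  gender = c("male", "male", "female", "female"),
  probability = c(0.95, 0.98, 0.99, 0.98),
  count = c(77991, 129, 85580, 74130),
  stringsAsFactors = FALSE
)

# Hypothetical thresholds; choose values that suit your own analysis.
min_probability <- 0.97
min_count <- 1000

# Keep an estimate only if both thresholds are met, otherwise set it to NA.
confident <- detailed$probability >= min_probability & detailed$count >= min_count
detailed$gender_filtered <- ifelse(confident, detailed$gender, NA_character_)
detailed[, c("name", "gender_filtered")]
```

With these thresholds, "Ben" is dropped for its probability and "Allister" for its low count, while "Lucie" and "Paula" are kept.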
- This package is in no way affiliated with the Demografix ApS company, the owner of the ‘genderize.io’, ‘agify.io’ and ‘nationalize.io’ APIs.
- This package promotes an open mind towards gender and gender diversity. Be aware that the results from the ‘genderize.io’ API reflect an oversimplification of gender identity, gender roles and the meaning of ‘gender’. For more information, visit the active discussion in the following Wikipedia article.
Please note that the ‘DemografixeR’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.