- 0.1.11 10mar2021 (changes since 0.1.0):
  - bug fixes for encoding issues and diacritics
  - handles the moving of data with the R package `haven`
  - improved error checking and handling
  - adds the `codelist` option to get the available codes from `countrycode`
  - sets default arguments for the options `from()`, `to()` and `gen()`
- 0.1.0 16feb2019:
  - first version of the command
This command uses `rcall` to call R's `countrycode`. It is a substitute for the `kountry` package from SSC.
I'd like to thank the authors of both packages: `countrycode` was written by Vincent Arel-Bundock, Nils Enevoldsen, and CJ Yetman; `rcall` was written by E. F. Haghish.
There are a few advantages to using `rcallcountrycode` relative to `kountry`:
- it gets the functionality of `countrycode` for R, which has broader and more up-to-date coverage of country names in different formats and languages
- it scales well with the size of the dataset. See the benchmark (Win7/Linux) for a comparison between `rcallcountrycode` and `kountry`.
  - Because it asks R to convert only the unique strings in the dataset (which should not exceed the number of countries in the world in most use cases), applying it to a dataset of 200 or 200,000 observations makes little difference. The current version of `kountry` does not scale well in large datasets.
  - In the current benchmark (v0.1.5), I repeat a list of 196 country names 2000 times. `rcallcountrycode`, which is much slower in the small dataset, takes only around 10%/50% (Win7/Linux) longer to run in the larger dataset. By contrast, `kountry` takes roughly 900/2000 times (Win7/Linux) longer than it does in the small dataset.
Machine | Dataset | rcallcountrycode | kountry |
---|---|---|---|
Win 7, 4-core 3.60GHz, 32GB RAM | 196 countries | 4.09 sec | 0.02 sec |
Win 7, 4-core 3.60GHz, 32GB RAM | 196 countries x 2000 | 4.54 sec | 17.88 sec |
Ubuntu, 2-core 2.20GHz, 16GB RAM | 196 countries | 1.44 sec | 0.01 sec |
Ubuntu, 2-core 2.20GHz, 16GB RAM | 196 countries x 2000 | 2.19 sec | 21.27 sec |
The main disadvantage is that `rcallcountrycode` requires additional dependencies, while `kountry` can be run directly after installing it from SSC without any additional work.
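The scaling argument above works in three steps: convert only the unique strings, then map the results back to every observation. It can be sketched in plain Stata as follows.

This is an illustration under stated assumptions, not the package's actual code. The variable names are hypothetical, and `strproper()` is only a stand-in for the step where `rcallcountrycode` asks R's `countrycode` to do the conversion.

```stata
* Illustrative sketch, not rcallcountrycode's source code.
* Assumption: strproper() stands in for the conversion that is done in R.
clear
input str20 country
"portugal"
"france"
"portugal"
"france"
"italy"
end

preserve
* reduce the data to one row per unique country string
keep country
duplicates drop
display "unique strings to convert: " _N
* placeholder conversion step (the real command calls R via rcall here)
generate countryname_std = strproper(country)
tempfile lookup
save `lookup'
restore

* map the converted values back onto every observation
merge m:1 country using `lookup', nogenerate
list
```

Consider 200,000 observations drawn from the same few hundred countries. The conversion step still only sees those few hundred unique strings, which is consistent with the small change in run time shown in the benchmark.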
Installation with the `github` package:
- Install R first (see below for how)
- Install `rcall` with the method recommended by its author: install the `github` package for Stata and then install `rcall`:
```stata
net install github, from("https://haghish.github.io/github/") replace
gitget rcall
```
- Install `rcallcountrycode`:
```stata
github install luispfonseca/stata-rcallcountrycode
```

These steps should take care of all the dependencies automatically.
Manual installation:
- Install R first (see below for how)
- Install this package:
```stata
cap ado uninstall rcallcountrycode
local github "https://raw.githubusercontent.com"
net install rcallcountrycode, from(`github'/luispfonseca/stata-rcallcountrycode/master/)
```
- Make sure you install all the dependencies
For this command to work, you need the following:

You need to have R installed. You can download it from CRAN (https://cran.r-project.org). RStudio can be installed on top of R to give you a graphical interface, but it does not install R by itself.
If you are not using `github install` to install `rcallcountrycode`, you also need to install the `countrycode` and `haven` packages in R:
```r
install.packages("countrycode")
install.packages("haven")
```
Install `rcall` following the instructions on its page. The following commands currently work:
```stata
net install github, from("https://haghish.github.io/github/") replace
gitget rcall
```
Some commands from `gtools` by Mauricio Caceres Bravo are used to speed up this command when available, but they are not required. Follow the instructions on its page to install it, especially if you are dealing with large datasets.
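Using `gtools` only when it is available follows a common Stata pattern: check whether a command is installed, and fall back to a built-in one otherwise.

This is a hypothetical sketch, not taken from `rcallcountrycode`'s source. Pairing `gcontract` (from `gtools`) with Stata's built-in `contract` is an assumption chosen for illustration.

```stata
* Hypothetical sketch: prefer gtools' gcontract when it is installed,
* otherwise fall back to Stata's built-in contract.
* Both reduce the data to one row per unique value, with counts in _freq.
clear
input str20 country
"italy"
"spain"
"italy"
end

capture which gcontract
if _rc == 0 local contract_cmd gcontract
else local contract_cmd contract
display "using `contract_cmd'"

`contract_cmd' country
list
```

Either branch produces the same result, so the speedup stays optional: users without `gtools` get identical output, only more slowly on large datasets.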
```stata
clear
input str20 country
"portugal"
"united kingdom"
"france"
"italy"
"spain"
"germany"
"germany"
"italy"
"switzerland"
"curaçao"
"côte d'ivoire"
"namibia"
""
"not a real country"
end
compress
* standardize country names stored in a variable named country
* (the two commands are equivalent; both create countryname_en, so run only one)
rcallcountrycode country, gen(countryname_en)
* rcallcountrycode country, from(country.name) to(country.name) gen(countryname_en)
* get the ISO2 country codes
rcallcountrycode country, from(country.name) to(iso2c) gen(iso2code)
* get the country names in German
rcallcountrycode country, from(country.name) to(country.name.de) gen(countryname_de)
* get the list of available codes from R
rcallcountrycode codelist
```
To do:
- Provide better diagnostics for non-matches
Luís Fonseca
London Business School
lfonseca at london dot edu
https://luispfonseca.com