The code and data in this replication package reproduce all tables and figures in the paper, including the analysis of 10 recent papers in the AER using arcsinh(Y) in Section 2.3, and the empirical illustrations in Section 5. All of the data used in the paper come from publicly available replication packages on the AEA webpage.
As mentioned above, all of the data used in the paper are from replication packages downloaded from the AEA website.
- I certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.
- I certify that the author(s) of the manuscript have documented permission to redistribute/publish the data contained within this replication package.
Replication code/data for Logs with Zeros? Some Problems and Solutions by Jiafeng Chen and Jonathan Roth is licensed under CC BY 4.0
- All data are publicly available.
- Some data cannot be made publicly available.
- No data can be made publicly available.
The `_output/` directory contains the data from the 10 papers replicated in Section 2.3. The replication packages were all downloaded from the AEA website for the relevant papers in the AER, and the code was lightly modified to save the data used for the relevant specification. The replication packages for the 10 papers are contained in `replications.zip`.
The `sequeira_2016/` directory contains the replication package from Sequeira (2016), also downloaded from the AEA website, which is used in Section 5.
The replication code uses a combination of R and Stata. Package installation is handled in the scripts `r-package-setup.R` and `stata-package-setup.do`.
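As a rough illustration of how the two setup scripts might be launched from a shell, the sketch below maps a file name to a command line. It is not taken from `main.sh`: the function name `command_for` and the `STATA_BIN` variable are hypothetical, and both the use of `Rscript` and of Stata's batch invocation `-b do` are assumptions about how the programs are called on a given machine.

```shell
# Hypothetical helper, not part of the replication package.
# Prints the command line that would run a given R or Stata file.
# STATA_BIN is an assumed variable naming the Stata executable on the PATH;
# it falls back to stataME, the name this README says main.sh expects.
command_for() {
  case "$1" in
    *.R)  echo "Rscript $1" ;;
    *.do) echo "${STATA_BIN:-stataME} -b do $1" ;;
    *)    echo "unknown file type: $1" >&2; return 1 ;;
  esac
}

# Example: show the commands for the two setup scripts without running them.
for f in r-package-setup.R stata-package-setup.do; do
  command_for "$f"
done
```

Printing the command instead of executing it keeps the sketch safe to try on a machine where Stata or R is not yet installed; the printed lines can then be pasted into the terminal.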
The code was last run on a 2019 MacBook Pro with 64GB of RAM running macOS Version 11.6, using StataSE 17 and R 4.3.1. Approximate runtime is 1 hour for the main script, and an additional 1 hour if re-running the original replication files.
- The bash script `main.sh` runs all of the programs to reproduce the analysis. It calls the following programs:
  - `r-package-setup.R` and `stata-package-setup.do` install the packages needed in R and Stata
  - `recalculate-with-rescaled-outcomes.do` runs the analyses in the 10 papers studied in Section 2.3 and assesses sensitivity to units. This script calls the files titled `paper1.do`, ..., `paper10.do`, which run the code from the original papers.
  - `summary-tables.R` produces the tables and figures in Section 2.3 based on the analysis of the 10 papers
  - `carranza-application.R`, `sequeira-2016-application.R`, and `berkouwer-dean-application.R` produce the results for the three applications in Section 5
- The bash script `replications/run_all.sh` re-runs all of the code from the 10 replication packages used for the analysis in Section 2.3. (Running it is not necessary to reproduce the main results, as the results of the replications are already saved in the folder `_output/`.)
- The bash script `main.sh` should run all of the code needed to reproduce the files in the paper on Unix-based systems such as Mac or Linux (or in PowerShell in Windows).
  - Before running the `main.sh` script, ensure that Stata and R are added to the PATH. For example, if the Stata application is located in `/Applications/Stata/StataMP.app/Contents/MacOS/`, run `export PATH=$PATH:/Applications/Stata/StataMP.app/Contents/MacOS/` in the Terminal.
  - The `main.sh` script assumes that `stataME` is installed. If the user has a different vintage of Stata, e.g. `stataSE`, simply replace all occurrences of `stataME` with `stataSE` in `main.sh`.
  - Users who do not wish to run `main.sh` can also sequentially run all of the R and Stata files listed in `main.sh`.
- The bash script `main.sh` takes the output from the original replication packages stored in the `_output/` directory as given. Users who wish to reproduce this output from the original replication files can unzip `replications.zip` and then run the bash script `replications/run_all.sh`.
  - As with `main.sh`, please ensure that Stata/R are added to the path, and that the script references the appropriate vintage of Stata.
  - Running `replications/run_all.sh` will produce output to `replications/_output`. We copied the output from there to the folder `_output/` in the main replication directory, which is referenced in the main code above.
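The two preparation steps described above (confirming that the executables are on the PATH, and swapping the Stata executable name) can be sketched as small shell functions. This is only an illustration under stated assumptions: `check_on_path` and `swap_stata_name` are hypothetical helpers that do not ship with the package, and `Rscript` as the R entry point is an assumption, since this README does not say how `main.sh` invokes R.

```shell
# Hypothetical helpers; neither function is part of the replication package.

# Return 0 if every named executable is found on the PATH.
# Names that are not found are listed on stderr and the function returns 1.
check_on_path() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not on PATH: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Write a copy of a script in which every occurrence of one Stata
# executable name is replaced by another. The original file is untouched.
#   $1 = input script   $2 = old name   $3 = new name   $4 = output script
# The names are used as plain sed patterns, so they should contain only
# letters and digits (true for stataME and stataSE).
swap_stata_name() {
  sed "s/$2/$3/g" "$1" > "$4"
}

# Possible usage (main-se.sh is a hypothetical output name):
#   swap_stata_name main.sh stataME stataSE main-se.sh
#   check_on_path stataSE Rscript && bash main-se.sh
```

Writing the substituted script to a new file, instead of editing `main.sh` in place, keeps the distributed script intact so the change is easy to undo. The same two steps would apply to `replications/run_all.sh`.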
The provided code reproduces:
- All numbers provided in text in the paper
- All tables and figures in the paper
- Selected tables and figures in the paper, as explained and justified below.
Below is a list of the tables and figures in the main text, and where they are reproduced. (Table 2 contains no data and thus is not listed.)
Figure/Table # | Program | Line Number | Output file | Note |
---|---|---|---|---|
Table 1 | summary-tables.R | 47 | tables/change-from-a100-table.tex | |
Table 3 | carranza-application.R | 37 | tables/carranza-asinh-results.tex | |
Table 4 | carranza-application.R | 162 | Tables/carranza-poisson.tex | |
Table 5 | carranza-application.R | 118 | tables/carranza-lee-bounds.tex | |
Table 6 | sequeira-2016-application.R | 255 | tables/sequeira-2016-poisson.tex | |
Table 7 | sequeira-2016-application.R | 297 | tables/sequeira-2016-log0-x.tex | |
Figure 1 | summary-tables.R | 101 | figures/change-from-a100-actual-vs-predicted.eps | |
Figure 2 | sequeira-2016-application.R | 312 | figures/sequeira-density.eps | |
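
After a run, one way to confirm that the files in the table were produced is to test for each path in turn. The sketch below is a hypothetical helper, not part of the package; it assumes the output paths in the table are relative to the directory passed as the first argument, which this README does not state explicitly.

```shell
# Hypothetical helper, not part of the replication package.
# Usage: missing_outputs ROOT FILE...
# Prints "missing: FILE" for each FILE that does not exist under ROOT.
# Returns 0 if all files exist and 1 if at least one is missing.
missing_outputs() {
  root=$1
  shift
  any_missing=0
  for f in "$@"; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      any_missing=1
    fi
  done
  return "$any_missing"
}

# Possible usage with paths copied from the table above, run from the
# directory that is assumed to hold tables/ and figures/:
#   missing_outputs . \
#     tables/change-from-a100-table.tex \
#     tables/carranza-asinh-results.tex \
#     figures/change-from-a100-actual-vs-predicted.eps \
#     figures/sequeira-density.eps
```

The check only confirms that the files exist; it does not compare their contents with the published tables and figures.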