The following table provides quick translations of common Stata commands into R. Because R can hold several data sets in memory at once, any command that accesses or modifies data must name the data set it acts on. Throughout the table, mydata is the data set being targeted.
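A minimal sketch of what naming the target data set looks like in practice. It assumes two hypothetical data frames, mydata and otherdata, filled with simulated values. The same model is fitted to each by passing the target data frame in the data= argument, and columns are reached through the data frame with $ or with().

```r
# Hypothetical example: two data frames held in memory at the same time.
set.seed(1)
mydata <- data.frame(x = rnorm(50))
mydata$y <- 2 * mydata$x + rnorm(50)
otherdata <- data.frame(x = runif(50), y = runif(50))

# The same formula fitted to each data set; data= names the target.
fit_main  <- lm(y ~ x, data = mydata)
fit_other <- lm(y ~ x, data = otherdata)
print(coef(fit_main))
print(coef(fit_other))

# Columns are accessed through the data frame, either with $ or with().
mean(mydata$x)
with(otherdata, mean(x))
```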
Stata | R | Description |
---|---|---|
insheet using "foo.csv", comma | mydata <- read.csv("foo.csv") | Read csv file |
cd "mydirectory" | setwd("mydirectory") | Change the working directory |
reg y x1 x2 | summary(lm(y~x1+x2, data=mydata)) | Ordinary least squares with constant |
reg y x1 x2, nocon | summary(lm(y~x1+x2-1, data=mydata)) | Ordinary least squares without constant |
if (x==y) {...} | if (x==y) {...} | Conditional statement: execute the enclosed command(s) only if the condition is true |
reg y x if (x>0) | summary(lm(y~x, data=mydata[mydata$x>0,])) | Estimate on a conditional subset of the data (observations with x > 0) |
forvalues i=1/100 {...} | for (i in 1:100) {...} | Loop through integer values of i from 1 to 100 |
foreach i in "a" "b" "c" {...} | for (i in c("a","b","c")) {...} | Loop through a list of items |
di "Hello World" | print("Hello World") or cat("Hello World\n") | Print "Hello World" to the screen |
do "mydofile.do" | source("myRscript.R") | Run a script file |
use "mydata.dta", clear | load("mydata.Rdata") | Load saved workspace/data |
save "mydata.dta", replace | save.image("mydata.Rdata") | Save current workspace/data |
di 2345^2 | 2345^2 | Calculate 2345 squared |
logit y x | summary(glm(y~x,data=mydata,family="binomial")) | Perform logit maximum likelihood estimation |
probit y x | summary(glm(y~x,data=mydata,family=binomial(link = "probit"))) | Perform probit maximum likelihood estimation |
sort x y | mydata <- mydata[order(mydata$x, mydata$y),] | Sort the data by x, then by y |
cor x y | cor(mydata$x, mydata$y) | Correlation between x and y |
help command | ?command or help(command) | Open the help file for a command |
edit | edit(mydata) | Open data editor window (not recommended) |
summarize | summary(mydata) | Summary statistics for each variable in the data |
table x y | table(mydata$x,mydata$y) | Two-way frequency table of x and y |
hist x | hist(mydata$x) | Histogram of variable x |
scatter y x | plot(y ~ x, data=mydata) | Scatter plot of y (vertical axis) against x (horizontal axis) |
list | mydata | Print all values of the data frame to the screen |
list in 1/5 | head(mydata, 5) or mydata[1:5,] | Print the first 5 rows of the data to the screen |
generate x2=x^2 | mydata$x2 <- mydata$x^2 | Create a new variable x2 which is the square of x |
replace x=y1+y2 | mydata$x <- mydata$y1 + mydata$y2 or mydata$x <- with(mydata, y1 + y2) | Set x equal to y1+y2 |
forvalues i=1/10 { di `i' } | for (i in 1:10) print(i) | Print the integers 1 through 10 |
replace x=0 if x<0 | mydata$x[mydata$x<0] <- 0 | Replace all values of x less than 0 with zero |
drop if x>100 | mydata <- mydata[!(mydata$x>100),] | Drop observations with x greater than 100 |
keep if x<100 | mydata <- mydata[mydata$x<100,] | Keep observations with x less than 100 |
drop x | mydata$x <- NULL | Drop variable x from the data |
keep x | mydata <- mydata["x"] | Keep only variable x in the data |
append using "mydata2.dta" | mydata <- rbind(mydata, mydata2) | Append mydata2 to mydata |
merge 1:1 index using "mydata2.dta" | mydata <- merge(mydata, mydata2, by="index", all=TRUE) | Merge two data sets by the index variable(s); all=TRUE keeps unmatched observations, as Stata does by default |
set obs 1000 then gen x=rnormal() | mydata <- data.frame(x=rnorm(1000)) | Generate 1000 random standard normal draws |
set obs 1000 then gen x=runiform() | mydata <- data.frame(x=runif(1000)) | Generate 1000 random uniform (0,1) draws |
set obs 1000 then gen x=rbinomial(10,.1) | mydata <- data.frame(x=rbinom(1000, 10, .1)) | Generate 1000 random binomial (n=10, p=.1) draws |
count | nrow(mydata) | Count the number of observations in the data |
foreach v of varlist * { rename `v' `v'old } | names(mydata) <- paste0(names(mydata),"old") | Add the suffix "old" to every variable name in the data |
clear then set obs 100 then gen x=rnormal() then gen y=x*2 + rnormal()*5 | x <- rnorm(100); mydata <- data.frame(x=x, y=x*2 + rnorm(100)*5) | Simulate a new data set in which y depends on x |
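As a rough illustration, the sketch below chains several rows of the table on one simulated data set. The variable names (index, x, y, z), the second data set mydata2, and the seed value are hypothetical choices for the example, not part of any particular analysis; each step is labeled with the Stata command it stands in for.

```r
# Hypothetical worked example applying several rows of the table.
# Variable names (index, x, y, z) and the second data set are illustrative.
set.seed(123)                                  # Stata analogue: set seed 123

# clear, set obs 100, gen x=rnormal(), gen y=x*2 + rnormal()*5
x <- rnorm(100)
mydata <- data.frame(index = 1:100, x = x, y = x * 2 + rnorm(100) * 5)

# generate x2=x^2
mydata$x2 <- mydata$x^2

# reg y x   and   reg y x if x>0
fit_all <- lm(y ~ x, data = mydata)
fit_pos <- lm(y ~ x, data = mydata[mydata$x > 0, ])
print(summary(fit_all))

# count
n_obs <- nrow(mydata)

# sort x
mydata <- mydata[order(mydata$x), ]

# drop if x>1 (keeps rows with x <= 1)
mydata <- mydata[!(mydata$x > 1), ]

# merge 1:1 index using a second data set; all = TRUE keeps unmatched
# rows from both sides, as Stata's merge does unless keep() is specified
mydata2 <- data.frame(index = 1:120, z = runif(120))
merged <- merge(mydata, mydata2, by = "index", all = TRUE)

# keep x (single brackets keep the result a data frame)
xonly <- mydata["x"]

# rename every variable with the suffix "old"
names(xonly) <- paste0(names(xonly), "old")
print(names(xonly))    # [1] "xold"
```

Note that the R versions of sort, drop and keep return a new object, so the result must be assigned back (here to mydata or xonly) to have the in-place effect the Stata commands have.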