This chapter walks through a few basics of working with MLR models. The final section, [Log transformations], asks you to create a few graphs, estimate a few models, and formally interpret several coefficients.
You'll need to make sure you have the following packages installed:
```r
library(tidyverse)
library(stargazer)
library(multcomp)
library(pander)
library(car)
library(ggplot2)
```
The dataset `ceosal.csv` (loaded below as `mydata`) has the following variables:
<div class="variableTable">

Variable | Description
---|---
salary | 1990 compensation in dollars ($)
age | age of CEO in years
comten | years CEO with company
ceoten | years as CEO with company
sales | 1990 firm sales in $millions
profits | 1990 profits in $millions
mktval | 1990 market value in $millions

</div>
Note that the table above has custom formatting applied by adding an HTML class name "variableTable" (via a "div" element that surrounds the table) and styling the table headers (th) to be left-aligned with an orange font, so you can easily see what was affected.
<style>
.sumstats td { text-align: right; color: green; }
</style>

The `stargazer` package provides an easy way to display summary statistics. You need the `results='asis'` code chunk option so that the HTML table displays in the knit output file (and you need the `type="html"` stargazer option to produce the HTML table). The `warning=FALSE` and `message=FALSE` chunk options keep it from displaying some of the messages it displays by default.
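For reference, the chunk that produces the table below might look like this; the options inside the curly braces are knitr chunk options, while `type` and `summary.stat` are stargazer options:

````
```{r, results='asis', warning=FALSE, message=FALSE}
stargazer(mydata, type = "html",
          summary.stat = c("mean", "sd", "min", "p25", "median", "p75", "max"))
```
````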
```r
mydata <- read.csv("https://raw.githubusercontent.com/econ380w21/380data/main/ceosal.csv")
stargazer(mydata, type = "html",
          summary.stat = c("mean", "sd", "min", "p25", "median", "p75", "max"))
```
Statistic | Mean | St. Dev. | Min | Pctl(25) | Median | Pctl(75) | Max
---|---:|---:|---:|---:|---:|---:|---:
salary | 865,864.400 | 587,589.300 | 100,000 | 471,000 | 707,000 | 1,119,000 | 5,299,000
age | 56.429 | 8.422 | 33 | 52 | 57 | 62 | 86
comten | 22.503 | 12.295 | 2 | 12 | 23 | 33 | 58
ceoten | 7.955 | 7.151 | 0 | 3 | 6 | 11 | 37
sales | 3,529.463 | 6,088.654 | 29 | 561 | 1,400 | 3,500 | 51,300
profits | 207.831 | 404.454 | -463 | 34 | 63 | 208 | 2,700
mktval | 3,600.316 | 6,442.276 | 387 | 644 | 1,200 | 3,500 | 45,400
In addition to providing an example of how to display a summary statistics table, this section also provides an example of how you can control aspects of how the table displays using HTML/CSS. In bookdown, the horizontal space is already modified for you, but if you use this in a regular RMD file that you knit, there will be no spacing at all between cells (the numbers bump right up against each other). To fix this, you can add padding (e.g., `padding-left`, `padding-right`). Often you want to right-align summary statistics. To do this, I've added an HTML class name "sumstats" to this section by including `.sumstats` in curly brackets in the section header (the part starting with ##) and included CSS styling to right-align the text. I also made the font green so that you can easily see what was modified.
Beyond simply formatting a table, this example and the one above (which uses a "div" and class name) illustrate the two main ways you can control the formatting of specific parts of your book without affecting the entire book.
Now back to econometrics...
Consider four models:
```r
model1 <- lm(salary ~ sales + mktval + profits, data = mydata)
model2 <- lm(salary ~ sales + mktval + profits + age, data = mydata)
model3 <- lm(salary ~ sales + mktval + profits + ceoten, data = mydata)
model4 <- lm(salary ~ sales + mktval + profits + ceoten + age, data = mydata)
```
The package you should use for 380 to report the results of multiple models side-by-side is the `stargazer` package. Each column is a separate model. If a variable does not have a coefficient displayed in a column, that means it was not included as an explanatory variable in that model.
```r
stargazer(model1, model2, model3, model4,
          type = "html",
          report = ('vc*p'),
          keep.stat = c("n", "rsq", "adj.rsq"),
          notes = "<em>&#42;p<0.1;&#42;&#42;p<0.05;&#42;&#42;&#42;p<0.01</em>",
          notes.append = FALSE)
```
Dependent variable: `salary`

|              | (1)              | (2)            | (3)               | (4)             |
|:-------------|:----------------:|:--------------:|:-----------------:|:---------------:|
| sales        | 15.984           | 15.369         | 18.233\*          | 18.061          |
|              | p = 0.152        | p = 0.169      | p = 0.100         | p = 0.106       |
| mktval       | 23.831           | 23.842         | 21.157            | 21.231          |
|              | p = 0.137        | p = 0.137      | p = 0.183         | p = 0.183       |
| profits      | 31.703           | 28.100         | 48.636            | 47.528          |
|              | p = 0.909        | p = 0.920      | p = 0.859         | p = 0.863       |
| age          |                  | 4,528.026      |                   | 823.966         |
|              |                  | p = 0.353      |                   | p = 0.873       |
| ceoten       |                  |                | 12,730.860\*\*    | 12,390.430\*\*  |
|              |                  |                | p = 0.026         | p = 0.042       |
| Constant     | 717,062.400\*\*\* | 464,424.700\* | 613,958.800\*\*\* | 570,743.300\*\* |
|              | p = 0.000        | p = 0.093      | p = 0.000         | p = 0.042       |
| Observations | 177              | 177            | 177               | 177             |
| R2           | 0.178            | 0.182          | 0.201             | 0.202           |
| Adjusted R2  | 0.163            | 0.163          | 0.183             | 0.178           |
| Note:        | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 | | |             |
Just like for the summary statistics table, you need the `results='asis'` code chunk option so that the HTML table displays in the knit output file (and you need the `type="html"` stargazer option to produce the HTML table), and you want the `warning=FALSE` and `message=FALSE` code chunk options to keep it from displaying some of the messages it displays by default.
For displaying regression results for 380, please use the `report=('vc*p')` stargazer option to display p-values and the `keep.stat = c("n","rsq","adj.rsq")` stargazer option to display the sample size (it's important to know the sample size for each model so you can see if it changes, which happens if some variables have missing values), r-squared, and adjusted r-squared.
The final two options (`notes` and `notes.append`) fix a bug in stargazer. The table is supposed to display a note that explains the stars (asterisks) used to denote statistical significance (one star means statistically significant at the 10% level, two stars mean statistically significant at the 5% level, and three stars mean statistically significant at the 1% level). However, asterisks are also how you make words italic and bold in RMarkdown, and that's what happens in the table footnote (try removing those options and building to see what I mean). The solution is to use the HTML character code 42 (`&#42;`) instead, which displays as an asterisk without triggering Markdown formatting. So, just use the options above and it should work correctly.^[Credit to this Stack Exchange answer for this fix.] This option also highlights why you often want to have easily-accessible examples. You're probably not going to remember all that to type it yourself, but if you have this chapter in your BP, you can easily search for stargazer, find this example, and copy/paste into your project or wherever you want to report results.
Recall the definition of $R^2$:

$$ R^2 = 1 - \frac{SSR}{SST} $$

where $SSR = \sum_{i=1}^n \hat{u}_i^2$ is the sum of squared residuals and $SST = \sum_{i=1}^n \left(y_i - \bar{y}\right)^2$ is the total sum of squares. The denominator measures the total variation in the $y_i$, so $R^2$ is the fraction of the variation in $y$ that the model explains. Because adding another explanatory variable can never increase $SSR$, $R^2$ never falls when we add variables, which makes it a poor way to compare models of different sizes. Adjusted $R^2$ adds a penalty for each additional explanatory variable:

$$ \bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} $$

where $n$ is the sample size and $k$ is the number of explanatory variables. Adjusted $R^2$ increases only if a new variable improves the fit by more than the penalty, so it is the better statistic for comparing models with different numbers of explanatory variables.
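If you want to pull these statistics out of a fitted model yourself rather than reading them off a stargazer table, `summary()` stores both. A quick sketch using the models estimated above:

```r
# Pull R-squared and adjusted R-squared directly from the fitted models
summary(model1)$r.squared      # 0.178
summary(model2)$r.squared      # 0.182
summary(model1)$adj.r.squared  # 0.163
summary(model2)$adj.r.squared  # 0.163 (no improvement from adding age)
```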
Look at the models above. All four models include measures of the company: sales, market value, and profits. Models 2-4 add variables measuring characteristics of the CEO. Compare models 1 and 2. Adding the CEO's age barely increases $R^2$ (from 0.178 to 0.182) and leaves adjusted $R^2$ unchanged (0.163), suggesting that age does not improve the model (the p-value on the `age` coefficient is 0.352, meaning that the effect of the CEO's age on salary is not statistically significant at any standard level of significance, providing further evidence that including age does not improve the model).
Comparing models 1 and 3, both $R^2$ and adjusted $R^2$ increase when we add CEO tenure (the p-value on the `ceoten` coefficient is 0.0251, meaning that the effect of CEO tenure on salary is statistically significant at the 5% level). The increase in adjusted $R^2$ (from 0.163 to 0.183) indicates that adding `ceoten` improves the model.
Comparing models 3 and 4, we again see that adding the CEO's age does not improve the model: adjusted $R^2$ falls (from 0.183 to 0.178) and the `age` coefficient remains statistically insignificant (p-value of 0.873).
Sometimes two variables each appear important in determining $y$ when added on their own, yet neither appears important when both are included. To see this, consider the following models:
```r
model5 <- lm(salary ~ sales + ceoten, data = mydata)
model6 <- lm(salary ~ sales + ceoten + mktval, data = mydata)
model7 <- lm(salary ~ sales + ceoten + profits, data = mydata)
model3 <- lm(salary ~ sales + ceoten + mktval + profits, data = mydata)
```
```r
stargazer(model5, model6, model7, model3,
          type = "html",
          report = ('vc*p'),
          keep.stat = c("n", "rsq", "adj.rsq"),
          notes = "<em>&#42;p<0.1;&#42;&#42;p<0.05;&#42;&#42;&#42;p<0.01</em>",
          notes.append = FALSE,
          model.numbers = FALSE,
          column.labels = c("(5)", "(6)", "(7)", "(3)"))
```
Dependent variable: `salary`

|              | (5)               | (6)               | (7)               | (3)               |
|:-------------|:-----------------:|:-----------------:|:-----------------:|:-----------------:|
| sales        | 37.801\*\*\*      | 19.019\*          | 19.667\*          | 18.233\*          |
|              | p = 0.00000       | p = 0.061         | p = 0.076         | p = 0.100         |
| ceoten       | 13,925.610\*\*    | 12,703.360\*\*    | 13,296.830\*\*    | 12,730.860\*\*    |
|              | p = 0.016         | p = 0.025         | p = 0.020         | p = 0.026         |
| mktval       |                   | 23.400\*\*        |                   | 21.157            |
|              |                   | p = 0.015         |                   | p = 0.183         |
| profits      |                   |                   | 341.028\*\*       | 48.636            |
|              |                   |                   | p = 0.041         | p = 0.859         |
| Constant     | 621,670.900\*\*\* | 613,436.100\*\*\* | 619,800.500\*\*\* | 613,958.800\*\*\* |
|              | p = 0.000         | p = 0.000         | p = 0.000         | p = 0.000         |
| Observations | 177               | 177               | 177               | 177               |
| R2           | 0.173             | 0.201             | 0.193             | 0.201             |
| Adjusted R2  | 0.164             | 0.187             | 0.179             | 0.183             |
| Note:        | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 | | |               |
Following similar logic to our discussion above about using adjusted $R^2$ to compare models, consider adding market value (`mktval`) and profits. Comparing models 5 and 6, adjusted $R^2$ increases (from 0.164 to 0.187), and `mktval` is statistically significant at the 5% level (p-value of 0.0146). Thus, market value should be added to the model to join sales and CEO tenure in explaining CEO salary. Similarly, comparing models 5 and 7, adjusted $R^2$ increases (from 0.164 to 0.179), and `profits` is statistically significant at the 5% level (p-value of 0.0402), indicating that profits should be added to the model to join sales and CEO tenure in explaining CEO salary.
However, look at what happens when we include both market value (`mktval`) and profits. Comparing models 6 and 7 with model 3, adjusted $R^2$ falls relative to model 6 (from 0.187 to 0.183), and `mktval` and `profits` are not statistically significant at any standard level of significance in model 3 (p-values are 0.1821 and 0.859, respectively).
How can adding `mktval` improve model 5 and adding `profits` improve model 5, but adding both not improve the model? Often this happens when there is correlation between explanatory variables. Sometimes this happens when several explanatory variables essentially measure the same concept. If the variables are very similar measures of the same underlying concept, it often makes sense to include only one or the other (sometimes papers will include "robustness checks" that show results are similar if you use alternative measures).
Sometimes, however, variables measure different concepts but are still highly correlated with each other. Correlation between explanatory variables is called multicollinearity. When we do a regression, we're looking at variation in the explanatory variables and how that explains variation in the outcome. When two explanatory variables tend to move together (i.e., are correlated), it's hard for OLS to tell which explanatory variable is driving the changes in the outcome. Pretend the index finger on your left hand represents one explanatory variable and the index finger on your right hand represents the other. If the two fingers almost always rise and fall together, someone watching the outcome move can't tell which finger is doing the work; OLS faces the same problem, and the result is imprecise (high-variance) estimates of both coefficients.
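To see this in action, here is a small simulation (the data and variable names are made up for illustration, not from the CEO dataset): both `x1` and `x2` genuinely affect `y`, but because they are highly correlated, OLS estimates both coefficients imprecisely.

```r
set.seed(380)  # arbitrary seed so the simulation is reproducible
n  <- 200
x1 <- rnorm(n)
x2 <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)  # cor(x1, x2) is about 0.95
y  <- 1 + 2 * x1 + 2 * x2 + rnorm(n)           # both variables truly matter
summary(lm(y ~ x1 + x2))  # coefficients near 2, but with large standard errors
summary(lm(y ~ x1))       # smaller SE, but the x1 coefficient is biased upward
```

Dropping `x2` makes the estimate on `x1` precise but biased, which is exactly the trade-off discussed below.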
Formally, we can measure this relationship between explanatory variables with a regression. Here is the regression of profits on market value:
```r
mktvalAndProfits <- lm(profits ~ mktval, data = mydata)
pander(summary(mktvalAndProfits))
```
|             | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:------------|---------:|-----------:|--------:|-----------:|
| (Intercept) | 0.3038   | 13.85      | 0.02194 | 0.9825     |

| Observations | Residual Std. Error | $R^2$ | Adjusted $R^2$ |
|-------------:|--------------------:|------:|---------------:|
| 177          | 160.7               | 0.843 | 0.8421         |

Table: Fitting linear model: profits ~ mktval
The $R^2$ of 0.843 tells us that variation in market value alone explains over 84% of the variation in profits. The two variables are highly correlated, so when both are included in a model, OLS has trouble telling which one is driving changes in salary.
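Because this is a simple regression with a single explanatory variable, its $R^2$ is just the squared correlation between the two variables, which you can verify directly:

```r
# For a simple (one-variable) regression, R-squared equals the squared correlation
cor(mydata$profits, mydata$mktval)^2  # approximately 0.843
```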
The Variance Inflation Factor (VIF) is a common measure used to gauge how much multicollinearity might be affecting the precision with which we are able to measure a particular coefficient. The VIF is one part of the equation used to calculate the standard error of a coefficient.^[See Wooldridge section 3.4 for a full discussion of the variance of OLS coefficients, and section 3.4a for a discussion of multicollinearity, including the VIF. For these explanations I am assuming homoskedasticity, but the same concepts and intuition apply when there is heteroskedasticity.]
Consider the general regression equation: $$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + u_i $$
For variable $x_j$, the variance of the OLS estimator $\hat{\beta}_j$ is

$$ \text{Var}\left(\hat{\beta}_j\right) = \frac{\sigma^2}{SST_j\left(1-R_j^2\right)} = \frac{\sigma^2}{SST_j}\cdot\text{VIF}_j $$

where $SST_j = \sum_{i=1}^n \left(x_{ji}-\bar{x}_j\right)^2$ is the total variation in $x_j$, $R_j^2$ is the $R^2$ from regressing $x_j$ on all the other explanatory variables, and $\text{VIF}_j = 1/(1-R_j^2)$.

Consider two extreme cases. First, if there is no variation in $x_j$ that is not explained by the other explanatory variables ($R_j^2 = 1$), the denominator is zero and the variance is infinite: we cannot estimate $\beta_j$ at all. This is perfect multicollinearity (R responds by dropping one of the variables).

At the other end of the spectrum is the best case for precisely estimating a coefficient: $x_j$ is completely uncorrelated with the other explanatory variables, so $R_j^2 = 0$, $\text{VIF}_j = 1$, and multicollinearity does not inflate the variance at all.

The only other terms in $\text{Var}\left(\hat{\beta}_j\right)$ are $\sigma^2$, the variance of the error term (more unexplained variation in $y$ means less precise estimates), and $SST_j$ (more variation in $x_j$, for example from a larger sample, means more precise estimates).
Let's examine the VIF for our specific example. In the example above, we focused on just two variables, profits and market value, but the regression in question (model 3) also includes sales and CEO tenure. Thus, we want to know how much of the variation in profits can be explained by variation in the other explanatory variables. We find this via the following regression:
```r
modelProfits <- lm(profits ~ sales + ceoten + mktval, data = mydata)
pander(summary(modelProfits))
```
|             | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:------------|---------:|-----------:|--------:|-----------:|
| (Intercept) | -10.75   | 18.19      | -0.5907 | 0.5555     |
| **sales**   | 0.01615  | 0.002805   | 5.76    | 3.775e-08  |
| ceoten      | -0.5654  | 1.567      | -0.3608 | 0.7187     |

| Observations | Residual Std. Error | $R^2$  | Adjusted $R^2$ |
|-------------:|--------------------:|-------:|---------------:|
| 177          | 147.7               | 0.8689 | 0.8666         |

Table: Fitting linear model: profits ~ sales + ceoten + mktval
The $R^2$ from this regression is 0.8689, so for the `profits` coefficient in model 3, $R_j^2 = 0.8689$ and $\text{VIF}_j = 1/(1-0.8689) \approx 7.6$. To get a sense of the scale, here is how the VIF grows with $R_j^2$:

| $R_j^2$ | VIF   |
|--------:|------:|
| 0       | 1     |
| 0.1     | 1.111 |
| 0.2     | 1.25  |
| 0.3     | 1.429 |
| 0.4     | 1.667 |
| 0.5     | 2     |
| 0.6     | 2.5   |
| 0.7     | 3.333 |
| 0.8     | 5     |
| 0.9     | 10    |
| 0.99    | 100   |
| 0.999   | 1000  |
The VIF value of 10 corresponds with $R_j^2 = 0.9$ and is sometimes cited as a rule-of-thumb cutoff for problematic multicollinearity, but any such cutoff is arbitrary: what actually matters is whether the standard errors on the coefficients you care about are too large to be useful.
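In practice you don't need to run the auxiliary regression yourself: the `vif()` function from the `car` package (loaded above) reports the VIF for every explanatory variable in a model at once. A quick sketch:

```r
# VIF for each explanatory variable in model 3.
# For profits, R_j^2 = 0.8689 implies a VIF of about 1/(1 - 0.8689), or roughly 7.6.
vif(model3)
```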
So should we only include one or the other? Not necessarily. Both market value and profits affect salary. If we leave one out, we'll have omitted variable bias. Look at how the coefficient on profits changes between models 7 and 3: it falls by about 292 (from 341.028 down to 48.636). However, due to the multicollinearity in model 3, we get very imprecise estimates of the `mktval` and `profits` coefficients. Multicollinearity often makes it difficult to estimate effects precisely, but if we don't include both variables, we introduce omitted variable bias. A larger sample size might help,^[In fact, a larger sample size increases $SST_j$, which lowers the variance of $\hat{\beta}_j$.] but we only have these 177 CEOs to work with.
A common approach is to include several models for comparison, both with and without the variables (like the table above with models 5, 6, 7, and 3). Then, do a formal test of whether the two variables are jointly significant in determining $y$ using the `linearHypothesis` function from the `car` package:
```r
jointTest <- linearHypothesis(model3, c("mktval=0", "profits=0"))
jointTest
```
```
## Linear hypothesis test
## 
## Hypothesis:
## mktval = 0
## profits = 0
## 
## Model 1: restricted model
## Model 2: salary ~ sales + ceoten + mktval + profits
## 
##   Res.Df        RSS Df  Sum of Sq      F  Pr(>F)  
## 1    174 5.0244e+13                                
## 2    172 4.8526e+13  2 1.7174e+12 3.0436 0.05024 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
The p-value is 0.0502. The interpretation is the same as for other hypothesis tests. Here, we reject the null hypothesis at the 10% level (though not quite at the 5% level). This provides evidence that market value and profits together are important in determining salary.
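The same F-test can also be run by estimating the restricted model explicitly and comparing it to the unrestricted model with `anova()`. Since model 5 is exactly model 3 without `mktval` and `profits`, it serves as the restricted model:

```r
# Nested-model F-test: restricted (model5) vs. unrestricted (model3).
# This reproduces the F statistic (3.04) and p-value (0.0502) shown above.
anova(model5, model3)
```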
You can see section 4-5a in Wooldridge 6e for more on joint hypothesis tests. This test can be beneficial when you have reason to believe two (or more) variables are important in determining $y$, but neither is individually statistically significant when both are included, which is exactly the pattern multicollinearity tends to produce.
It's common for people to look at regression results and make comparisons between coefficients. For example, looking at models 6 and 3 above, the coefficient on market value is larger than the coefficient on sales. Does this mean market value is more important than sales in terms of its effect on salary? To see how we answer this question, let's consider model 3 for the rest of this section, reprinted here for convenience:
```r
pander(summary(model3))
```
|             | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:------------|---------:|-----------:|--------:|-----------:|
| (Intercept) | 613959   | 65486      | 9.375   | 3.95e-17   |
| **sales**   | 18.23    | 11.01      | 1.656   | 0.09949    |
| ceoten      | 12731    | 5636       | 2.259   | 0.02515    |
| mktval      | 21.16    | 15.79      | 1.34    | 0.1821     |

| Observations | Residual Std. Error | $R^2$  | Adjusted $R^2$ |
|-------------:|--------------------:|-------:|---------------:|
| 177          | 531159              | 0.2014 | 0.1828         |

Table: Fitting linear model: salary ~ sales + ceoten + mktval + profits
There are two main things to keep in mind when attempting to compare coefficients in the same model. First, pay attention to the units. In this case, both market value and sales are measured in millions of dollars, so the coefficients are both in terms of a $1 million increase. Thus, the comparison is between a $1 million increase in market value versus a $1 million increase in sales, which increase CEO salary by $21.16 and $18.23, respectively (holding constant the other explanatory variables). This is an apples-to-apples comparison that is easy to understand. Be extra careful that the scale is the same. Here, the scale is the same because both are in millions, but watch out for when one variable is measured in millions and another is measured in thousands, etc. If the variables have different units, be extra clear about what you are comparing. For example, if you were trying to compare sales to CEO tenure, you'd be comparing the effects on salary of an additional $1 million in sales and 1 additional year of tenure as CEO.
The other thing to keep in mind when comparing coefficients is statistical significance. A coefficient could be larger by chance, so we need to run a hypothesis test to see if we have evidence to believe the difference is not just occurring randomly. We test the null hypothesis that $\left(\hat{\beta}_{profits}-\hat{\beta}_{sales}\right)=0$ against the alternative that the difference is not 0. Formally, our test statistic is: $$ t=\frac{\left(\hat{\beta}_{profits}-\hat{\beta}_{sales}\right)-0}{\text{SE}\left(\hat{\beta}_{profits}-\hat{\beta}_{sales}\right)} $$
Thus, we need to calculate not just the difference in coefficients, $\left(\hat{\beta}_{profits}-\hat{\beta}_{sales}\right)$, but also the standard error of the difference, $\text{SE}\left(\hat{\beta}_{profits}-\hat{\beta}_{sales}\right)$. To calculate the standard error of the difference, we need to know not just the standard errors of $\hat{\beta}_{profits}$ and $\hat{\beta}_{sales}$, but also the correlation between the two estimates (if both coefficients tend to be big together and small together, the difference tends to stay small, but if they are negatively correlated, one tends to be big when the other is small and the difference tends to be larger). Thus, calculating $\text{SE}\left(\hat{\beta}_{profits}-\hat{\beta}_{sales}\right)$ requires the full variance-covariance matrix. Instead of doing that by hand, we'll use the `glht` function from the `multcomp` package to perform the test for us:
```r
testOfDif <- glht(model3, linfct = c("profits - sales = 0"))
summary(testOfDif)
```
```
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = salary ~ sales + ceoten + mktval + profits, data = mydata)
## 
## Linear Hypotheses:
##                      Estimate Std. Error t value Pr(>|t|)
## profits - sales == 0     30.4      278.0   0.109    0.913
## (Adjusted p values reported -- single-step method)
```
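To see where the standard error of the difference comes from, here is the calculation done by hand with the variance-covariance matrix (a sketch; `glht` does all of this for you):

```r
# Var(a - b) = Var(a) + Var(b) - 2*Cov(a, b)
V      <- vcov(model3)  # variance-covariance matrix of the coefficient estimates
dif    <- unname(coef(model3)["profits"] - coef(model3)["sales"])
se_dif <- sqrt(V["profits", "profits"] + V["sales", "sales"] -
                 2 * V["profits", "sales"])
dif / se_dif  # t statistic of about 0.109, matching the glht output above
```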
We can interpret the p-value just like the p-value for a hypothesis test of a single coefficient. The p-value is 0.913, so we fail to reject the null hypothesis that the two coefficients are the same at any standard level of significance. In other words, we can't rule out that the difference is due to chance. We could also form a confidence interval. Here's the 90% confidence interval for the difference in coefficients:
```r
confint(testOfDif, level = 0.9)
```
```
## 
##   Simultaneous Confidence Intervals
## 
## Fit: lm(formula = salary ~ sales + ceoten + mktval + profits, data = mydata)
## 
## Quantile = 1.6538
## 90% family-wise confidence level
## 
## 
## Linear Hypotheses:
##                      Estimate  lwr       upr     
## profits - sales == 0  30.4026 -429.2872 490.0925
```
This 90% confidence interval contains 0; thus, we fail to reject the null hypothesis that the difference is 0 at all standard levels of significance.
So you can see what it looks like when we reach the opposite conclusion, here's the same test for the difference between the coefficient on `ceoten` and the coefficient on `sales`:
```r
testOfDif2 <- glht(model3, linfct = c("ceoten - sales = 0"))
summary(testOfDif2)
```
```
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = salary ~ sales + ceoten + mktval + profits, data = mydata)
## 
## Linear Hypotheses:
##                     Estimate Std. Error t value Pr(>|t|)  
## ceoten - sales == 0    12713       5635   2.256   0.0253 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
```
```r
confint(testOfDif2)
```
```
## 
##   Simultaneous Confidence Intervals
## 
## Fit: lm(formula = salary ~ sales + ceoten + mktval + profits, data = mydata)
## 
## Quantile = 1.9739
## 95% family-wise confidence level
## 
## 
## Linear Hypotheses:
##                     Estimate   lwr        upr       
## ceoten - sales == 0 12712.6292  1590.0204 23835.2380
```
The p-value is 0.0253, so we reject the null hypothesis that the coefficients are the same at the 5% level. In other words, we have evidence (at the 5% level) supporting the claim that the effect on CEO salary of an additional year of CEO tenure is larger than the effect of an additional $1 million in sales, holding constant market value and profits.
If you include formal hypothesis tests like those just discussed in the [Joint hypotheses test] and [Testing linear combinations of parameters] sections in a paper, you will typically not include the complete test output. Instead, you'll just say what the test is, report the p-value, and explain the conclusion of the test. The detailed output of the test is included here as part of the BP because it is useful to see all the output as a reference later.
This section has several things you need to do (after you complete all of them, you can submit this chapter of your BP).
This data provides a good example of why visualizing your data can be helpful. By visualizing the data using histograms and scatter plots, it is easy to see how skewed the data is and how log transformations can help center it. You will focus on two variables, sales and salary. You will first make histograms and then make scatter plots showing the relationship between these two variables.
Make four histograms:

- `salary`
- `log(salary)`
- `sales`
- `log(sales)`
You can format them however you'd like, but make sure they are clearly labeled.
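If you want labels beyond the default variable names, `labs()` is one way to add them; the titles below are just placeholders, so word yours however you like:

```r
# Example of labeling a histogram (the title text here is hypothetical)
ggplot(mydata, aes(salary)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of 1990 CEO salaries",
       x = "Salary ($)",
       y = "Number of CEOs")
```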
**`salary`**

```r
salary_histogram <- ggplot(mydata, aes(salary)) + geom_histogram()
salary_histogram
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

**`log(salary)`**

```r
log_salary_histogram <- ggplot(mydata, aes(log(salary))) + geom_histogram()
log_salary_histogram
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

**`sales`**

```r
sales_histogram <- ggplot(mydata, aes(sales)) + geom_histogram()
sales_histogram
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

**`log(sales)`**

```r
log_sales_histogram <- ggplot(mydata, aes(log(sales))) + geom_histogram()
log_sales_histogram
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
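The `stat_bin()` message in each plot is just ggplot reminding you that it defaulted to 30 bins. Setting `bins` (or `binwidth`) yourself silences it; any reasonable value works:

```r
# Choosing the number of bins explicitly suppresses the stat_bin() message
ggplot(mydata, aes(log(salary))) + geom_histogram(bins = 40)
```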
You can also see how log transformations spread out the data by looking at scatter plots of `salary` or `log(salary)` (y) versus `sales` or `log(sales)` (x), corresponding to the four different combinations. Specifically, create four scatter plots:
- `salary` (y) vs. `sales` (x)
- `log(salary)` (y) vs. `sales` (x)
- `salary` (y) vs. `log(sales)` (x)
- `log(salary)` (y) vs. `log(sales)` (x)
Also add the following to each of the four plots:

- A best-fit line (`geom_smooth` with `method = "lm"`).
- Just to get a bit more experience with data visualization, color the points by `ceoten` using whatever `scale_color_gradient` (or `scale_color_gradientn`) you'd like.
**`salary` (y) vs. `sales` (x)**

```r
salary_sales <- ggplot(mydata, aes(sales, salary, color = ceoten)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
salary_sales
```

```
## `geom_smooth()` using formula 'y ~ x'
```

**`log(salary)` (y) vs. `sales` (x)**

```r
log_salary_sales <- ggplot(mydata, aes(sales, log(salary), color = ceoten)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_gradient2(high = "black")
log_salary_sales
```

```
## `geom_smooth()` using formula 'y ~ x'
```

**`salary` (y) vs. `log(sales)` (x)**

```r
salary_log_sales <- ggplot(mydata, aes(log(sales), salary, color = ceoten)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_gradient2(high = "red")
salary_log_sales
```

```
## `geom_smooth()` using formula 'y ~ x'
```

**`log(salary)` (y) vs. `log(sales)` (x)**

```r
log_salary_log_sales <- ggplot(mydata, aes(log(sales), log(salary), color = ceoten)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  scale_color_gradient2(high = "blue")
log_salary_log_sales
```

```
## `geom_smooth()` using formula 'y ~ x'
```
To make sure you know how to estimate models and display the results using stargazer, you should estimate the following 4 models and display the results.
- model 3 (the same one discussed above... either version is fine)
- model 3, but with `log(salary)` instead of `salary`
- model 3, but with `log(sales)` instead of `sales`
- model 3, but with `log(salary)` instead of `salary` and `log(sales)` instead of `sales`
Display all four models in the same stargazer table. And yes, this just requires you to copy/paste what you can find above in this file, but part of the point is to make sure you know how to use the above examples and modify them as needed.
Finally, to demonstrate that you understand how to formally interpret regression coefficients, you will type formal interpretations of coefficients. To make sure you understand what is required of you when formally interpreting regression coefficients for Econ 380, you should use "LN3.3 (MLR Interpretations).pptx", "Rules for Interpreting Regression Results.pdf", "Interpretations Rubric (all 4 level log cases).pdf", and the example PDFs posted in Moodle.
The interpretations you type must include formal interpretation of coefficients that include the following combinations of level/log:
- One coefficient where $y$ is a level and $x$ is a level
- One coefficient where $y$ is a log and $x$ is a level
- One coefficient where $y$ is a level and $x$ is a log
- One coefficient where $y$ is a log and $x$ is a log
Your interpretations must also include:
- One coefficient that is statistically significant at the 1% level
- One coefficient that is statistically significant at the 5% level
- One coefficient that is statistically significant at the 10% level
- One coefficient that is not statistically significant at any standard level of significance
Note that you can combine these where possible (i.e., one interpretation can be both a log y / level x and a particular level of significance) so you don't need to type out 8 separate interpretations. I don't believe you can do it with only 4 interpretations, but I believe you can do it with 5.
Please say which model you're looking at for each interpretation.
As a reminder, the four models are:

- model 3 (the same one discussed above... either version is fine)
- model 3, but with `log(salary)` instead of `salary`
- model 3, but with `log(sales)` instead of `sales`
- model 3, but with `log(salary)` instead of `salary` and `log(sales)` instead of `sales`
```r
model3 <- lm(salary ~ sales + mktval + profits + ceoten, data = mydata)
model3_logsalary <- lm(log(salary) ~ sales + mktval + profits + ceoten, data = mydata)
model3_logsales <- lm(salary ~ log(sales) + mktval + profits + ceoten, data = mydata)
model3_logsalarylogsales <- lm(log(salary) ~ log(sales) + mktval + profits + ceoten, data = mydata)

stargazer(model3, model3_logsalary, model3_logsales, model3_logsalarylogsales,
          type = "html",
          report = ('vc*p'),
          keep.stat = c("n", "rsq", "adj.rsq"),
          notes = "<em>&#42;p<0.1;&#42;&#42;p<0.05;&#42;&#42;&#42;p<0.01</em>",
          notes.append = FALSE,
          model.numbers = FALSE,
          column.labels = c("level-level", "log-level", "level-log", "log-log"))
```
The dependent variable is shown at the top of each column:

|              | salary (level-level) | log(salary) (log-level) | salary (level-log) | log(salary) (log-log) |
|:-------------|:--------------------:|:-----------------------:|:------------------:|:---------------------:|
| sales        | 18.233\*             | 0.00003\*\*             |                    |                       |
|              | p = 0.100            | p = 0.022               |                    |                       |
| log(sales)   |                      |                         | 124,542.800\*\*\*  | 0.193\*\*\*           |
|              |                      |                         | p = 0.0004         | p = 0.00000           |
| mktval       | 21.157               | 0.00002                 | 19.715             | 0.00002               |
|              | p = 0.183            | p = 0.276               | p = 0.200          | p = 0.314             |
| profits      | 48.636               | 0.00002                 | 21.408             | -0.00004              |
|              | p = 0.859            | p = 0.935               | p = 0.932          | p = 0.873             |
| ceoten       | 12,730.860\*\*       | 0.011\*                 | 12,594.850\*\*     | 0.011\*\*             |
|              | p = 0.026            | p = 0.056               | p = 0.023          | p = 0.042             |
| Constant     | 613,958.800\*\*\*    | 13.240\*\*\*            | -210,327.500       | 11.957\*\*\*          |
|              | p = 0.000            | p = 0.000               | p = 0.385          | p = 0.000             |
| Observations | 177                  | 177                     | 177                | 177                   |
| R2           | 0.201                | 0.206                   | 0.247              | 0.312                 |
| Adjusted R2  | 0.183                | 0.187                   | 0.229              | 0.296                 |
| Note:        | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 | | |                                  |
- **One coefficient where $y$ is a level and $x$ is a level & one coefficient that is statistically significant at the 5% level**

  From `model3`: Holding the company's sales, the company's market value, and the company's profits constant, if the CEO's tenure as CEO at the company increases by 1 year, the CEO's salary is predicted to increase by $12,730.86, on average. This is statistically significant at the 5% level.
- **One coefficient where $y$ is a log and $x$ is a level & one coefficient that is statistically significant at the 10% level**

  From `model3_logsalary`: Holding the company's sales, the company's market value, and the company's profits constant, if the CEO's tenure as CEO at the company increases by 1 year, the CEO's salary is predicted to increase by 1.1%, on average. This is statistically significant at the 10% level.
- **One coefficient where $y$ is a level and $x$ is a log & one coefficient that is statistically significant at the 1% level**

  From `model3_logsales`: Holding the company's market value, the company's profits, and the CEO's tenure constant, if the company's sales increase by 1%, the CEO's salary is predicted to increase by $1,245.43, on average. This is statistically significant at the 1% level.
- **One coefficient where $y$ is a log and $x$ is a log**

  From `model3_logsalarylogsales`: Holding the company's market value, the company's profits, and the CEO's tenure constant, if the company's sales increase by 1%, the CEO's salary is predicted to increase by 0.193%, on average. This is statistically significant at the 1% level.
- **One coefficient that is not statistically significant at any standard level of significance**

  From `model3_logsalarylogsales`: Holding the company's sales, the company's profits, and the CEO's tenure constant, if the company's market value increases by $1 million, the CEO's salary is predicted to increase by 0.002%, on average. This is not statistically significant at any standard level of significance.