This study compares incomes of unmarried men and women under equal conditions.

feelosophy13/incomeGenderAnalysis

Analysis of Income and Gender

Introduction

On September 17, 2013, the Huffington Post published a controversial article titled "Women Still Earned 77 Cents On Men's Dollar In 2012: Report". The article drew a multitude of criticisms from people (mostly men) who strongly voiced that gender discrimination didn't exist or that the income disparity arose from various factors, such as women working fewer hours, women taking more leave, their unwillingness to take on more lucrative but also more demanding jobs (long hours, physically intensive work, irregular schedules), and their inferior salary negotiation skills.

While I didn't personally think that women's negotiation skills were "inferior" to men's, I was interested in the topic and decided to conduct my own data analysis to see whether an income gap existed between men and women when they were compared under equal conditions. For the purpose of this analysis, I decided to compare individuals who have never married.

Methods

Tool

This analysis was performed using the R programming language.

About the Data

The AdultUCI dataset in the arules package in R contains census data about individuals. The variables include age, education, marital status, occupation, relationship, race, sex, work hours per week, native country, income, etc. Income is classified as either 'small' ($50K/year or less) or 'large' (more than $50K/year).

CRAN package URL: http://cran.r-project.org/web/packages/arules/index.html

** Note that the dataset does not provide the actual salaries of the sample individuals. Instead, it only reports whether an individual's income is 'large' (more than $50K/year) or 'small' ($50K/year or less).

Processing the Data

AdultUCI data (a processed version of census income data) was further processed for the following reasons:
  1. To eliminate any rows with missing values.
  2. To rename the columns by replacing - characters with _ characters.
  3. To add a new column, age_group (in addition to the age column).

The final dataset for the analysis was stored inside the data folder as incomeData.RData.

The script used to process the AdultUCI data can be seen here.
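
The three processing steps listed above can be sketched in base R as follows. This is a minimal illustration on a toy data frame, not the author's actual script: the toy rows, the decade-wide age bins, and the "20s"-style labels are assumptions made for the example.

```r
# Toy stand-in for AdultUCI (assumption: the real data would be loaded
# with data("AdultUCI", package = "arules")).
raw <- data.frame(
  age = c(25, 38, NA, 52, 31),
  sex = c("Male", "Female", "Female", "Male", "Female"),
  `marital-status` = c("Never-married", "Divorced", "Never-married",
                       "Married-civ-spouse", "Never-married"),
  `hours-per-week` = c(40, 35, 40, 50, NA),
  check.names = FALSE,       # keep the hyphenated column names as-is
  stringsAsFactors = FALSE
)

# 1. Drop every row that has at least one missing value.
incomeData <- raw[complete.cases(raw), ]

# 2. Replace '-' with '_' in the column names.
names(incomeData) <- gsub("-", "_", names(incomeData), fixed = TRUE)

# 3. Add an age_group column (the bin edges and labels are assumptions).
incomeData$age_group <- cut(
  incomeData$age,
  breaks = seq(10, 100, by = 10),
  right = FALSE,                             # [20, 30) becomes "20s"
  labels = paste0(seq(10, 90, by = 10), "s")
)

incomeData
```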

Individuals' Data Selection: Subsetting

For the purpose of this analysis, I decided to analyze individuals who have never been married before.

The R code below was used to load the pre-processed AdultUCI data and subset males and females who have never been married.

> load('./data/incomeData.RData')  # load pre-processed data
> m.never_married <- subset(incomeData, sex=='Male' & marital_status=='Never-married')  # select all males who never married
> f.never_married <- subset(incomeData, sex=='Female' & marital_status=='Never-married')  # select all females who never married
> mf.never_married <- rbind(m.never_married, f.never_married)  # combine the never-married males and females into one data frame

Statistical Methods Used

  • Chi-square test
  • Fisher's exact test
  • Mann-Whitney U test
  • Levene's test

Results

Note: From here on, "men and women" refers to individuals who have never married.

Sample sizes

After subsetting for individuals who have never married, there were more men (5414) than women (4312) in the dataset.
![Alt text](./images/individuals_who_never_married_by_income_and_gender.png)

Age difference between genders

There didn't appear to be any significant age difference between men and women in the dataset.

Average age of men who never married: 28.4
Average age of women who never married: 28.5
Median age of men who never married: 26
Median age of women who never married: 25


A Mann-Whitney U test (a non-parametric median comparison test) was performed to see if the median age difference between men and women was significant. The test rendered a p-value of 0.162 (greater than 0.05), so I retained the null hypothesis: the data show no statistically significant difference in age between men and women.
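
In R, the Mann-Whitney U test is available as `wilcox.test` (the Wilcoxon rank-sum test). The sketch below shows the call on simulated ages, not on the study data, so its p-value will not match the one reported above. The data frame `ages` and its contents are made up for illustration.

```r
set.seed(1)

# Simulated ages (NOT the study data): both groups are drawn from the
# same distribution, so no real difference exists between them.
ages <- data.frame(
  sex = rep(c("Male", "Female"), each = 200),
  age = c(round(rnorm(200, mean = 28, sd = 8)),
          round(rnorm(200, mean = 28, sd = 8)))
)

# Two-sided Mann-Whitney U test comparing the two groups.
result <- wilcox.test(age ~ sex, data = ages)
result$p.value

# Group medians, reported alongside the test.
tapply(ages$age, ages$sex, median)
```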

Work hours per week by gender

There appeared to be a notable difference in the number of work hours per week between men and women.

Average work hours per week for men who have never married: 38.7
Average work hours per week for women who have never married: 35.3
Median work hours per week for men who have never married: 40
Median work hours per week for women who have never married: 40


Mann-Whitney U test (a non-parametric median comparison test) was performed to see if the difference between men and women's work hours per week was statistically significant. The test rendered a p-value less than 2.2e-16 (much, much less than 0.05), which allowed me to conclude that there exists a statistically significant difference in the number of work hours per week between men and women (that men tend to work longer hours than women).

Education by gender

There appeared to be a notable difference in the average number of years in school between men and women.

Average number of years in school for men who have never married: 9.8
Average number of years in school for women who have never married: 10.3
Median number of years in school for men who have never married: 10
Median number of years in school for women who have never married: 10

Mann-Whitney U test (a non-parametric median comparison test) was performed to see if the difference between men and women's number of years in school was statistically significant. The test rendered a p-value less than 2.2e-16 (much, much less than 0.05), which allowed me to conclude that there exists a statistically significant difference in the number of years in school between men and women (and that women tend to stay longer in school).

Occupation by gender

![Alt text](./images/individuals_who_never_married_by_occupation_type_and_gender.png)

Comparing men and women under fixed conditions I

Men and women's income levels (large or small) were counted after fixing the following variables:
  • age group
  • number of work hours per week
  • occupational field
  • educational background (highest education)
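
Fixing these variables amounts to subsetting on each of them and then counting income levels by sex. A minimal sketch on made-up rows follows. The column names `hours_per_week`, `occupation`, `education`, and `income` and the level strings such as "HS-grad" are assumed to follow the renamed AdultUCI columns, and the rows themselves are invented.

```r
# Toy rows (NOT the study data).
toy <- data.frame(
  sex            = c("Female", "Male", "Female", "Male", "Male", "Female"),
  age_group      = c("20s", "20s", "20s", "20s", "30s", "20s"),
  hours_per_week = c(40, 40, 40, 40, 40, 35),
  occupation     = rep("Sales", 6),
  education      = rep("HS-grad", 6),
  income         = c("small", "small", "large", "small", "large", "small"),
  stringsAsFactors = FALSE
)

# Fix age group, work hours, occupational field, and education.
fixed <- subset(toy, age_group == "20s" & hours_per_week == 40 &
                     occupation == "Sales" & education == "HS-grad")

# 2x2 table of income level by sex; explicit factor levels keep
# zero-count cells in the table.
counts <- table(
  income = factor(fixed$income, levels = c("small", "large")),
  sex    = factor(fixed$sex, levels = c("Female", "Male"))
)
counts
```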

Individuals in executive/managerial field who never married and worked 40 hours per week

![Alt text](./images/individuals_in_exec-managerial_field_who_never_married_and_work_40hpw.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (a) | 0.439 | 17 | 23 | 1 | 0 |
| (b) | 1 | 19 | 14 | 0 | 0 |
| (c) | 1 | 11 | 12 | 0 | 1 |
| (d) | 1 | 10 | 5 | 0 | 0 |

All four sub-groups (a, b, c, d) displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in executive/managerial field who never married, worked 40 hours per week, and:

  • were in their 20s and had attended college (no degree), p > 0.05, (a)
  • were in their 20s and had graduated from high school, p > 0.05, (b)
  • were in their 30s and had attended college (no degree), p > 0.05, (c)
  • were in their 30s and had graduated from high school, p > 0.05, (d)
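
As a worked example, the p-value for sub-group (a) can be reproduced with R's `fisher.test` from the four counts in the table above. The object names `group_a` and `test_a` are illustrative only.

```r
# Counts for sub-group (a): columns are Female and Male,
# rows are 'small' and 'large' income.
group_a <- matrix(
  c(17, 1,    # Female: small, large
    23, 0),   # Male:   small, large
  nrow = 2,
  dimnames = list(income = c("small", "large"),
                  sex = c("Female", "Male"))
)

# Two-sided Fisher's exact test on the 2x2 table.
test_a <- fisher.test(group_a)
round(test_a$p.value, 3)
```

With only one 'large' earner among 41 people, the two-sided p-value is simply the chance that this person falls among the 18 women, 18/41, which rounds to the reported 0.439.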

Individuals in other service field who never married and worked 40 hours per week

![Alt text](./images/individuals_in_other-service_field_who_never_married_and_work_40hpw.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (e) | 1 | 38 | 36 | 1 | 0 |
| (f) | 1 | 56 | 65 | 0 | 0 |
| (g) | 1 | 13 | 12 | 0 | 0 |
| (h) | 1 | 32 | 33 | 0 | 0 |

All four sub-groups (e, f, g, h) displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in other service field who never married, worked 40 hours per week, and:

  • were in their 20s and had attended college (no degree), p > 0.05, (e)
  • were in their 20s and had graduated from high school, p > 0.05, (f)
  • were in their 30s and had attended college (no degree), p > 0.05, (g)
  • were in their 30s and had graduated from high school, p > 0.05, (h)

Individuals in professional specialty field who never married and worked 40 hours per week

![Alt text](./images/individuals_in_prof-specialty_field_who_never_married_and_work_40hpw.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (i) | 1 | 11 | 13 | 0 | 1 |
| (j) | 1 | 11 | 10 | 1 | 0 |
| (k) | 1 | 5 | 6 | 0 | 1 |
| (l) | 1 | 1 | 5 | 0 | 1 |

All four sub-groups (i, j, k, l) displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in professional specialty field who never married, worked 40 hours per week, and:

  • were in their 20s and had attended college (no degree), p > 0.05, (i)
  • were in their 20s and had graduated from high school, p > 0.05, (j)
  • were in their 30s and had attended college (no degree), p > 0.05, (k)
  • were in their 30s and had graduated from high school, p > 0.05, (l)

Individuals in sales field who never married and worked 40 hours per week

![Alt text](./images/individuals_in_sales_field_who_never_married_and_work_40hpw.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (m) | 1 | 39 | 36 | 0 | 0 |
| (n) | 1 | 23 | 43 | 0 | 0 |
| (o) | 1 | 9 | 5 | 0 | 0 |
| (p) | 0.28 | 18 | 6 | 0 | 1 |

All four sub-groups (m, n, o, p) displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in sales field who never married, worked 40 hours per week, and:

  • were in their 20s and had attended college (no degree), p > 0.05, (m)
  • were in their 20s and had graduated from high school, p > 0.05, (n)
  • were in their 30s and had attended college (no degree), p > 0.05, (o)
  • were in their 30s and had graduated from high school, p > 0.05, (p)

Comparing men and women under fixed conditions II

Men and women's income levels (large or small) were counted after fixing the following variables:
  • age group
  • number of work hours per week
  • occupational field

The following variable was not fixed:

  • educational background (highest education)

Individuals in executive/managerial field who never married and worked 40 hours per week

![Alt text](./images/individuals_in_exec-managerial_field_who_never_married_and_work_40hpw_2.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (q) | 0.6209 | 89 | 89 | 3 | 1 |
| (r) | 0.03211 | 55 | 45 | 4 | 12 |

Fisher's exact test performed on the 30s group (r) resulted in a p-value of 0.03211 (less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (in favor of men) among individuals in their 30s working in executive/managerial field who never married and worked 40 hours per week when there exists no fix for educational background.

Individuals in other service field who never married and worked 40 hours per week

![Alt text](./images/individuals_in_other-service_field_who_never_married_and_work_40hpw_2.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (s) | 0.4667 | 125 | 144 | 1 | 0 |
| (t) | 1 | 65 | 69 | 0 | 0 |

Both sub-groups displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in other service field who never married, worked 40 hours per week, and:

  • were in their 20s, p > 0.05, (s)
  • were in their 30s, p > 0.05, (t)

Individuals in professional specialty field who never married and worked 40 hours per week

![Alt text](./images/individuals_in_prof-specialty_field_who_never_married_and_work_40hpw_2.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (u) | 0.4381 | 134 | 102 | 11 | 5 |
| (v) | 0.0003111 | 73 | 56 | 4 | 20 |

Fisher's exact test performed on the 30s group (v) resulted in a p-value of 0.0003111 (less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (in favor of men) among individuals in their 30s working in professional specialty field who never married and worked 40 hours per week when there exists no fix for educational background.

Individuals in sales field who never married and worked 40 hours per week

![Alt text](./images/individuals_in_sales_field_who_never_married_and_work_40hpw_2.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (w) | 0.5022 | 108 | 128 | 0 | 2 |
| (x) | 0.007476 | 49 | 28 | 1 | 7 |

Fisher's exact test performed on the 30s group (x) resulted in a p-value of 0.007476 (less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (in favor of men) among individuals in their 30s working in sales field who never married and worked 40 hours per week when there exists no fix for educational background.

Comparing men and women under fixed conditions III

Men and women's income levels (large or small) were counted after fixing the following variables:
  • age group
  • occupational field

The following variables were not fixed:

  • number of work hours per week
  • educational background (highest education)

Individuals in executive/managerial field who never married

![Alt text](./images/individuals_in_exec-managerial_field_who_never_married.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (y) | 0.26 | 176 | 177 | 6 | 12 |
| (z) | 0.08727 | 96 | 97 | 15 | 29 |

Both sub-groups displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in executive/managerial field who never married and:

  • were in their 20s, (y)
  • were in their 30s, (z)

Individuals in other service field who never married

![Alt text](./images/individuals_in_other-service_field_who_never_married.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (aa) | 0.7146 | 390 | 364 | 3 | 2 |
| (ab) | 1 | 141 | 127 | 1 | 0 |

Both sub-groups displayed p-values greater than 0.05. Therefore, I retained the null hypothesis that there is no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women among individuals working in other service field who never married and:

  • were in their 20s, p > 0.05, (aa)
  • were in their 30s, p > 0.05, (ab)

Individuals in professional specialty field who never married

![Alt text](./images/individuals_in_prof-specialty_field_who_never_married.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (ac) | 0.1283 | 308 | 241 | 14 | 20 |
| (ad) | 0.01119 | 138 | 125 | 25 | 47 |

Chi-square test performed on the 30s group (ad) resulted in a p-value of 0.01119 (less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (in favor of men) among individuals in their 30s working in professional specialty field who never married when there exists no fix for educational background or the number of hours worked.
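
The chi-square result for sub-group (ad) can be checked with R's `chisq.test`. This sketch assumes the reported p-value came from the function's default settings, which apply Yates' continuity correction to 2x2 tables. The README does not state this, and the object names are illustrative.

```r
# Counts for sub-group (ad): columns are Female and Male,
# rows are 'small' and 'large' income.
group_ad <- matrix(
  c(138, 25,    # Female: small, large
    125, 47),   # Male:   small, large
  nrow = 2,
  dimnames = list(income = c("small", "large"),
                  sex = c("Female", "Male"))
)

# Pearson's chi-square test; for 2x2 tables chisq.test() applies
# Yates' continuity correction unless correct = FALSE is passed.
test_ad <- chisq.test(group_ad)
test_ad$p.value

# Expected counts under independence; all are well above 5 here,
# so the chi-square approximation is reasonable for this table.
test_ad$expected
```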

Individuals in sales field who never married

![Alt text](./images/individuals_in_sales_field_who_never_married.png)

| Sub-group | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|
| (ae) | 0.001935 | 320 | 341 | 1 | 13 |
| (af) | 0.00316 | 93 | 93 | 5 | 21 |

Fisher's exact test performed on the 20s group (ae) and the 30s group (af) resulted in p-values of 0.001935 and 0.00316, respectively (both less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (both in favor of men) among individuals in their 20s and 30s working in sales field who never married when there exists no fix for educational background or the number of hours worked.

Comparing men and women under fixed conditions IV

Men and women's income levels (large or small) were counted after fixing the following variable:
  • occupational field

The following variables were not fixed:

  • age group
  • number of work hours per week
  • educational background (highest education)

Individuals who never married

![Alt text](./images/individuals_who_never_married.png)

| Sub-group | Occupational field | p-value | Female, small | Male, small | Female, large | Male, large |
|---|---|---|---|---|---|---|
| (ag) | Executive/managerial | 0.01027 | 343 | 328 | 42 | 70 |
| (ah) | Other service | 0.568 | 830 | 766 | 5 | 7 |
| (ai) | Professional specialty | 0.0006787 | 578 | 457 | 71 | 100 |
| (aj) | Sales | 8.112e-08 | 667 | 565 | 10 | 48 |

Fisher's exact test performed on the executive/managerial group (ag) and chi-square tests performed on the professional specialty group (ai) and the sales group (aj) resulted in p-values of 0.01027, 0.0006787, and 8.112e-08, respectively (all less than 0.05). Therefore, I rejected the null hypothesis and concluded that there is a statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women (all three in favor of men) among individuals working in the executive/managerial, professional specialty, and sales fields when there exists no fix for age group, educational background, or the number of hours worked.

Conclusion

When men and women were compared after fixing for marital status (never married), age group, occupational field, number of hours worked, and educational background, there seemed to be no difference in the proportions of 'large' income-earners (more than $50K/year) between men and women in all 16 comparisons (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p).

When men and women were compared after fixing for marital status (never married), age group, occupational field and number of hours worked but not for educational background, there were three out of eight comparisons (r, v, x) that showed statistically significant differences in the proportions of 'large' income-earners (more than $50K/year) between men and women.

When men and women were compared after fixing for marital status (never married), age group, and occupational field but not for number of hours worked or educational background, there were three out of eight comparisons (ad, ae, af) that showed statistically significant differences in the proportions of 'large' income-earners (more than $50K/year) between men and women.

When men and women were compared after fixing for only marital status and occupational field, there existed statistically significant differences in the proportions of 'large' income-earners (more than $50K/year) between men and women in three out of four comparisons (ag, ai, aj), namely among never-married individuals working in the executive/managerial, professional specialty, and sales fields. The other service field (ah) showed no significant difference.

This suggests that the differences in the proportions of 'large' income-earners arise when individuals are compared without fixing for variables that are related to their pay. When men's and women's incomes were counted and compared after adjusting for individuals' marital status, occupational field, age group, educational background, and number of work hours per week, there didn't appear to be any statistically significant differences in the proportions of 'large' income-earners between men and women.

Considerations for Further Analysis

For future analysis, it would be great to use a dataset that contains individuals' actual income (as opposed to a binary variable that classifies income as either 'large' or 'small'). In addition, I would like to examine individuals in other marital status categories (e.g. married to civilian, divorced, etc.) to see whether the same absence of differences in income proportions is observed when the occupational field, age group, number of work hours per week, and educational background variables are fixed.

Lastly, the claim that there existed no statistically significant difference in the proportions of 'large' income-earners (more than $50K/year) between men and women with fixed variables (marital status, occupational field, number of work hours per week, age group, and educational background) relied on Fisher's exact tests conducted on figures (a) through (p). Many of those cases contained cell counts less than or equal to 5, which forced me to perform Fisher's exact tests as opposed to chi-square tests, whose approximation is unreliable with such small counts. In fact, in all 16 cases from (a) through (p), the cell counts for 'large' income-earners were either 0 or 1 (mostly 0s). With so few 'large' income-earners, the tests had little power to detect a difference even if one existed, so these non-significant results are only weak evidence that the proportions are equal. The analysis would have been much more robust if all of the 'large' income-earner counts had been greater than 5 and the tests had still shown no difference in income proportions between genders. Hence, for the next analysis, it would be great to either focus on sample subsets that can yield higher numbers of 'large' income-earners after fixing for variables or use a much bigger dataset.
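
The rule of thumb described above (Fisher's exact test when any cell count is 5 or less, chi-square otherwise) can be written as a small helper. This is a sketch of that rule only, not code from the analysis. The function name `compare_large_earners` is hypothetical, and a common variant checks expected counts rather than observed ones.

```r
# Hypothetical helper: pick the test from the observed cell counts.
compare_large_earners <- function(counts) {
  if (any(counts <= 5)) {
    # Sparse table: use Fisher's exact test.
    list(test = "fisher", p.value = fisher.test(counts)$p.value)
  } else {
    # All cells above 5: use the chi-square test (R's defaults).
    list(test = "chisq", p.value = chisq.test(counts)$p.value)
  }
}

# Sub-group (a): a 'large' count of 0 makes the table sparse.
sparse <- matrix(c(17, 1, 23, 0), nrow = 2)
compare_large_earners(sparse)$test

# Sub-group (ad): every cell is greater than 5.
dense <- matrix(c(138, 25, 125, 47), nrow = 2)
compare_large_earners(dense)$test
```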
