A set of R scripts to load, clean, and merge anonymized annual Statistics Canada income tax data for British Columbia.
This repo contains the following contents:
- Documents in
/Documentation/
folder - Data in
/Data/
folder and/Data/IND/
or/Data/FAM/
subfolders - R scripts in
/R/
folder - Exercise folder in
/Scratch/
The scripts in this repository wrangle and tidy purchased anonymized annual Statistics Canada data similar to 'Tax filers and dependants with income by source of income' Table: 11-10-0007-01. The annual data are provided as sheets in .xls format under the Statistics Canada Open Licence, one .xls file for each year for anonymized individuals and anonymized families. Metadata provided by Statistics Canada is in /Documentation/
folder.
Untidy .xls files per year for anonymized individual and family income tax data must be placed in the /Data/
folder either as /Data/IND/
or /Data/FAM/
sub-folders to house the individual or family files respectively.
The .xls files were converted to .xlsx and their empty spaces in the names were removed for more convenient data wrangling. This can also be done with readxl and tidyxl package in RStudio.
Tidied .csv equivalent files are written to a /Tmp/
folder. The /R/
folder contains all the R scripts used to read xlsx, clean, merge, and output the merged csv files.
There are six scripts located in the /Scratch/
folder that are not required for the analysis, and are written after the R scripts were generated and before functions were discovered. The aim is to generate a reproducible workflow for file load, clean, merge, and output in one go.
- 01_load.R # only an example to load files for one year and the sample column names
- 02_clean.R # how to extract BC geographies
- 03_merge.R # merging all sub-sheets into one
- 04_output.R # write out the merged csv's
- 05_encrypt.R # how to encrypt files for security (was not used, also note warning and latest version when using)
- 06_scratch.R # functions to use for efficiency (i.e. use to load, clean, merge, and output in one go)
To report bugs/issues/feature requests, please file an issue.
If you would like to contribute, please see our CONTRIBUTING guidelines.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Copyright 2019 Province of British Columbia
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and limitations under the License.
This repository is maintained by Integrated Data Division (OCIO).
This project was created using the bcgovr package.