Ontario Trillium Foundation (OTF) is the largest grant foundation in Canada, and as a foundation entirely dependent on Ontario government funding, it is an instrument of government policy through granting priorities in the social economy. One of OTF's goals is to encourage best practices in the social economy in Ontario. One of these best practices is transparency, and one of the OTFs transparency initiatives has been publication of its grant administration information as Open Data Ontario Trillium Foundation Open Data. The data published by OTF is useful for a number of reasons. First, the information available on OTF grants is considered to be administrative data, i.e. it is data used for administration of grants. Second, the data published by OTF used in this analysis is Open Data, i.e. it is available for review and analysis by anybody, in machine readable format. Third, OTF grant data is Population data, in this case it is not a population of people, but a population of grants. i.e. this is not a "sample" of grants, its all grants for the organization OTF, within particular time periods and specific grant streams. This makes the analysis of data relatively simple and less reliant on parametric statistics. The OTF grant datasets provide a comprehensive view of funding through the foundation to social economy organizations serving equity seeking groups such as individuals with disabilities in Ontario.
- Is there a difference between grants for organizations serving individuals with disabilities, and other grants, for the following parameters:
- Number of grants
- Dollar value of grants
- Number of organizations
- Do the datasets allow for intersectional analysis, e.g. grants supporting both disability and race?
- Do the available fields in the database e.g. Recipient Org:Charitable Registration Number allow combination with other datasets such as Revenue Canada’s T3010 database for analysis?
Source of files
The downloadable OTF data sets were retrieved from: Ontario Trillium Foundation Open Data on June 26, 2020. There is an OTF Data Inventory which provides a list of all the datasets available from the same website Filename: otf_data_inventory_en
Revenue Canada T3010 data is another dataset that requires more attention from social scientists, particularly social scientists researching charities in Canada and more broadly social economy organizations. Note that Revenue Canadas T3010 database is published as a searchable data base available to the public at: Government of Canada - List of Charities - basic search. In this document, a set of files available on request from Revenue Canada was utilized rather than the search pages. The matching of data completed here is much more efficiently accomplished using the data files. The Revenue Canada T3010 data file was published 17 July 2020, and subsequently received through postal mail in the subsequent week. Revenue Canada T3010 data is not required for the majority of analysis completed here, and is not required to answer the primary and secondary research questions. Revenue Canada T3010 data is required to answer the tertiary question, proving it can be done here, and not the focus of effort.
The following files are available for to answer the research question and subquestions:
- Grants for the period between 2015 to 2020
- Filename: otf_granting_data_since_april_1_2015
- 3,105 grants (rows) of data
- Youth Opportunities Fund Grants for the period between 2013 to 2020
- Filename: OTF_yof-granting-data
- 240 grants (rows) of data
- Grants for the period between 1999 to 2015
- Filename: otf_granting_data_fiscal_year_2000-2015_20150717
- 22,686 grants (rows) of data
- Cities in Ontario matched to an OTF catchment area.
- Filename: otf_catchment_area-cities_concordance_file
- 1129 cities (rows) of data
- o The OTF catchment area is a geographical area in the province of Ontario that roughly corresponded to historical Local Health Integration Networks (LHINs)
- (Statistics Canada) Census Divisions in Ontario matched to an OTF catchment area.
- Filename: otf_catchment_area-census_division_concordance_file
- 21 Census Divisions (rows) of data
The most relevant OTF files for analysis to be completed for here are:
- otf_granting_data_since_april_1_2015
- OTF_yof-granting-data
- otf_granting_data_fiscal_year_2000-2015_20150717
- T3010_2017ONBasicContactInfo
- T3010_2017ONDirectors
- T3010_2017ONFinancialInfo
- T3010_2017ONGeneralInfo
- T3010_2017ONNewOngoingPrograms
- T3010_2017ONSchedule1
- T3010_2017ONSchedule2
- T3010_2017ONSchedule3
- T3010_2017ONSchedule5
- T3010_2017ONSchedule7PADescription
- T3010_2017ONSchedule7PAOutsideCanada
- T3010_2017ONSchedule7PAResources
The most relevant Revenue Canada files for analysis to be completed here are:
- T3010_2017ONBasicContactInfo
- T3010_2017ONFinancialInfo
All the Revenue Canada T3010 files are related to each other through the BN/Registration Number field. Therefore in connecting any other datasets to the Revenue Canada T3010 set of files, the BN/Registration Number has to be accurate and match Revenue Canada format.
Some conventions:
- In the remainder of this document, the files are referenced by their filename e.g. name_of_file.xyz rather than a log form description
- In the remainder of this document, the English field names foreach field will be used for brevity, assuming that the French field names simply contain the same information, but in French.
List of fields in each file as Appendices
- The complete list of fields available in otf_granting_data_since_april_1_2015 are listed in Appendix B
- The complete list of fields available in OTF_yof-granting-data are listed in Appendix C
- The complete list of fields available in otf_granting_data_fiscal_year_2000-2015_20150717 are listed in Appendix D
- The complete list of fields available in: T3010_2017ONBasicContactInfo are listed in Appendix E
- The complete list of fields available in: T3010_2017ONFinancialInfo are listed in Appendix F
It is useful to note here that the fields available and content available in each field both enable and limit the analysis possible @williamsCrimeSensingBig2017. For example where the “Population Served” field contains the content “Disability”, we can perform analysis on grants that can be separated out by this parameter. On the other hand, if the “Population Served” field contains the content “Disability”, but the field does not include any other content e.g. “Intellectual Disability”, or “Developmental Disability”, then no further analysis can be performed on Intellectual or Developmental Disability. As we shall see, the different datasets available from OTF allow different types of analysis.
OTF grant data is not perfect or ‘clean’ data. There are missing fields, and incorrectly entered data. But it is mostly complete and mostly accurate data, and in the absence of other sources of equivalent data, cleaning the dataset and filling in the missing values as much as possible is the best option. The steps required to clean the dataset to a reasonable level of utility are outlined here. Although some of the steps may seem pedantic, applying Open Science methodology is similar to a physics, chemistry or biology science experiment, where the steps if rigorously reported in detail, can be replicated by other people to verify and validate the result. One of the benefits of Open Data is the ability for multiple researchers to access to the same dataset and do different analyses, or do the same analyses to verify each others results @petersPROSPECTSOPENSCIENCE2013. Social science researchers are notorious for not providing enough data or a detailed enough methodology that enables replication by others. OTF will benefit from the investment it has already made in releasing its grant data as open data by more researchers using the data. In some ways, reporting the detailed steps taken to clean up data in this thesis as a published and public document will encourage OTF to apply the same cleanup steps to upcoming releases for the dataset to make it cleaner and more accessible for all future researchers. More broadly OTF’s objective of transparency in the social economy will be supported by demonstrating the type of analysis that can be done by researchers on OTF Grant Open Data.
The file otf_granting_data_since_april_1_2015 is the cleanest dataset, or the dataset with the least amount of effort required to transform it into a usable dataset since it is the most recent. The reason the cleanest dataset is selected is that data cleanup processes are activities with diminishing returns. If we can perform analysis with a lower level of cleanup, the dataset more valuable on the utility vs effort scale. We can also take lessons learned from analyzing the cleanest dataset and apply them to datasets that require more effort for cleanup and analysis
OpenRefine is a free, Open Source tool available to everyone for data analysis at OpenRefine. Since OpenRefine is a tool specifically designed for cleaning up and analyzing data, it provides a number of powerful features. For example the Facet feature allows easy clustering of data that is similar, where the data entered is slightly different, but belongs in the same group. The same results are achievable using MS Excel and the Pivot Tables function but it takes more steps and therefore more time.
Why clustering within the same field is required
Most of the differences between common descriptions are either errors in data entry, or a change in a common term is used in grant applications over time. As OPen Refine describes it "This feature helps you find groups of different cell values that might be alternative representations of the same thing. For example, the two strings "New York" and "new york" are very likely to refer to the same concept and just have capitalization differences, and "Gödel" and "Godel" probably refer to the same person." For example, the following terms mean the same thing for the purposes of analysis here:
- People with disabilities
- Individuals with disabilities And therefore when counting the number of records, we need to be able to count all records with both these term together, in other words “cluster” them together.
e.g. In the following step in OpenRefine Facet clustering in the field Geographical Area Served clusters together records with “Waterloo, Wellington, Dufferin” with “Waterloo, Wellington, & Dufferin” so that all records are updated to “Waterloo, Wellington, Dufferin”.
Figure 1: Clustering data using Open Refine in Field Geographical Area Served
e.g. In the following step in OpenRefine Facet clustering in the field Organizational Name clusters together records with content “FarmStart” with records with content “Farmstart” so that all records are updated to “FarmStart”. Through the same dialog OpenRefine Facet clustering in the field Organizational Name clusters together records with content “Community Innovation Lab” with records with content “community innovation lab” so that all records are updated to “Community Innovation Lab”. Therefore OpenRefine clustering can be used to cluster multiple similar records together during the same step.
Figure 2: Clustering data using Open Refine in Field Organization Name
Clustering of similar records should be completed with caution, so that differences that are relevant for analysis are not lost through clustering. Since this thesis is focused on people with disabilities, therefore some of the other nuances in data can be reduced.
##Data cleanup effort vs. utility##
All fields in each data set do not require an equivalent level of data cleanup effort. In some cases, the fields are already “clean” and do not require any further cleanup in the form of clustering or transformations. For example, in the file otf_granting_data_since_april_1_2015, the fields Fiscal Year and Grant Programme do not appear to have any obvious errors, and therefore do not require further cleanup.
In other cases the entire field contains either erroneous or unusable data. For example, in the file otf_granting_data_since_april_1_2015, the last column or field contains error values including formulas such as “=AC609-SUM(#REF!,#REF!)”, which is probably some artifact of an internal process, that was not removed when publishing the dataset. This field was deleted in its entirety to ensure it does not confound any further analysis.
In still other cases the field contains data that is not valuable for the purpose of the analysis to be completed. For example in the file otf_granting_data_since_april_1_2015, the field Funding Org has the same value across all records: “Ontario Trillium Foundation”. Similarly, the field Country Served: has the same value across all records: “Canada”. These fields can easily be excluded from any further analysis.
Finally, the fields that are used in further analysis can be classified further classification by level of effort to be invested in cleanup. For example, in the file otf_granting_data_since_april_1_2015, the following fields require different levels of cleanup effort: • Organization name • Recipient Org:Incorporation Number • Recipient Org:Charitable Registration Number Organization Name typically contains many data entry errors, so a low level of effort was invested in cleaning up the Organization Name field (clusters only). Other identifying fields including Incorporation Number and Charitable Registration Number will be used to differentiate organizations, in the context of this thesis which is focused on social economy organizations. Recipient Org:Incorporation Number is a useful identifier for linking to other business related data, but no additional linked data analysis is planned for this thesis using this field, and therefore, a medium level of effort was invested in cleaning up the Recipient Org:Incorporation Number field (clusters and transformations). Recipient Org:Charitable Registration Number is an important field as a unique identifier for linking to Revenue Canada T3010 charity data. As previously mentioned, valuable healthcare research involving individuals with intellectual and developmental disabilities has been conducted using linked data (Arim, et. al. 2019; Shooshtari, 2017). Since some linked data analysis is planned as a tertiary question as part of this thesis, a high level of effort was invested in cleaning up the Recipient Org:Charitable Registration Number field (clusters and transformations and manual edits). A note of caution, that aiming for perfection or no errors is often not an attainable without unlimited time and other resources. Therefore there is a level of “cleanliness” or “error-freeness which is adequate for analysis. For example, for the Recipient Org:Charitable Registration Number field, a 95% clean or 5% detectable error rate was targeted. The detailed steps for cleanup of the Recipient Org:Charitable Registration Number field is provided in the appendix of JSON outputs from OpenRefine, where the steps were captured. Appendix G
Data in all fields used for analysis need to be cleaned up. Therefore, to minimize unnecessary cleanup effort it is useful to identify fields which will not be used for further analysis. The following fields are mostly common across all grants, so they are not included for further analysis
- Funding Org: commonly “Ontario Trillium Foundation”
- Country Served: commonly “Canada”
The following are internal OTF administrative data such as a unique identifier for each grant, that is not for the purposes of this analysis
- Identifier
- Organization name
- Submission date
- Approval date
- Cross-catchment indicator
- Geographical Area Served
- Recipient Org:City
- Recipient Org:Postal Code
- Co-Application
- Rescinded/Recovered
- Rescinded/Recovered Initiated By
- Amount Rescinded/Recovered
- Last modified
One might think “Organization name” is an important field in determining for example the number of grants received per organization. Since organization names are prone to data entry errors, the more exact Recipient Org:Incorporation Number and Recipient Org:Charitable Registration Number are used for organization level analysis. Therefore more cleanup time was invested into the fields Org:Incorporation Number and Recipient Org:Charitable Registration Number, than into Organization Name.
Similarly, one might think Postal code is a useful field e.g. the United Way report, Poverty by Postal Code: The geography of neighbourhood poverty (https://www.unitedwaygt.org/research-and-reports), was a seminal report. But in this case, the postal code provided in the file otf_granting_data_since_april_1_2015 is the head office of the grant recipient organization. Geographical area served better indicator, but it is a wide area based on LHINs, which is too broad a geographical area for analysis.
In the file otf_granting_data_since_april_1_2015:¬ The primary field including data to determine if the grant is related to disability is:
- Population Served
Grants where the Population Served was “People with Disabilities” was analyzed for differences in the following fields.
- Fiscal Year
- Grant Programme
- Amount Applied For
- Amount Awarded
- Planned Dates:Duration (months)
- Description (English)
- Program Area
- Budget Fund
- Age Group
- Recipient Org:Incorporation Number
- Recipient Org:Charitable Registration Number
Cleanup Steps Requiring Judgement
Cleanup decisions are always a matter of judgement by the researcher. As previously explained , for the field Recipient Org:Charitable Registration Number field, a 95% clean or 5% detectable error rate was targeted. In this subsection an additional area of judgement is required. The cleanup for Population Served and Age Group field for example required some judgement in combining similar classifications.
e.g. Using OpenRefine Facet clustering in the field Population Served: records using “Diverse Cultural Communities and Racial Groups” were combined with records using “Diverse Cultural Communities”, and for the purpose of analysis the description “Diverse Cultural Communities and Racial Groups” was used.
e.g. Using OpenRefine Facet clustering in the field Population Served: records using “LGBTQIA” were combined with records using “LGBTQ+”, and for the purpose of analysis the description “LGBTQIA” was used.
e.g. Using OpenRefine Facet clustering in the field Population Served: records using “Indigenous or Aboriginal” were combined with records using “Indigenous”, records using “Aboriginal” and for the purpose of analysis the description “Indigenous or Aboriginal” was used.
In the Age Group field, after initial clustering, there were a number of grants covering two Age Groups. As the following table demonstrates, there are only six grants out of 3105 that served two age groups, but its useful to separate them out to allow cleaner analysis.
Table 1: Number Of Grants Serving Two Age Groups
Age Group 1 | Age Group 2 | Number of Grants |
---|---|---|
Adults (25-64) | 4 | |
Adults | Seniors (65+) | 1 |
Children up to 12 years | 368 | |
Children up to 12 years | Youth | 4 |
General population (all age groups) | 1457 | |
Seniors (65+) | 212 | |
Youth | 640 | |
Youth | Seniors (65+) | 1 |
(blank) | 1 |
The version of the otf_data_inventory_en file which has been cleaned up using OpenRefine is available here: otf_granting_data_since_april_1_2015_original2-openrefine-output. The cleaned up file was used to create the following tables.
Table 2: Number Of Grants By Population Served And Fiscal Year
Population Served | 2015-2016 | 2016-2017 | 2017-2018 | 2018-2019 | 2019-2020 | Grand Total |
---|---|---|---|---|---|---|
Diverse Cultural Communities and Racialized Groups | 76 | 91 | 80 | 87 | 85 | 419 |
Francophones | 29 | 10 | 19 | 18 | 13 | 89 |
General Populations | 390 | 400 | 321 | 345 | 362 | 1818 |
Indigenous or Aboriginal | 29 | 28 | 37 | 20 | 31 | 145 |
LGBTTQIA | 2 | 10 | 12 | 9 | 6 | 39 |
People with Disabilities | 99 | 100 | 89 | 84 | 83 | 455 |
Senior | 1 | 1 | ||||
Women | 33 | 26 | 27 | 23 | 22 | 131 |
Grand Total | 659 | 665 | 585 | 586 | 602 | 3097 |
Table 3: Number Of Grants By Population Served And Grant Programme
Population Served | Community Investments | Ontario150 Community Capital Program | Strategic Investments | Grand Total |
---|---|---|---|---|
Diverse Cultural Communities and Racialized Groups | 399 | 12 | 8 | 419 |
Francophones | 88 | 1 | 89 | |
General Populations | 1625 | 137 | 56 | 1818 |
Indigenous or Aboriginal | 137 | 4 | 4 | 145 |
LGBTTQIA | 37 | 1 | 1 | 39 |
People with Disabilities | 404 | 46 | 5 | 455 |
Senior | 1 | 1 | ||
Women | 128 | 3 | 131 | |
Grand Total | 2818 | 204 | 75 | 3097 |
Table 4: Average of Amount Applied For per Grant By Population Served and Fiscal Year
Population Served | 2015-2016 | 2016-2017 | 2017-2018 | 2018-2019 | 2019-2020 | Grand Total |
---|---|---|---|---|---|---|
Diverse Cultural Communities and Racialized Groups | 237,541 | 200,853 | 169,005 | 225,395 | 203,041 | 206,967 |
Francophones | 123,072 | 81,080 | 224,505 | 185,544 | 157,446 | 157,664 |
General Populations | 144,200 | 132,029 | 157,505 | 141,643 | 160,225 | 146,577 |
Indigenous or Aboriginal | 130,452 | 180,114 | 135,568 | 228,525 | 183,361 | 166,186 |
LGBTTQIA | 71,850 | 295,350 | 194,958 | 150,022 | 351,867 | 228,156 |
People with Disabilities | 150,614 | 155,250 | 203,875 | 141,876 | 151,713 | 160,638 |
Senior | 10,000 | 10,000 | ||||
Women | 154,839 | 190,646 | 212,804 | 191,574 | 186,695 | 185,692 |
(blank) | 62,563 | 62,563 | ||||
Grand Total | 154,503 | 149,894 | 170,241 | 160,513 | 169,106 | 160,435 |
Table 5: Average of Amount Awarded per Grant By Population Served and Fiscal Year
Population Served | 2015-2016 | 2016-2017 | 2017-2018 | 2018-2019 | 2019-2020 | Grand Total |
---|---|---|---|---|---|---|
Diverse Cultural Communities and Racialized Groups | 237,541 | 200,853 | 169,005 | 225,395 | 203,041 | 206,967 |
Francophones | 123,072 | 81,080 | 224,505 | 185,544 | 157,446 | 157,664 |
General Populations | 144,200 | 131,892 | 157,505 | 141,643 | 160,225 | 146,547 |
Indigenous or Aboriginal | 130,452 | 180,114 | 135,568 | 228,525 | 183,361 | 166,186 |
LGBTTQIA | 71,850 | 295,350 | 194,958 | 150,022 | 351,867 | 228,156 |
People with Disabilities | 150,614 | 155,250 | 203,875 | 141,876 | 151,713 | 160,638 |
Senior | 10,000 | 10,000 | ||||
Women | 154,839 | 190,646 | 212,804 | 191,574 | 186,695 | 185,692 |
(blank) | 56,938 | 56,938 | ||||
Grand Total | 154,503 | 149,746 | 170,241 | 160,513 | 169,106 | 160,403 |
Table 6: Average of Planned Dates:Duration By Population Served And Fiscal Year
Population Served | 2015-2016 | 2016-2017 | 2017-2018 | 2018-2019 | 2019-2020 | Grand Total |
---|---|---|---|---|---|---|
Diverse Cultural Communities and Racialized Groups | 21.72 | 19.31 | 18.90 | 20.16 | 20.44 | 20.07 |
Francophones | 17.03 | 14.90 | 21.37 | 15.67 | 17.08 | 17.45 |
General Populations | 15.86 | 14.18 | 15.45 | 14.46 | 16.50 | 15.28 |
Indigenous or Aboriginal | 14.24 | 16.68 | 13.81 | 16.65 | 14.23 | 14.93 |
LGBTTQIA | 11.50 | 21.80 | 16.00 | 16.44 | 24.00 | 18.59 |
People with Disabilities | 17.07 | 14.34 | 18.17 | 14.08 | 16.24 | 15.98 |
Senior | 12.00 | 12.00 | ||||
Women | 20.12 | 19.96 | 21.59 | 18.74 | 21.05 | 20.31 |
(blank) | 12.75 | 12.75 | ||||
Grand Total | 16.89 | 15.33 | 16.72 | 15.56 | 17.15 | 16.32 |
Table 7: Population Served By Program Area
Population Served | Active People | Connected People | Green People | Inspired People | Promising Young People | Prosperous People | (blank) | Grand Total |
---|---|---|---|---|---|---|---|---|
Diverse Cultural Communities and Racialized Groups | 30 | 113 | 15 | 77 | 118 | 65 | 1 | 419 |
Francophones | 12 | 22 | 1 | 30 | 15 | 9 | 89 | |
General Populations | 460 | 230 | 265 | 381 | 224 | 250 | 8 | 1818 |
Indigenous or Aboriginal | 8 | 31 | 7 | 44 | 31 | 24 | 145 | |
LGBTTQIA | 18 | 3 | 13 | 5 | 39 | |||
People with Disabilities | 103 | 186 | 43 | 66 | 57 | 455 | ||
Senior | 1 | 1 | ||||||
Women | 9 | 36 | 6 | 19 | 61 | 131 | ||
Grand Total | 622 | 637 | 288 | 584 | 486 | 471 | 9 | 3097 |
Table 8: Population Served by Budget Fund
Population Served | Capital | Collective Impact | Grow | Ontario150 | Provincial Impact | Seed | Grand Total |
---|---|---|---|---|---|---|---|
Diverse Cultural Communities and Racialized Groups | 56 | 8 | 163 | 12 | 4 | 176 | 419 |
Francophones | 22 | 24 | 1 | 3 | 39 | 89 | |
General Populations | 613 | 56 | 433 | 137 | 18 | 561 | 1818 |
Indigenous or Aboriginal | 31 | 4 | 31 | 4 | 1 | 74 | 145 |
LGBTTQIA | 4 | 1 | 13 | 1 | 20 | 39 | |
People with Disabilities | 134 | 5 | 114 | 46 | 4 | 152 | 455 |
Senior | 1 | 1 | |||||
Women | 15 | 53 | 3 | 3 | 57 | 131 | |
Grand Total | 875 | 75 | 831 | 204 | 33 | 1079 | 3097 |
Table 9: Grants by Age Group and Population Served
Population Served | Children up to 12 years | Children up to 12 years;Youth | Youth | Adults (25-64) | Adults; Seniors (65+) | General population (all age groups) | Seniors (+65) | (blank) | Grand Total |
---|---|---|---|---|---|---|---|---|---|
Diverse Cultural Communities and Racialized Groups | 65 | 134 | 67 | 118 | 35 | 419 | |||
Francophones | 16 | 27 | 8 | 28 | 10 | 89 | |||
General Populations | 223 | 2 | 316 | 149 | 1030 | 97 | 1 | 1818 | |
Indigenous or Aboriginal | 14 | 1 | 36 | 12 | 82 | 145 | |||
LGBTTQIA | 1 | 22 | 5 | 9 | 2 | 39 | |||
People with Disabilities | 42 | 1 | 78 | 118 | 1 | 153 | 62 | 455 |