Note that you can get the latest version of the data, more comprehensive documentation, and various auxiliary files at the GitHub repository here: https://github.com/colincookman/pakistan_candidate_scrutiny_18. The versions released on dataverse will correspond to releases on the GitHub repository here: https://github.com/colincookman/pakistan_candidate_scrutiny_18/releases
This repository hosts cleaned candidate scrutiny forms released by the Election Commission of Pakistan for prospective candidates for Pakistan's 2018 national and provincial assembly elections. The scrutiny forms consist of data released on tax payments from the Federal Board of Revenue (FBR), corruption cases from the the National Accountability Bureau (NAB), and oustanding loans from the State Bank of Pakistan (SBP).
The final output after cleaning is also included as pk_candidate_scrutiny_data_2018.csv, or in wide format (collapsing three reported tax-year rows per candidate into a single row) as pk_candidate_scrutiny_data_2018_wide.csv.
Please note that this database may change with additional input, corrections, or data sources. Please cite it when using it if you feel that is appropriate.
The raw files used as a data source are approximately 10.3 GB in size and too large to host in this Github repository. They were posted by the ECP in late June 2018.
Pakistani election law does not impose residency requirements for candidacy filings and allows individual candidates to contest multiple seats simultaneously, within and across assemblies and provinces. As of the initial data release the ECP provided information for 19397 candidacy filings and 15907 unique candidates (as identified by accompanying Computerized National ID Card records). Note that in some cases individual records reported by the FBR, NAB, and SBP may not be consistent across multiple constituency filings.
province | assembly | direct_seats | womens_seats | minority_seats |
---|---|---|---|---|
Balochistan | National Assembly | 374 | 40 | NA |
Balochistan | Provincial Assembly | 1302 | 13 | 24 |
KPK | National Assembly | 1076 | 90 | NA |
KPK | Provincial Assembly | 1959 | 177 | 59 |
National | National Assembly | NA | NA | 215 |
Punjab | National Assembly | 2428 | 172 | NA |
Punjab | Provincial Assembly | 6234 | 668 | 141 |
Sindh | National Assembly | 1181 | 178 | NA |
Sindh | Provincial Assembly | 2916 | 94 | 56 |
The ECP had previously reported (https://www.ecp.gov.pk/PrintDocument.aspx?PressId=55295&type=Image, https://web.archive.org/web/20180627164802/https://www.ecp.gov.pk/PrintDocument.aspx?PressId=55295&type=Image) on June 18 2018 that 21482 nomination papers had been filed. While we cannot account for the discrepancy or identify missing candidates at this stage, based on the ECP's earlier aggregate figures the following number of candidacy filings may be missing from the current dataset (or not accounted for in the earlier ECP statement in cases where more records were available):
province | assembly | direct_seats | womens_seats | minority_seats |
---|---|---|---|---|
Balochistan | National Assembly | 435 (-61) | 36 (+4) | NA |
Balochistan | Provincial Assembly | 1400 (-98) | 116 (-103) | 56 (-32) |
KPK | National Assembly | 992 (+84) | 88 (+2) | NA |
KPK | Provincial Assembly | 1920 (+39) | 262 (-85) | 73 (-14) |
National | National Assembly | NA | NA | 154 (+61) |
Punjab | National Assembly | 2700 (-272) | 236 (-64) | NA |
Punjab | Provincial Assembly | 6747 (-513) | 664 (+4) | 232 (-91) |
Sindh | National Assembly | 1346 (-165) | 76 (+102) | NA |
Sindh | Provincial Assembly | 3626 (-710) | 213 (-119) | 110 (-54) |
A final list of candidates following the process of scrutiny, disqualification, appeal, and withdrawal is currently scheduled for release shortly after v1 of this data were released.
This dataset is being presented to encourage broader open data sharing among the community of analysts on Pakistan. We make no guarantees as to and cannot verify the accuracy of, or account for any discrepancies in, the underlying data.
There are currently three rows in this dataset for every candidate-constituency. Each row is then a candidate-constituency-tax_year.
candidate_code: Candidacy filing code generated from constituency and candidate number (not unique to single individuals)
province: Province location (note that former FATA constituencies are included in KPK, and Islamabad constituencies in Punjab)
assembly: National or provincial assembly
constituency_number: Constituency number for directly elected seats or womens / minorities reserved list
candidate_number: ECP-assigned candidacy filing number
candidate_CNICP_ECP: ECP-reported Computerized National ID Card, unique to single individuals
multi_candidate: Flags individuals that are contesting multiple constituencies
candidate_NTN: Candidate National Tax Number as reported by Federal Board of Revenue
candidate_NTN_issue: Date of NTN issue as reported by Federal Board of Revenue
candidate_RTO: Location of NTN-issuing Regional Tax Office as reported by Federal Board of Revenue
candidate_MNIC_NAB: Candidate Manual National ID Card as reported by the NAB
candidate_MNIC_SBP: Candidate Manual National ID Card as reported by the SBP
tax_year: Tax-year observation (2015, 2016, or 2017)
candidate_tax_type: Candidate filed, did not file, or was unregistered
candidate_tax-paid: Tax paid by candidate as reported by FBR
candidate_tax_paid_num: Tax paid converted to numeric values for calculation
candidate_tax_receipts: "Receipts under final tax regime" as reported by FBR
candidate_tax_receipts_num: Tax receipts converted to numeric values for calculation
candidate_tax_income: Taxable income as reported by FBR
candidate_tax_income_num: Taxable income converted to numeric values for calculation
candidate_tax_remarks: Additional remarks as reported by FBR
candidate_NAB_guilty: Binary variable if NAB reported any conviction, plea bargain, or other pending case against candidate
candidate_NAB_conviction: Binary variable if NAB reported any conviction against candidate
candidate_NAB_plea: Binary variable if NAB reported any plea bargain on the part of candidate
candidate_NAB_accused: Binary variable if NAB reported candidate accused or otherwise facing pending cases
candidate_NAB_remarks: NAB remarks on any
candidate_personal_loan: SBP remarks on candidate personal loans, if any reported
candidate_business_loan: SBP remarks on candidate business loans, if any reported
parl_inc_tax_2016: An indicator for whether this CNIC is linked to a parliamentarian in 2016 when the FBR released incumbents tax payment amounts in the 2016 Incumbent Parliamentarian report
parl_inc_name: Candidate name when a parliamentarian as reported by the FBR in the 2016 Incumbent Parliamentarian report
parl_inc_chamber: Candidate chamber when a parliamentarian as reported by the FBR in the 2016 Incumbent Parliamentarian report
parl_inc_province: Candidate province when a parliamentarian as reported by the FBR in the 2016 Incumbent Parliamentarian report, incomplete for MNAs
parl_inc_type: Candidate seat type when a parliamentarian as reported by the FBR in the 2016 Incumbent Parliamentarian report, incomplete for MNAs
parl_tax_paid_2016: Amount paid by candidate in 2016 when they were a parliamentarian, as reported by the FBR in the 2016 Incumbent Parliamentarian report
parl_aop_tax_paid_2016: Amount paid by AOPs the candidate was a part of in 2016 when they were a parliamentarian, as reported by the FBR in the 2016 Incumbent Parliamentarian report
candidate_name_FBR: Candidate name as reported by FBR in Urdu unicode
candidate_name_NAB: Candidate name as reported by NAB in Urdu unicode
candidate_name_SBP: Candidate name as reported by SBP in Roman Urdu (note: SBP forms did not report name information consistently)
urdu_name_match: Check on whether FBR and NAB Urdu names match (several cases indicate mismatch, apparently due to typos or nonstandardized name spellings)
MNIC_match: Check on whether NAB and SBP MNIC identifications match
target: Directory path for folder with candidate data
We also release a wide version of the data as pk_candidate_scrutiny_data_2018_wide.csv
. All that changes is the data is no longer candidate-constituency-tax_year, but is rather just candidate-constituency level. All of the candidate_tax_*
variables scraped from the FBR PDFs become candidate_tax_*_YYYY
variables.