-
Notifications
You must be signed in to change notification settings - Fork 0
/
notes.txt
134 lines (91 loc) · 4.87 KB
/
notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
CVR (Cast Vote Records) Notes:
Documentation: https://www.sfelections.org/results/20200303/data/DSuite_CVRExportFormat.pdf
BallotTypeContestManifest.json - mapping between BallotTypeId and ContestId. Seems to be many-to-many. Basically which contests were present on which ballots
BallotTypeManifest.json - lists types of ballots for this election
CandidateManifest.json - lists all candidates, and specify which contest they belong to
Configuration.json - config options selected used when generating these files. mostly not relevant
ContestManifest.json - lists all contests, notes how many votes are allowed for each contest, and number of ranks
CountingGroupManifest.json - lists counting groups (Election Day vs VBM)
DistrictManifest.json - not documented! Seems to have several different granularities of districts, ex, state-wide, congressional district, and more
DistrictPrecinctPortionManifest.json - mapping between precincts and districts (plural!) it belongs to
DistrictTypeManifest.json - lists granularities of districts, referenced by DistrictManifest.json
ElectionEventManifest.json - seems to describe the election itself
OutstackConditionManifest.json - describes various error conditions for ballots (overvote, undervote, etc)
PartyManifest.json - lists parties
PrecinctManifest.json - lists all precincts
PrecinctPortionManifest.json - seems very similar to PrecinctManifest.json, not sure the difference
TabulatorManifest.json - lists each voting machine
CvrExport_#.json - actual results from ballots! most data is in the Sessions list.
TabulatorId - which voting machine counted this ballot
BatchId, RecordId - just ids for this vote
CountingGroupId - which counting group this ballot belongs to
Original/Modified: actual votes, use IsCurrent to determine which is most up-to-date
PrecinctPortionId - precinct ballot was cast at
BallotTypeId
IsCurrent
Cards - list of cards scanned for this ballot
OutstackConditionIds - errors during processing this ballot?
PaperIndex - ballot card #?
Contests - list of each contest on scanned page.
Id - contest id
Overvotes, Undervotes, OutstackConditionIds - errors during processing this contest?
Marks - votes cast in this contest
CandidateId - which candidate was voted for
PartyId - party of candidate
Rank - only relevant for ranked choice contests, rank of this mark
WriteinIndex - only present if vote is for write-in candidate, not totally sure what this is
MarkDensity, WriteinDensity - how filled in bubble was!
IsAmbiguous - if mark is unclear
IsVote - true if mark was counted as a vote
Question: if a ballot spans multiple cards, are they grouped together in a single CVR record, or can they be split across multiple records? If they are split, can you associate those cards with each other? Do adjacent record ids matter?
Data model:
Candidate table:
id, name, contest_id
Contest table:
id, name, num_votes, num_ranks
Ballot type table:
id, name
Ballot type contest assoc table:
ballot_type_id, contest_id
Counting group table:
id, name
Party table:
id, name
Precinct portion table:
id, name
District table:
id, name, district_type_id
District type table:
id, name
Precinct portion district assoc table:
precinct_portion_id, district_id
Vote
id, tabulator_id, batch_id, record_id, precinct_portion_id, ballot_type_id, contest_id, candidate_id, party_id, rank
Can we have one row per-ballot page instead of one row per-vote? Would go from 4.7m rows to 574k.
Aggregated vote:
id, tabulator_id, batch_id, record_id, precinct_portion_id, ballot_type_id, votes:
[ { candidate_id, rank } ]
Queries:
SELECT candidate.id as candidate_id, candidate.name as candidate_name, candidate.contest_id as contest_id, contest.name as contest_name, COUNT(vote.id) as votes
FROM candidate
JOIN contest ON contest.id = candidate.contest_id
JOIN vote ON vote.candidate_id = candidate.id
GROUP BY candidate.id, contest.id;
~4s
SELECT candidate.id as candidate_id, candidate.name as candidate_name, contest.id as contest_id, contest.name as contest_name, a.votes as votes
FROM (
SELECT (jsonb_array_elements_text(ballot.votes)::jsonb)->'candidate_id' as candidate_id, COUNT(*) as votes
FROM ballot
GROUP BY (jsonb_array_elements_text(ballot.votes)::jsonb)->'candidate_id'
) AS a
JOIN candidate ON candidate.id = a.candidate_id::integer
JOIN contest ON candidate.contest_id = contest.id
GROUP BY candidate.id, contest.id, a.votes;
~5s
WHERE ballot.votes @> '[{"candidate_id":18}'
+ gin index on votes
582ms
WHERE EXISTS (SELECT id FROM vote as vote_s WHERE vote.tabulator_id = vote_s.tabulator_id AND vote.batch_id = vote_s.batch_id AND vote.record_id = vote_s.record_id AND vote_s.candidate_id = 18)
+ no index
~4s
best approach: unagg but only query votes table (with index on candidate_id) and then do separate query for candidate + contest data