
Add doc on querying scraper output using DuckDB #76

Merged (5 commits) on Sep 26, 2024


jessemortenson
Contributor

I've found DuckDB very helpful for investigating data issues across a set of flat data files.

I wanted to include a portable DuckDB data file with this, but immediately found a case where scraped output from one jurisdiction didn't have the same properties as another: vote events in WY are missing `dedupe_key`, whereas DE vote events have it. This causes the DuckDB view to fail with an error. So I added a note to the other ticket about improving scraper output, and this doc instead just includes the code to create the views.
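One possible workaround for this kind of mismatch, offered as an assumption and not as what this doc does: DuckDB's `read_json_auto` accepts a `union_by_name` option that unifies columns across all matched files, filling keys that some files lack with NULL. Whether it avoids the specific error described here has not been verified against real scraper output. The file names `de_vote_event.json` and `wy_vote_event.json` are hypothetical scratch files that the sketch writes itself; real output paths will differ.

```sql
-- Write two hypothetical scratch files that mimic the mismatch:
-- the DE record has dedupe_key, the WY record does not.
COPY (SELECT 'DE' AS jurisdiction, 'abc123' AS dedupe_key)
    TO 'de_vote_event.json' (FORMAT JSON);
COPY (SELECT 'WY' AS jurisdiction)
    TO 'wy_vote_event.json' (FORMAT JSON);

-- union_by_name = true asks DuckDB to merge the columns of every
-- matched file; rows from files lacking a column get NULL there.
CREATE VIEW vote_events AS
SELECT *
FROM read_json_auto('*_vote_event.json', union_by_name = true);

SELECT jurisdiction, dedupe_key
FROM vote_events
ORDER BY jurisdiction;
```

The trade-off is that a missing property silently becomes NULL instead of raising an error, so fixing the scraper output itself (as the linked ticket proposes) is still the more robust option.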


@alexobaseki left a comment


LGTM! Thanks for adding this. I had a couple of comments.

docs/data/query-scraper-output-data.md

```sql
-- Requires the bill_action_classifications view to exist first
SELECT classification, COUNT(*)
FROM bill_action_classifications
GROUP BY classification
```


How do we access `bill_action_classifications`? Same question for `bill_actions` and `bill_version_links`.

Contributor Author


Those are all views that can be created with the queries listed in the "Use views to drill down..." section of the document. I just added a commit that emphasizes the need to create those views before running the queries that depend on them.

Do you think that explains it well enough, or is there something else that should be added to make this clearer?
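The reply explains that `bill_actions` and `bill_action_classifications` are views that must be created before the queries that use them. A minimal, self-contained sketch of that dependency chain follows. The view definitions are hypothetical stand-ins, not the ones in `docs/data/query-scraper-output-data.md`: `bills` is built from inline literals rather than scraper JSON, and the field names `identifier`, `actions`, and `classification` are assumptions about the shape of bill records.

```sql
-- Hypothetical stand-in for a bills view: two bills whose actions are
-- inline struct literals instead of rows read from scraper JSON files.
CREATE VIEW bills AS
SELECT *
FROM (VALUES
    ('HB 1', [
        {'description': 'Introduced', 'classification': ['introduction']},
        {'description': 'Passed House', 'classification': ['passage']}
    ]),
    ('SB 2', [
        {'description': 'Introduced', 'classification': ['introduction']}
    ])
) AS t(identifier, actions);

-- One row per action: UNNEST expands each bill's actions list.
CREATE VIEW bill_actions AS
SELECT identifier, UNNEST(actions) AS bill_action
FROM bills;

-- One row per classification label on each action.
CREATE VIEW bill_action_classifications AS
SELECT identifier, UNNEST(bill_action.classification) AS classification
FROM bill_actions;

-- This query only works once the views above exist.
SELECT classification, COUNT(*) AS n
FROM bill_action_classifications
GROUP BY classification
ORDER BY classification;
```

Running the final `SELECT` before the `CREATE VIEW` statements fails with an error because `bill_action_classifications` does not exist yet. Because views are evaluated at query time, each layer stays in sync with whatever the underlying `bills` source currently contains.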

@jessemortenson merged commit 431b513 into main on Sep 26, 2024
1 check passed