Add a view for HTTP headers #25

Merged
merged 3 commits into from
Oct 30, 2023

@edsu edsu (Contributor) commented Oct 28, 2023

This commit adds a migration that creates a view of the HTTP headers in the response table. Once the view is in place you can run a query like this without requiring JSON parsing:

SELECT warc_record_id, name, value FROM http_header;

It can be helpful for identifying things like the most common content types:

SELECT
  value,
  COUNT(*) AS count
FROM http_header
WHERE name = 'content-type'
GROUP BY value
ORDER BY count DESC;

value                              count
---------------------------------  -----
application/javascript             57
image/png                          11
text/css                           7
text/html; charset=utf-8           6
image/jpeg                         4
image/gif                          4
text/fragment+html; charset=utf-8  3
image/svg+xml                      3
text/plain                         2
text/html; charset=UTF-8           1

Closes #24
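
To make the idea concrete, here is a minimal sketch of how such a view could be built with SQLite's `json_each`. It is not the migration from this PR: the simplified `response` table, the assumption that headers are stored as a JSON array of `{"header": ..., "value": ...}` objects, and the sample rows are all hypothetical, and it requires a SQLite build with the JSON functions available.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for the response table. Headers are
# ASSUMED to be stored as a JSON array of {"header": ..., "value": ...} objects.
conn.execute(
    "CREATE TABLE response (warc_record_id TEXT PRIMARY KEY, http_headers TEXT)"
)

sample_rows = [
    ("r1", [{"header": "Content-Type", "value": "image/png"},
            {"header": "Content-Length", "value": "512"}]),
    ("r2", [{"header": "Content-Type", "value": "text/css"}]),
    ("r3", [{"header": "content-type", "value": "image/png"}]),
]
conn.executemany(
    "INSERT INTO response VALUES (?, ?)",
    [(record_id, json.dumps(headers)) for record_id, headers in sample_rows],
)

# The view expands each JSON array element into its own row and lowercases
# header names, so callers can filter on name without parsing JSON themselves.
conn.execute(
    """
    CREATE VIEW http_header AS
    SELECT
      response.warc_record_id AS warc_record_id,
      LOWER(json_extract(header.value, '$.header')) AS name,
      json_extract(header.value, '$.value') AS value
    FROM response, json_each(response.http_headers) AS header
    """
)

# Same shape as the content-type query in the PR description.
content_types = conn.execute(
    """
    SELECT value, COUNT(*) AS count
    FROM http_header
    WHERE name = 'content-type'
    GROUP BY value
    ORDER BY count DESC
    """
).fetchall()

for value, count in content_types:
    print(f"{value}  {count}")
```

Lowercasing the header name in the view is a design choice of this sketch; it lets a single `WHERE name = 'content-type'` match headers regardless of how the server capitalized them.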

@edsu edsu force-pushed the headers-view branch 2 times, most recently from 30131c4 to 41848dd Compare October 28, 2023 20:09
@Florents-Tselai Florents-Tselai (Owner) commented Oct 29, 2023

Two comments

  • I think we should prefix views with something like v_* and make them plural (i.e. v_http_headers). And actually, this covers only response headers, right?
  • We should start documenting these in a table-like format to keep track of things. I like how Postgres does this, e.g. for pg_stat_activity.
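
One way both suggestions could work together is to let the prefix drive the documentation: list every view whose name starts with `v_` and emit a column table for each. The sketch below is only an illustration under stated assumptions. The `header_row` table, the view body, the `DESCRIPTIONS` mapping, and the `describe_views` / `render_docs` helpers are hypothetical; only `sqlite_master` and `PRAGMA table_info` are standard SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical stand-ins; the real schema and view definition will differ.
    CREATE TABLE header_row (warc_record_id TEXT, name TEXT, value TEXT);
    CREATE VIEW v_http_headers AS
      SELECT warc_record_id, name, value FROM header_row;
    """
)

# Hypothetical column descriptions, maintained by hand alongside the migration.
DESCRIPTIONS = {
    "warc_record_id": "ID of the WARC record the header belongs to",
    "name": "Header name",
    "value": "Header value",
}


def describe_views(conn, prefix="v_"):
    """Map each view whose name starts with `prefix` to its column names."""
    views = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'view' ORDER BY name"
        )
        if name.startswith(prefix)
    ]
    # PRAGMA table_info also accepts view names; index 1 is the column name.
    return {
        view: [row[1] for row in conn.execute(f'PRAGMA table_info("{view}")')]
        for view in views
    }


def render_docs(conn):
    """Render one pipe-delimited column table per prefixed view."""
    lines = []
    for view, columns in describe_views(conn).items():
        lines += [view, "Column | Description"]
        lines += [f"{col} | {DESCRIPTIONS.get(col, '')}" for col in columns]
    return "\n".join(lines)


print(render_docs(conn))
```

The descriptions still have to be written by hand; the sketch only keeps the list of views and columns in sync with the schema.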

@edsu edsu (Contributor, Author) commented Oct 29, 2023

Ok, I can adjust the view name and the docs. Our other table names are singular, not plural. I think we should be consistent, but I don't have a strong preference either way. Do you?

Add a similar view for HTTP requests. Prefix the view names with `v_` to distinguish them in the schema from
actual tables.

Also add a description of the view with a table that defines the columns.
@edsu edsu (Contributor, Author) commented Oct 30, 2023

That was a good idea to treat requests the same, since they have HTTP headers as well. I've updated this PR to create a view for request records as well, renamed both views to use the v_ prefix, and (hopefully) improved the documentation along the lines of what you pointed to in the Postgres docs.

Let me know if you have a preference for singular or plural table/view names.

@Florents-Tselai (Owner)

Looks beautiful!

No strong preference for singular / plural; let's keep it singular. No problem.

@edsu edsu (Contributor, Author) commented Oct 30, 2023

CI let me know I need to reformat :-)

@Florents-Tselai Florents-Tselai merged commit 43866ef into Florents-Tselai:main Oct 30, 2023
1 check passed
@edsu edsu mentioned this pull request Oct 30, 2023
Development

Successfully merging this pull request may close these issues.

Model HTTP Status
2 participants