
Model HTTP Status #24

Closed
edsu opened this issue Oct 23, 2023 · 1 comment · Fixed by #25 or #26

Comments

@edsu
Contributor

edsu commented Oct 23, 2023

It would be useful to be able to see which HTTP status codes were returned with responses. Currently the status code isn't modeled at all. We could stuff it into `http_headers`, but I'd be inclined to add it as a separate column on the `response` table.
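A minimal sketch of the separate-column option, using Python's built-in `sqlite3` module. The table is a hypothetical, trimmed-down `response` table, not WarcDB's actual schema: the `http_status` column name and the sample rows are made up for illustration. With the code in its own integer column, a status breakdown becomes a plain `GROUP BY` with no JSON parsing:

```python
import sqlite3

# Hypothetical, trimmed-down response table. The real WarcDB schema has more
# columns; the `http_status` column name is an assumption.
SCHEMA = """
CREATE TABLE response (
    warc_record_id TEXT PRIMARY KEY,
    http_headers   TEXT,     -- JSON-encoded headers, as today
    http_status    INTEGER   -- status code kept in its own column
);
"""


def status_counts(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Count responses per HTTP status code, most common first."""
    return conn.execute(
        """
        SELECT http_status, COUNT(*) AS count
        FROM response
        GROUP BY http_status
        ORDER BY count DESC, http_status
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample rows.
conn.executemany(
    "INSERT INTO response (warc_record_id, http_headers, http_status) VALUES (?, ?, ?)",
    [
        ("<urn:uuid:1>", "[]", 200),
        ("<urn:uuid:2>", "[]", 200),
        ("<urn:uuid:3>", "[]", 404),
    ],
)
print(status_counts(conn))  # [(200, 2), (404, 1)]
```

An integer column also makes range filters such as `WHERE http_status >= 400` straightforward.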

edsu changed the title from "HTTP Status" to "Model HTTP Status" on Oct 23, 2023
edsu added a commit to edsu/WarcDB that referenced this issue Oct 28, 2023
This commit adds a migration that creates a view over the HTTP headers stored in the `response` table. Once the view is in place, you can run a query like this without having to parse the JSON yourself:

```sql
SELECT warc_record_id, name, value FROM http_header;
```
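One way such a view could be defined is with SQLite's `json_each()` table-valued function, which expands a JSON array into one row per element. This is a sketch under assumptions, not the migration from this commit: it assumes `http_headers` holds a JSON array of `[name, value]` pairs, and it lower-cases header names because the query below filters on `name = 'content-type'`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical minimal response table. Assumption: http_headers holds a JSON
# array of [name, value] pairs, e.g. [["Content-Type", "text/html"]].
conn.execute("CREATE TABLE response (warc_record_id TEXT, http_headers TEXT)")

# One row per header: json_each() expands the array, json_extract() pulls out
# the name and value, and names are lower-cased for case-insensitive matching.
conn.execute(
    """
    CREATE VIEW http_header AS
    SELECT
      response.warc_record_id AS warc_record_id,
      LOWER(json_extract(h.value, '$[0]')) AS name,
      json_extract(h.value, '$[1]') AS value
    FROM response, json_each(response.http_headers) AS h
    """
)

conn.execute(
    "INSERT INTO response VALUES (?, ?)",
    ("<urn:uuid:1>", json.dumps([["Content-Type", "text/css"], ["Server", "nginx"]])),
)

rows = conn.execute(
    "SELECT warc_record_id, name, value FROM http_header ORDER BY name"
).fetchall()
print(rows)
# [('<urn:uuid:1>', 'content-type', 'text/css'), ('<urn:uuid:1>', 'server', 'nginx')]
```

Because it is a view, nothing is duplicated on disk; the JSON is expanded at query time.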

It can be helpful for things like identifying the most common content types:

```sql
SELECT
  value,
  COUNT(*) AS count
FROM http_header
WHERE name = 'content-type'
GROUP BY value
ORDER BY count DESC;
```

which gives:

```
value                              count
---------------------------------  -----
application/javascript             57
image/png                          11
text/css                           7
text/html; charset=utf-8           6
image/jpeg                         4
image/gif                          4
text/fragment+html; charset=utf-8  3
image/svg+xml                      3
text/plain                         2
text/html; charset=UTF-8           1
```
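Since the grouping is on the raw header string, `text/html; charset=utf-8` and `text/html; charset=UTF-8` are counted as separate rows. If you'd rather fold values that differ only in letter case, one option (an assumption, not something this commit does) is to group on `LOWER(value)`. In this sketch a plain table with made-up rows stands in for the `http_header` view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the http_header view: a plain table with the same columns,
# filled with made-up rows.
conn.execute("CREATE TABLE http_header (warc_record_id TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO http_header VALUES (?, 'content-type', ?)",
    [
        ("<urn:uuid:1>", "text/html; charset=utf-8"),
        ("<urn:uuid:2>", "text/html; charset=UTF-8"),
        ("<urn:uuid:3>", "text/css"),
    ],
)

# Same aggregate as the query above, but grouping on the lower-cased value so
# entries that differ only in case (e.g. the charset label) are merged.
rows = conn.execute(
    """
    SELECT LOWER(value) AS folded, COUNT(*) AS count
    FROM http_header
    WHERE name = 'content-type'
    GROUP BY folded
    ORDER BY count DESC, folded
    """
).fetchall()
print(rows)  # [('text/html; charset=utf-8', 2), ('text/css', 1)]
```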

Closes Florents-Tselai#24
@edsu
Contributor Author

edsu commented Oct 30, 2023

Oops, I linked the wrong ticket to #25. This should stay open.

edsu added a commit to edsu/WarcDB that referenced this issue Oct 30, 2023
Since the HTTP response status code isn't in the headers dictionary, it
should be modeled separately.

Fixes Florents-Tselai#24
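To illustrate why the status code doesn't end up among the headers: in an HTTP/1.x response head the status line (e.g. `HTTP/1.1 404 Not Found`) comes first, and only the lines after it are `Name: value` headers. A minimal, stdlib-only sketch; `split_http_head` is a hypothetical helper for illustration, not WarcDB code:

```python
def split_http_head(head: str) -> tuple[int, list[tuple[str, str]]]:
    """Split a raw HTTP/1.x response head into (status_code, header_pairs).

    The first line is the status line; only the lines after it are headers,
    so the status code never appears among the header pairs.
    """
    lines = head.split("\r\n")
    status_line, header_lines = lines[0], lines[1:]
    # Status line format: HTTP-version SP status-code SP [reason-phrase]
    _version, code, *_reason = status_line.split(" ", 2)
    headers = []
    for line in header_lines:
        if not line:  # an empty line ends the head
            break
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return int(code), headers


raw = "HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\nServer: nginx\r\n\r\n"
status, headers = split_http_head(raw)
print(status)   # 404
print(headers)  # [('Content-Type', 'text/html'), ('Server', 'nginx')]
```

Any header dictionary built from `headers` would have no entry for `404`, which is why it needs its own column.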
edsu added a commit to edsu/WarcDB that referenced this issue Oct 31, 2023
Since the HTTP response status code isn't in the headers dictionary, it
should be modeled separately.

Fixes Florents-Tselai#24