Releases: internetarchive/cdx-summary
Release 0.1.1b2
Release 0.1.1b2 includes the following minor changes:
- Remove hyperlink highlight/underline in sample report
- Add features list and sample output in README
Release 0.1.1b1
Initial public release version 0.1.1b1 with the following features:
- Summarize local CDX files or remote ones over HTTP
- Handle `gz` and `bz2` compression seamlessly
- Handle CDX data input to STDIN from pipe
- Support Internet Archive Petabox web item summarization
- Seamless authorization to Internet Archive via the `ia` CLI tool
- Human-friendly summary by default, but support summarized or detailed JSON reports
- Self-aware, as the input can be a previously generated JSON report in place of CDX data
- Summary includes:
  - An overview of numbers of captures, consecutive unique URIs, unique hosts, accumulated WARC records size, and the first and last datetimes
  - A grid of media types and status codes and their respective capture counts
  - Top-N (configurable) hosts and their capture counts
  - A random sample of N (configurable) memento URIs for `200 OK` HTML pages
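
To make the summary fields above more concrete, here is a minimal Python sketch of how such statistics could be computed from CDX lines. It is an illustration only, not the tool's actual implementation: the function names `open_cdx` and `summarize` are hypothetical, and it assumes the common 11-field space-separated CDX layout (`N b a m s k r M S V g`) sorted by URL key, with compression detected from the leading magic bytes.

```python
import bz2
import gzip
import io
from collections import Counter
from urllib.parse import urlsplit


def open_cdx(raw: bytes) -> io.StringIO:
    """Hypothetical helper: decompress gzip/bzip2 input based on magic bytes."""
    if raw[:2] == b"\x1f\x8b":      # gzip magic number
        raw = gzip.decompress(raw)
    elif raw[:3] == b"BZh":         # bzip2 magic number
        raw = bz2.decompress(raw)
    return io.StringIO(raw.decode("utf-8", errors="replace"))


def summarize(lines, top_n=3):
    """Hypothetical summarizer over an iterable of 11-field CDX lines."""
    captures = unique_urls = total_size = 0
    prev_key = first = last = None
    hosts = Counter()   # host -> capture count
    grid = Counter()    # (media type, status code) -> capture count
    for line in lines:
        fields = line.split()
        # Skip blank/short lines and the " CDX N b a ..." header line.
        if len(fields) < 11 or fields[0] == "CDX":
            continue
        urlkey, timestamp, original, mime, status = fields[:5]
        size = fields[8]  # assumed to be the compressed record size (S)
        captures += 1
        # "Consecutive unique" count: relies on the input being sorted by key.
        if urlkey != prev_key:
            unique_urls += 1
            prev_key = urlkey
        if size.isdigit():
            total_size += int(size)
        # 14-digit timestamps compare correctly as strings.
        first = timestamp if first is None or timestamp < first else first
        last = timestamp if last is None or timestamp > last else last
        hosts[urlsplit(original).hostname or ""] += 1
        grid[(mime, status)] += 1
    return {
        "captures": captures,
        "urls": unique_urls,
        "hosts": len(hosts),
        "bytes": total_size,
        "first": first,
        "last": last,
        "grid": dict(grid),
        "top_hosts": hosts.most_common(top_n),
    }
```

Because `open_cdx` inspects the leading bytes rather than a file extension, the same call works for plain, gzip, or bzip2 input, for example `summarize(open_cdx(data))` where `data` holds bytes read from a file or from STDIN.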