Releases: internetarchive/cdx-summary

Release 0.1.1b2

03 Nov 14:14
57c9d21
Pre-release

Release 0.1.1b2 includes the following minor changes:

  • Remove hyperlink highlight/underline in sample report
  • Add features list and sample output in README

Release 0.1.1b1

03 Nov 00:53
ca45e29
Pre-release

Initial public release version 0.1.1b1 with the following features:

  • Summarize local CDX files or remote ones over HTTP
  • Handle gz and bz2 compression seamlessly
  • Read CDX data piped to STDIN
  • Support Internet Archive Petabox web item summarization
  • Seamless authorization to Internet Archive via the ia CLI tool
  • Human-friendly summary by default, with optional summarized or detailed JSON reports
  • Self-aware: the input can be a previously generated JSON report in place of CDX data
  • Summary includes:
    • An overview of the number of captures, consecutive unique URIs, and unique hosts, the accumulated WARC record size, and the first and last datetimes
    • A grid of media types and status codes and their respective capture counts
    • Top-N (configurable) hosts and their capture counts
    • A random sample of N (configurable) memento URIs for 200 OK HTML pages
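To make the summary fields above concrete, here is a minimal Python sketch of how such numbers could be derived from CDX lines. It is not the cdx-summary implementation. It assumes the common 11-field CDX layout (urlkey, timestamp, original URL, media type, status code, digest, redirect, meta tags, record length, offset, filename), reads "consecutive unique URIs" as the number of times the URL key changes between adjacent lines of a sorted file, and uses hypothetical helper names (open_cdx, summarize) and made-up sample data.

```python
import bz2
import gzip
import io
from collections import Counter
from urllib.parse import urlsplit


def open_cdx(raw: bytes):
    """Return a text stream over CDX bytes, decompressing gz or bz2 by magic number."""
    if raw[:2] == b"\x1f\x8b":      # gzip magic number
        raw = gzip.decompress(raw)
    elif raw[:3] == b"BZh":         # bzip2 magic number
        raw = bz2.decompress(raw)
    return io.StringIO(raw.decode("utf-8", errors="replace"))


def summarize(lines, top_n=3):
    """Aggregate basic statistics over an iterable of CDX lines (hypothetical helper)."""
    captures = unique_urls = total_size = 0
    prev_key = None
    first = last = None
    hosts = Counter()
    grid = Counter()                # (media type, status code) -> capture count
    for line in lines:
        parts = line.split()
        if len(parts) != 11:        # skips blank lines and the " CDX N b a ..." header
            continue
        urlkey, ts, original, mime, status, _, _, _, length, _, _ = parts
        captures += 1
        if urlkey != prev_key:      # sorted input: a key change marks a new URI
            unique_urls += 1
            prev_key = urlkey
        hosts[urlsplit(original).hostname or ""] += 1
        grid[(mime, status)] += 1
        total_size += int(length) if length.isdigit() else 0
        first = ts if first is None else min(first, ts)
        last = ts if last is None else max(last, ts)
    return {
        "captures": captures,
        "urls": unique_urls,
        "hosts": len(hosts),
        "bytes": total_size,
        "first": first,
        "last": last,
        "grid": dict(grid),
        "tophosts": hosts.most_common(top_n),
    }


# Made-up sample records, for illustration only.
SAMPLE = """\
 CDX N b a m s k r M S V g
com,example)/ 20200101000000 http://example.com/ text/html 200 AAA - - 100 0 a.warc.gz
com,example)/ 20210101000000 http://example.com/ text/html 200 AAA - - 150 100 a.warc.gz
com,example)/img.png 20200601000000 http://example.com/img.png image/png 200 BBB - - 300 250 a.warc.gz
org,example)/ 20190101000000 http://example.org/ text/html 404 CCC - - 50 550 b.warc.gz
"""

report = summarize(open_cdx(gzip.compress(SAMPLE.encode("utf-8"))))
print(report)
```

Detecting compression from the leading bytes rather than the file extension is one way to treat plain, gz, and bz2 input uniformly, including data arriving on STDIN where no filename is available.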