internet-archiving

Here are 27 public repositories matching this topic...

ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Updated Dec 19, 2024
Python

akamhy / waybackpy

Wayback Machine API interface & a command-line tool

osint internet-archive web-archiving wayback-machine webarchiving cdx-api internet-archiving savepagenow archive-webpage archive-webpages wayback-machine-api wayback-machine-python

Updated Feb 26, 2024
Python

pirate / wikipedia-mirror

🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

html docker nginx wiki docker-compose mediawiki wikipedia archiving datascience kiwix zim wikipedia-dump wikipedia-mirror openzim xowa internet-archiving mwdumper kiwix-offline-wikipedia

Updated Apr 7, 2021
Shell

ArchiveBox / good-karma-kit

😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...

docker docker-compose ipfs distributed-computing tor distributed-storage sia boinc kiwix i2p foldingathome storj pywb internet-archiving archivebox good-karma archivewarrior zimfarm

Updated May 11, 2024

ArchiveBox / archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

chrome-extension archiving svelte firefox-extension browser-extension web-archiving digital-preservation digipres internet-archiving archivebox

Updated Dec 16, 2024
TypeScript

ArchiveBox / electron-archivebox

Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)

electron windows macos linux docker gui desktop web-archiving digipres internet-archiving archivebox desktop-electron

Updated Feb 28, 2023
JavaScript

vegetableman / vandal

Navigator for Web Archive

chrome-extension firefox-addon wayback-machine webarchive internet-archiving

Updated Nov 23, 2023
JavaScript

mikwielgus / forum-dl

Scrape posts and threads from forums, news aggregators, and mail archives; export to JSONL, mailbox, or WARC

python scraper forum discourse phpbb warc data-fetching simplemachines internet-archiving

Updated Jun 27, 2024
Python

pirate / internet-archiving-talk

🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.

slideshow wget talks warc censorship web-archiving ethics internet-archiving archivebox

Updated Aug 15, 2024
JavaScript

ArchiveBox / abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

cli chrome downloader curl headless scraping crawling http-client youtube-dl wget cli-tool puppeteer internet-archiving playwright archivebox yt-dlp gallery-dl ai-scraping

Updated Dec 17, 2024
JavaScript

ArchiveBox / docker-archivebox

Home of the official docker image for ArchiveBox

docker kubernetes image docker-compose docker-image container oci digipres podman internet-archiving archivebox

Updated Dec 18, 2024

Own-Data-Privateer / hoardy-web

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

cli backups internet archiving snapshot self-hosted archive browser-extension archiver web-archiving wayback-machine web-browsing web-archive website-archive auto-save offline-reading internet-archiving

Updated Dec 21, 2024
Python

ArchiveBox / readability-extractor

JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.

wrapper node readability internet-archiving archivebox

Updated Sep 16, 2024
JavaScript

ArchiveBox / homebrew-archivebox

Homebrew formula for the ArchiveBox self-hosted internet archiving solution.

macos homebrew package linuxbrew web-archiving digipres brew-tap internet-archiving archivebox

Updated Oct 5, 2024
Ruby

ArchiveBox / archivebox-proxy

Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.

proxy https-proxy web-archiving web-proxy digital-preservation mitmproxy digipres internet-archiving archivebox

Updated Jul 12, 2024
Python

ArchiveBox / debian-archivebox

Home of the official apt/deb package for Ubuntu/Debian-based systems.

package debian apt ubuntu web-archiving aptitude digipres internet-archiving archivebox stdeb

Updated Oct 5, 2024
Python

ArchiveBox / DigestBox

DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.

backups warc web-archiving digipres headless-browser internet-archiving archivebox

Updated Feb 2, 2024
HTML

ArchiveBox / docs

Source for the GitHub Wiki / ReadTheDocs documentation for ArchiveBox, the self-hosted internet archiving solution.

python cli community documentation ui rest wiki sphinx usage web-archiving digipres internet-archiving archivebox

Updated Dec 18, 2024
CSS

ArchiveBox / pip-archivebox

Official Python package for ArchiveBox, the self-hosted internet archiving solution.

python pypi wheel pip setuptools web-archiving digipres sdist internet-archiving archivebox

Updated Oct 5, 2024

itsliamdowd / WaybackBrowserMacOS

Pick a date and explore websites from the early days of the internet to now all in an easy-to-use browser format! 💻

Updated Jul 1, 2022
Swift
