
URL-Archiver

URL-Archiver extracts URLs from any Unicode text or PDF file and lets you archive them interactively with one of the supported archiving services.

⚠️ The application was designed to be platform-independent. However, it has been tested only on the following systems, so full functionality on other platforms cannot be guaranteed.

  • Windows 11 (Version 23H2)
  • Windows 10 (Version 22H2)
  • macOS (Ventura)
  • Ubuntu (20.04.3 LTS)

Authors

Supervisor

Installation

Requirements

To build and start the application, ensure that the following dependencies are installed on your system:

  • Git: Latest stable version recommended.
  • Maven: Version 3.8 or higher.
  • Java: Version 21.
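A quick way to confirm the prerequisites are installed is to check that each tool is on your PATH. The executable names below are the standard ones; adjust them if your installation differs:

```shell
# Check that the required build tools (Git, Maven, Java) are on PATH
# and print the first line of each tool's version output when found.
status=""
for tool in git mvn java; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found ($("$tool" --version 2>&1 | head -n 1))"
  else
    echo "$tool: MISSING - install it before building"
  fi
  status="$status $tool"
done
```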

Clone the repository

To clone the repository, run the following command in a terminal:

git clone https://github.com/devobern/URL-Archiver.git

Build and run scripts

Build and run scripts are provided for Windows (build.ps1, run.ps1, build_and_run.ps1) and for Linux and macOS (build.sh, run.sh, build_and_run.sh). The scripts are located in the root directory of the project.

⚠️ The scripts need to be executable. To make them executable, run the following command in a terminal:

  • Linux / macOS: chmod +x build.sh run.sh build_and_run.sh
  • Windows:
    • Open PowerShell as an Administrator.
    • Check the current execution policy by running: Get-ExecutionPolicy.
    • If the policy is Restricted, change it to RemoteSigned to allow local scripts to run. Execute: Set-ExecutionPolicy RemoteSigned.
    • Confirm the change when prompted.
    • This change allows you to run PowerShell scripts that are written on your local machine. Be sure to only run scripts from trusted sources.
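On Linux and macOS, the chmod step can be sketched and verified as follows; demo.sh is a placeholder standing in for the actual scripts:

```shell
# Create a placeholder script, mark it executable, and verify the bit is set.
# In the repository, apply chmod to build.sh, run.sh and build_and_run.sh instead.
printf '#!/bin/sh\necho hello\n' > demo.sh
chmod +x demo.sh
[ -x demo.sh ] && echo "demo.sh is executable"
```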

Windows

Build the application

To build the application, open a PowerShell window and run the following script:

./build.ps1

Run the application

To run the application, open a PowerShell window and run the following script:

./run.ps1

Build and run the application

To build and run the application, open a PowerShell window and run the following script:

./build_and_run.ps1

Linux / macOS

Build the application

To build the application, open a terminal and run the following script:

./build.sh

Run the application

To run the application, open a terminal and run the following script:

./run.sh

Build and run the application

To build and run the application, open a terminal and run the following script:

./build_and_run.sh

User Manual

⚠️ To follow the instructions in this section, the application must be built first (see Installation).

The URL-Archiver is a user-friendly application designed for extracting and archiving URLs from text and PDF files. Its intuitive interface requires minimal user input and ensures efficient management of URLs.

Getting Started

Windows

Open PowerShell, navigate to the application's directory, and execute:

./run.ps1

Linux / macOS

Open Terminal, navigate to the application's directory, and run:

./run.sh

Operating Instructions

Upon launch, provide a path to a text or PDF file, or a directory containing such files. The application will process and display URLs sequentially.
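Conceptually, the extraction step resembles the following grep sketch; the application itself uses its own Java-based extractor, so this is illustrative only:

```shell
# Illustrative only: pull http(s) URLs out of a plain-text file.
printf 'See https://example.com and http://archive.org/web for details.\n' > sample.txt
grep -Eo 'https?://[^[:space:]]+' sample.txt > urls.txt
cat urls.txt
```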

Navigation

Use the following keys to navigate through the application:

  • o: Open the current URL in the default web browser.
  • a: Access the Archive Menu to archive the URL.
  • s: Show a list of previously archived URLs.
  • u: Update and view pending archive jobs.
  • n: Navigate to the next URL.
  • q: Quit the application.
  • c: Change application settings.
  • h: Access the Help Menu for assistance.

Archiving URLs

Choose between archiving to Wayback Machine, Archive.today, both, or canceling.

When you choose Archive.today, an automated browser session starts in which you must solve a CAPTCHA. Once solved, the URL is archived, and the link to the archived version is collected and stored by the application.
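For background, archiving to the Wayback Machine is commonly done through its Save Page Now endpoint using the S3-Credentials described under Getting S3-Credentials (Wayback Machine). The sketch below only prints the request it would send; ACCESS_KEY and SECRET_KEY are placeholders, and the exact request the application issues may differ:

```shell
# Print (do not send) a Save Page Now request to the Wayback Machine.
# ACCESS_KEY and SECRET_KEY are placeholders for your S3-Credentials.
ACCESS_KEY="your-access-key"
SECRET_KEY="your-secret-key"
TARGET_URL="https://example.com"
cmd="curl -s -X POST https://web.archive.org/save \
  -H 'Accept: application/json' \
  -H 'Authorization: LOW ${ACCESS_KEY}:${SECRET_KEY}' \
  -d 'url=${TARGET_URL}'"
echo "$cmd"
```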

Configuration

Customize Access/Secret Keys and the default browser. Current settings are shown with default values in brackets.

To get your S3-Credentials, follow the instructions in Getting S3-Credentials (Wayback Machine).

Exiting

  • To exit, press q. If a BibTeX file was provided, you'll be prompted to save the archived URLs to it.
  • Otherwise, or after saving to the BibTeX file, you'll be prompted to save the archived URLs to a CSV file.

For BibTeX entries:

  • Without an existing note field, URLs are added as: note = {Archived Versions: \url{url1}, \url{url2}}
  • With an existing note field, they're appended as: note = {, Archived Versions: \url{url1}, \url{url2}}
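For illustration, a hypothetical entry without an existing note field might look like this after archiving (the entry key and archived URLs are made up):

```
@article{example2023,
  title  = {An Example Article},
  author = {Doe, Jane},
  note   = {Archived Versions: \url{https://web.archive.org/web/20230101000000/https://example.com}, \url{https://archive.today/abcde}}
}
```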

Getting S3-Credentials (Wayback Machine)

To generate your S3-Credentials, you need a Wayback Machine (archive.org) account.

Generate S3-Credentials

  1. Log in to your Wayback Machine (archive.org) account.
  2. Open your account's S3 API keys page to generate your S3-Credentials. If needed, you can also delete your S3-Credentials on the same page.

Project Status and Future Contributions

Current Development Status

Development of URL-Archiver by the original team is currently on hold: work and academic commitments prevent us from dedicating the necessary time to the project in the near future.

Open for Contributions

We welcome and encourage the open-source community to contribute to the development and enhancement of the URL-Archiver.

If you are interested in contributing, please ensure that any contributions adhere to the project's existing license terms.

We look forward to seeing how the URL-Archiver grows and evolves with the community's support and contributions.

Future Work

While we are currently not in a position to actively pursue these enhancements, we believe the following improvements would significantly contribute to the project's evolution and utility:

  • Improving the URL extraction algorithm for more efficient and accurate results.
  • Expanding support for various input file types.
  • Implementing a user-friendly graphical interface.
  • Enabling multilingual support for global accessibility.
  • Automatically archiving all URLs in a file for efficiency.
  • Providing more detailed setting options for user customization.
  • Publishing the application in package repositories to simplify installation.
  • Improving the code layout, like breaking up the controller for better clarity.
  • Testing on other platforms (such as Fedora) to ensure platform independence.
  • Supporting command-line arguments such as ./run.sh --archive both --urlsource /tmp/my_url_list.txt or ./run.sh --archive today --url https://example.com
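A hypothetical sketch of how such flag parsing could look in a wrapper script; none of these options exist in URL-Archiver today:

```shell
# Hypothetical: parse the proposed --archive/--urlsource/--url flags.
set -- --archive both --urlsource /tmp/my_url_list.txt  # example invocation
ARCHIVE=""; URLSOURCE=""; URL=""
while [ $# -gt 0 ]; do
  case "$1" in
    --archive)   ARCHIVE="$2";   shift 2 ;;
    --urlsource) URLSOURCE="$2"; shift 2 ;;
    --url)       URL="$2";       shift 2 ;;
    *) echo "unknown option: $1" >&2; shift ;;
  esac
done
echo "archive=$ARCHIVE urlsource=$URLSOURCE url=$URL"
```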

Uninstallation

To uninstall the application, simply delete the folder containing it.

Licenses and Attributions

This project uses the following open-source software:

  • JUnit Jupiter API: Eclipse Public License v2.0 (EPL-2.0)
  • JUnit Jupiter Engine: Eclipse Public License v2.0 (EPL-2.0)
  • Selenium Java: Apache License 2.0
  • Selenium Logger: MIT License
  • Mockito Core: MIT License
  • Mockito JUnit Jupiter: MIT License
  • System Lambda: MIT License
  • Apache PDFBox: Apache License 2.0
  • Jackson Core: Apache License 2.0
  • Jackson Dataformat XML: Apache License 2.0
