Athena is a Streamlit-based application for exploring and analyzing historical web archive data. It uses OpenAI models to analyze trends, generate insights, and suggest follow-up queries. Athena focuses on the temporal history and health of web pages, showing how web content changes and persists over time.
- AI-Powered Chat: Athena uses OpenAI's models to interact with users, answering queries and providing insights into web archive data.
- Trend Analysis: Analyze trends in web archives, including webpage health, content stability, and availability over time.
- Interactive Visualizations: View trends and metrics via interactive charts and graphs displayed directly in the app.
- Function Execution: Athena can execute functions like fetching CDX data, retrieving historical snapshots, and performing trend analysis based on user queries.
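As a rough illustration of what fetching CDX data involves, the sketch below builds a query URL for the public Wayback Machine CDX endpoint and parses a JSON response into dictionaries. The helper names `build_cdx_url` and `parse_cdx_json` are hypothetical and are not taken from Athena's code; the sample payload is made up and only mirrors the header-plus-rows shape the CDX server returns for `output=json`.

```python
import json
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, limit=5):
    """Build a CDX query URL that asks for JSON output (hypothetical helper)."""
    params = {"url": url, "output": "json", "limit": limit}
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(payload):
    """Convert the CDX JSON array (header row, then data rows) into dicts."""
    rows = json.loads(payload)
    if not rows:
        return []  # the server returns an empty array when nothing matches
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


# Made-up sample payload in the shape returned for output=json
sample = json.dumps([
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["org,example)/", "20020120142510", "http://example.org:80/", "text/html", "200", "ABC", "1234"],
])
snapshots = parse_cdx_json(sample)
print(build_cdx_url("example.org"))
print(snapshots[0]["timestamp"], snapshots[0]["statuscode"])
```

Keeping URL construction and parsing separate from the network call makes both easy to check without contacting the archive.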
Athena/
│
├── assets/
│ └── favicon.ico # Icon used in the app
├── config/
│ ├── function_schemas.py # Function schemas for validating inputs
│ ├── openai_config.py # Configuration for OpenAI API
│ ├── router_schemas.py # Schemas for routing and intent handling
│ └── suggestions.py # List of suggestions used in the app
│
├── services/
│ ├── openai_service.py # Service handling communication with OpenAI API
│ ├── semantic_router_service.py # Service for determining intent from user input
│ └── wayback_service.py # Service for fetching and analyzing data from Wayback Machine
│
├── utils/
│ ├── cdxdata.py # Utility for handling CDX data
│ ├── extract_text.py # Utility for extracting text from web archives
│ ├── fetch_data_wayback.py # Utility for fetching data from Wayback Machine
│ ├── loadcdx.py # Utility for loading CDX data
│ ├── snapinfo.py # Utility for handling snapshot information
│ └── trend_analysis.py # Core utility for analyzing trends in web archives
│
├── .env # Environment variables, including OPENAI_API_KEY
├── main.py # Main application file
├── requirements.txt # Python dependencies for the project
└── README.md # This file
- Python 3.7 or higher
- pip (Python package installer)
- An OpenAI API Key
- Clone the repository:
  git clone https://github.com/internetarchive/wbm_ai_sum.git
  cd wbm_ai_sum
- Install dependencies:
  pip install -r requirements.txt
- Set up environment variables:
  - Create a .env file in the root directory with the following content:
    OPENAI_API_KEY=your-openai-api-key
    SYSTEM_PROMPT="Already defined system prompt"
  - Replace your-openai-api-key with your actual OpenAI API key.
- Run the app:
  streamlit run main.py
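If the app fails at startup, a missing or placeholder API key is a likely cause. The following is a minimal sketch of a startup check, assuming the variables from .env have already been loaded into the process environment (how Athena loads them is not shown here). The function name `require_env` is hypothetical.

```python
import os


def require_env(name, environ=None):
    """Return a required environment variable or raise a clear error.

    Hypothetical helper: rejects empty values and the placeholder
    text used in the setup instructions.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value or value == "your-openai-api-key":
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Example with an explicit mapping instead of the real environment
print(require_env("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-example"}))
```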
To run the app using Docker, use the following commands:
- Build the Docker image:
  docker build -t athena-app .
- Run the Docker container:
  docker run -p 8501:8501 athena-app
- The app provides an interactive chat interface where you can type queries and interact with Athena.
- The sidebar includes suggestions that can help guide your queries.
- The app analyzes trends in web archives, offering visualizations for Webpage Health, Content Stability, and Availability.
- These trends are displayed in both the main content area and the sidebar.
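To make the idea of an availability trend concrete, here is one possible way to compute it from CDX snapshot records: the share of captures per year that returned a 2xx status code. This definition is an assumption chosen for illustration and is not necessarily the metric Athena's trend_analysis.py uses; the function name and sample records are likewise hypothetical.

```python
from collections import defaultdict


def availability_by_year(snapshots):
    """Share of captures per year with a 2xx status code (illustrative metric)."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for snap in snapshots:
        year = snap["timestamp"][:4]  # CDX timestamps start with YYYY
        totals[year] += 1
        if snap["statuscode"].startswith("2"):
            successes[year] += 1
    return {year: successes[year] / totals[year] for year in sorted(totals)}


# Made-up records using the CDX field names "timestamp" and "statuscode"
records = [
    {"timestamp": "20190105120000", "statuscode": "200"},
    {"timestamp": "20190820093000", "statuscode": "404"},
    {"timestamp": "20200314150000", "statuscode": "200"},
    {"timestamp": "20201101080000", "statuscode": "200"},
]
print(availability_by_year(records))
```

A per-year ratio like this maps directly onto a line or bar chart, with years on one axis and the availability share on the other.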
- Athena can execute several functions based on your queries:
  - fetch_cdx_data: Fetches CDX data for a specified URL.
  - fetch_data_wayback: Retrieves data for a specified URL and timestamp.
  - get_trend_analysis: Analyzes and visualizes trends for a specified URL.
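One common way to execute functions chosen by a model is a dispatch table that maps each function name to a handler and passes in the JSON-encoded arguments. The sketch below shows that pattern with stub handlers. The parameter names (`url`, `timestamp`) are inferred from the descriptions above, and the handler bodies and the `execute_function` helper are hypothetical placeholders, not Athena's implementation.

```python
import json


def fetch_cdx_data(url):
    """Stub: a real handler would query the CDX server."""
    return f"CDX data for {url}"


def fetch_data_wayback(url, timestamp):
    """Stub: returns the Wayback Machine replay URL for the snapshot."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def get_trend_analysis(url):
    """Stub: a real handler would compute and plot trends."""
    return f"Trend analysis for {url}"


# Dispatch table from function name to handler
HANDLERS = {
    "fetch_cdx_data": fetch_cdx_data,
    "fetch_data_wayback": fetch_data_wayback,
    "get_trend_analysis": get_trend_analysis,
}


def execute_function(name, arguments_json):
    """Look up a handler by name and call it with JSON-decoded keyword arguments."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown function: {name}")
    return handler(**json.loads(arguments_json))


print(execute_function(
    "fetch_data_wayback",
    '{"url": "example.org", "timestamp": "20200101000000"}',
))
```

Rejecting unknown names before calling anything keeps a malformed model response from triggering arbitrary code paths.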
Contributions are welcome! Please fork the repository and create a pull request to contribute.
If you encounter any issues or have suggestions for new features, please report them on the issue tracker.
See the LICENSE file for details.
- OpenAI for providing the AI models.
- Wayback Machine for providing web archive data.