Ananke2 Knowledge Framework

Overview

Ananke2 is a multi-modal knowledge extraction framework for processing and analyzing academic papers and technical documents. The framework supports:

  • Multi-modal information processing (text, math expressions, logic expressions, code snippets)
  • Knowledge graph extraction with semantic triple tracking
  • Vector embeddings for semantic search
  • Multi-language support (English, Chinese, French, German, Japanese)
  • Document chunking with structured sentence parsing
  • Asynchronous task processing with Redis queue

Architecture

Database Layer

  • Graph Database (Neo4j)

    • Knowledge graph storage
    • Entity and relationship management
    • Semantic triple storage with hit count tracking
    • Graph-based querying capabilities
  • Vector Database (ChromaDB)

    • Embedding storage for semantic search
    • Multi-modal content vectorization
    • Similarity search capabilities
    • Document chunk embeddings
  • Structured Database (MySQL)

    • Traditional data storage
    • Metadata management
    • Document structure information
    • Processing task status tracking
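
The split above can be seen in miniature in the sketch below. The Neo4jInterface and ChromaInterface classes appear in the usage examples later in this README; the no-argument constructors, the document_id filter key, and the MySQL note are illustrative assumptions rather than the framework's documented API.

import asyncio

from ananke2.database.graph import Neo4jInterface
from ananke2.database.vector import ChromaInterface

async def inspect_document(document_id: str) -> None:
    graph_db = Neo4jInterface()    # graph layer: entities, relations, semantic triples
    vector_db = ChromaInterface()  # vector layer: chunk embeddings for similarity search

    # Graph layer: entities linked to this document (filter key is assumed).
    entities = await graph_db.search({"document_id": document_id, "limit": 25})

    # Vector layer: chunks closest to a probe query (fields follow the
    # semantic-search example later in this README).
    chunks = await vector_db.search({"query": "main contribution", "limit": 5})

    # The structured layer (MySQL) holds metadata, document structure and task
    # status; it is reached through ordinary SQL and is omitted from this sketch.
    print(len(entities), len(chunks))

asyncio.run(inspect_document("example-document-id"))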

Task Processing

  • Redis Task Queue
    • Asynchronous task management
    • Document processing workflow
    • Retry mechanism for failed tasks
    • Task status monitoring
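
The queue pattern described above can be sketched directly against redis-py. Everything below, including the queue names, payload shape, and retry policy, is an illustrative assumption; Ananke2's own task layer is configured through the environment variables shown later.

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def enqueue_document(document_path: str) -> None:
    # Push a document-processing task onto the queue (queue name assumed).
    task = {"type": "process_document", "path": document_path, "retries": 0}
    r.rpush("ananke2:tasks", json.dumps(task))

def handle(task: dict) -> None:
    # Placeholder for the real document-processing call.
    print("processing", task["path"])

def worker_loop(max_retries: int = 3) -> None:
    while True:
        _, raw = r.blpop("ananke2:tasks")        # block until a task arrives
        task = json.loads(raw)
        try:
            handle(task)
        except Exception:
            if task["retries"] < max_retries:
                task["retries"] += 1
                r.rpush("ananke2:tasks", json.dumps(task))          # retry: requeue the task
            else:
                r.rpush("ananke2:tasks:failed", json.dumps(task))   # give up, keep for inspection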

Document Processing

  • Multi-modal Content Extraction

    • PDF document parsing
    • LaTeX content processing
    • Mathematical expression extraction
    • Code snippet identification
    • Logic expression parsing
  • Document Chunking

    • Structured sentence parsing
    • Context-aware chunking
    • Cross-reference preservation
    • Multi-language chunk handling
  • Knowledge Graph Extraction

    • Entity identification
    • Relationship extraction
    • Semantic triple generation
    • Hit count tracking
    • Multi-language entity linking
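
As a concrete illustration of "semantic triple generation" and "hit count tracking", the toy code below counts how often the same (subject, predicate, object) triple is extracted. In the real framework these counts live on the graph edges in Neo4j; the dataclass here is only a stand-in.

from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

def merge_triples(extracted: list[Triple], counts: dict[Triple, int]) -> dict[Triple, int]:
    # Re-extracting an existing triple bumps its hit count instead of
    # creating a duplicate edge.
    for t in extracted:
        counts[t] = counts.get(t, 0) + 1
    return counts

counts: dict[Triple, int] = {}
merge_triples([Triple("Ananke2", "uses", "Neo4j"),
               Triple("Ananke2", "uses", "Neo4j"),
               Triple("Ananke2", "uses", "ChromaDB")], counts)
print(counts)  # the repeated triple ends up with a hit count of 2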

Setup

Prerequisites

  • Python 3.12+
  • Docker and Docker Compose
  • Python venv module

Installation

  1. Clone the repository:
git clone https://github.com/ceresman/ananke2.git
cd ananke2
  2. Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows
  3. Configure pip to use the BFSU mirror (optional; this mirror is typically faster from mainland China):
pip config set global.index-url https://mirrors.bfsu.edu.cn/pypi/web/simple
  4. Install dependencies:
pip install -r requirements.txt
pip install -r requirements-dev.txt  # For development
  5. Configure environment:
cp .env.example .env
# Edit .env with your credentials and API keys
  6. Start services:
docker-compose up -d

Troubleshooting

  • If you encounter SSL errors with the BFSU mirror, try:
    pip install --trusted-host mirrors.bfsu.edu.cn -r requirements.txt
  • For permission errors on Unix/Linux:
    python -m venv .venv --system-site-packages

Configuration

Key environment variables in .env:

# Database Configuration
NEO4J_URI=bolt://0.0.0.0:7687
CHROMA_PERSIST_DIRECTORY=/path/to/chroma
MYSQL_HOST=0.0.0.0

# Redis Configuration
REDIS_HOST=0.0.0.0
REDIS_PORT=6379

# API Keys
QWEN_API_KEY=your-api-key

# Network Configuration
HOST=0.0.0.0
PORT=8000
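
One way to consume these settings from Python is via python-dotenv, as in the sketch below. The framework may load its configuration differently, so treat this purely as an example of reading the variables listed above.

import os
from dotenv import load_dotenv

load_dotenv()  # read .env from the current working directory

NEO4J_URI = os.environ["NEO4J_URI"]
CHROMA_PERSIST_DIRECTORY = os.environ["CHROMA_PERSIST_DIRECTORY"]
MYSQL_HOST = os.environ["MYSQL_HOST"]
REDIS_HOST = os.environ["REDIS_HOST"]
REDIS_PORT = int(os.environ.get("REDIS_PORT", "6379"))
QWEN_API_KEY = os.environ["QWEN_API_KEY"]   # keep API keys out of version control
HOST = os.environ.get("HOST", "0.0.0.0")
PORT = int(os.environ.get("PORT", "8000"))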

Usage

Document Processing

from ananke2.tasks.document import process_document

# Process an arXiv paper
document_id = await process_document(arxiv_id="2301.00001")

# Process a local PDF
document_id = await process_document(file_path="/path/to/paper.pdf")
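
process_document is a coroutine, so the calls above must run inside an event loop. A minimal way to drive it from a plain script, using only the standard library, might look like this:

import asyncio
from ananke2.tasks.document import process_document

async def main() -> None:
    # Process an arXiv paper and report the resulting document id.
    document_id = await process_document(arxiv_id="2301.00001")
    print(document_id)

asyncio.run(main())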

Knowledge Graph Queries

from ananke2.database.graph import Neo4jInterface

# Connect to the graph database (constructor arguments, if any, depend on your
# deployment; the no-argument form here is an assumption)
graph_db = Neo4jInterface()

# Query entities by type
entities = await graph_db.search({
    "type": "TECHNOLOGY",
    "limit": 10
})

# Get related entities for an entity_id returned by a previous query
related = await graph_db.get_relations(entity_id)

Semantic Search

from ananke2.database.vector import ChromaInterface

# Connect to the vector store (constructor arguments, if any, depend on your
# deployment; the no-argument form here is an assumption)
vector_db = ChromaInterface()

# Search similar documents
results = await vector_db.search({
    "query": "quantum computing applications",
    "limit": 5
})

Multi-language Support

from ananke2.tasks.document import process_document

# Process documents in different languages
doc_zh = await process_document(file_path="paper_zh.pdf", language="zh")
doc_fr = await process_document(file_path="paper_fr.pdf", language="fr")

API Documentation

REST Endpoints

  • POST /api/v1/documents: Upload and process new documents
  • GET /api/v1/documents/{id}: Retrieve document information
  • GET /api/v1/entities: Query knowledge graph entities
  • GET /api/v1/search: Semantic search across documents
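
The endpoints can be exercised from Python with requests, assuming the service listens on the HOST/PORT configured in .env. The multipart field name, query parameter, and response shape below are guesses for illustration, not the documented request schema.

import requests

BASE = "http://localhost:8000"   # HOST/PORT from .env; adjust to your deployment

# Upload a PDF for processing (multipart field name "file" is assumed).
with open("paper.pdf", "rb") as f:
    resp = requests.post(f"{BASE}/api/v1/documents", files={"file": f})
document_id = resp.json().get("id")   # response shape assumed

# Retrieve document information.
info = requests.get(f"{BASE}/api/v1/documents/{document_id}").json()

# Semantic search across documents (query parameter name assumed).
hits = requests.get(f"{BASE}/api/v1/search",
                    params={"query": "quantum computing applications"}).json()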

WebSocket Events

  • document_status: Real-time document processing status
  • extraction_progress: Knowledge extraction progress updates
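
A minimal listener for these events, using the websockets package; the WebSocket path (/ws) and the payload fields are assumptions to be checked against the server.

import asyncio
import json
import websockets

async def watch_events() -> None:
    # "/ws" is a placeholder path; substitute the server's real endpoint.
    async with websockets.connect("ws://localhost:8000/ws") as ws:
        async for raw in ws:
            event = json.loads(raw)
            # Expected event names per the list above: document_status,
            # extraction_progress (payload fields assumed).
            print(event.get("event"), event.get("data"))

asyncio.run(watch_events())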

Development

Running Tests

python -m pytest tests/

Code Style

black .
isort .
flake8

License

Apache License 2.0

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.