Paperbaum is a decentralized academic paper publishing and verification system built on a custom Substrate-based parachain. It addresses issues of authorship verification, restricted access, and inefficient paper linking in academic publishing.
- Substrate Parachain: Custom runtime that stores paper metadata in a Merkle tree and verifies it.
- IPFS Integration: Decentralized storage for full paper content.
- Vector Similarity Engine: NLP-based system for semantic paper linking.
The core of Paperbaum is a custom Substrate parachain that handles academic paper management and verification. Its custom pallet uses a Merkle tree to link papers together natively. The pallet provides functionality for:
- Managing a Merkle tree of paper hashes
- Verifying Merkle proofs
- Storing and retrieving paper metadata
- Enforcing size limits on various paper attributes
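The Merkle tree operations listed above can be illustrated with a minimal sketch. It is written in JavaScript for readability and is not the pallet's code: the use of SHA-256, the rule of pairing an odd node with itself, and the names buildLevels, getProof, and verifyProof are all assumptions made for this example.

```javascript
const { createHash } = require("crypto");

// SHA-256 is an assumption; the pallet may use a different hash function.
const sha256 = (data) => createHash("sha256").update(data).digest();
const hashPair = (left, right) => sha256(Buffer.concat([left, right]));

// Build every level of the tree, from the leaves (level 0) up to the root.
// An odd node at the end of a level is paired with itself (assumed convention).
function buildLevels(leaves) {
  if (leaves.length === 0) throw new Error("at least one leaf is required");
  const levels = [leaves];
  while (levels[levels.length - 1].length > 1) {
    const prev = levels[levels.length - 1];
    const next = [];
    for (let i = 0; i < prev.length; i += 2) {
      next.push(hashPair(prev[i], prev[i + 1] ?? prev[i]));
    }
    levels.push(next);
  }
  return levels;
}

// Collect the sibling hashes on the path from leaf `index` up to the root.
function getProof(levels, index) {
  const proof = [];
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const level = levels[depth];
    const isRight = index % 2 === 1;
    const siblingIndex = isRight ? index - 1 : index + 1;
    proof.push({
      sibling: level[siblingIndex] ?? level[index],
      siblingOnLeft: isRight,
    });
    index = Math.floor(index / 2);
  }
  return proof;
}

// Recompute the root from a leaf and its proof, then compare with the known root.
function verifyProof(leaf, proof, root) {
  let hash = leaf;
  for (const { sibling, siblingOnLeft } of proof) {
    hash = siblingOnLeft ? hashPair(sibling, hash) : hashPair(hash, sibling);
  }
  return hash.equals(root);
}

// Toy data: three paper hashes.
const paperHashes = ["paper A", "paper B", "paper C"].map((t) => sha256(t));
const levels = buildLevels(paperHashes);
const root = levels[levels.length - 1][0];
const proof = getProof(levels, 2);

console.log(verifyProof(paperHashes[2], proof, root)); // true
console.log(verifyProof(sha256("forged paper"), proof, root)); // false
```

A proof contains one sibling hash per tree level, so a verifier needs only the root and a logarithmic number of hashes rather than the full set of papers.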
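Paperbaum uses the InterPlanetary File System (IPFS) for decentralized storage of full paper content. Papers are stored in a distributed, content-addressed manner: each paper is identified by a hash of its content, so any node holding a copy can serve it and any alteration changes its address. Content remains available as long as at least one node keeps it pinned.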
Paperbaum implements a vector similarity engine for semantic paper linking. This system uses OpenAI's text embedding model to generate vector representations of papers, enabling efficient similarity searches. The generateEmbedding function creates a vector representation of text, while cosineSimilarity computes the similarity between two vectors.
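As a rough illustration of the similarity step, the sketch below shows one way a cosineSimilarity function could be written. It is not taken from the Paperbaum source. The mostSimilar helper and the toy three-dimensional vectors are hypothetical; real embeddings would come from generateEmbedding and have far more dimensions.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), a value in the range [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Returning 0 for a zero vector is a convention chosen for this sketch.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: rank stored papers by similarity to a query vector.
function mostSimilar(queryVector, papers, topK = 3) {
  return papers
    .map((p) => ({ title: p.title, score: cosineSimilarity(queryVector, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors standing in for real embeddings.
const papers = [
  { title: "Paper A", vector: [1, 0, 0] },
  { title: "Paper B", vector: [0, 1, 0] },
  { title: "Paper C", vector: [1, 1, 0] },
];

const ranked = mostSimilar([1, 0, 0], papers);
console.log(ranked.map((r) => r.title)); // [ 'Paper A', 'Paper C', 'Paper B' ]
```

Cosine similarity compares direction rather than magnitude, so two texts of very different length can still score as close if their embeddings point the same way.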
When a paper is uploaded, Paperbaum processes the PDF, extracts key metadata, and generates a vector representation:
- PDF text extraction
- Metadata extraction using GPT-4o mini
- Vector embedding generation
- IPFS upload
- In-memory storage of the metadata and vector in a Merkle tree
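The order of these steps can be sketched as a small pipeline. This is an illustrative outline only: processUpload and the step names extractText, extractMetadata, and uploadToIpfs are hypothetical, every step is passed in as a function so the sketch makes no claim about the real helpers, and a plain array stands in for the in-memory Merkle tree.

```javascript
// Run the upload steps in order. Each step is injected (hypothetical names),
// so real PDF parsing, LLM, embedding, and IPFS clients can be swapped in.
async function processUpload(pdfBuffer, steps, store) {
  const text = await steps.extractText(pdfBuffer); // 1. PDF text extraction
  const metadata = await steps.extractMetadata(text); // 2. metadata extraction
  const vector = await steps.generateEmbedding(text); // 3. vector embedding
  const cid = await steps.uploadToIpfs(pdfBuffer); // 4. IPFS upload
  const record = { cid, metadata, vector };
  store.push(record); // 5. in-memory storage (array used as a stand-in)
  return record;
}

// Stub steps for a dry run; none of these call a real service.
const stubSteps = {
  extractText: async (buf) => buf.toString("utf8"),
  extractMetadata: async (text) => ({ title: text.split("\n")[0] }),
  generateEmbedding: async (text) => [text.length, 0, 0],
  uploadToIpfs: async () => "stub-cid",
};

const store = [];
const done = processUpload(Buffer.from("A Title\nBody text"), stubSteps, store);
done.then((record) => console.log(record.metadata.title, store.length));
```

Passing the steps in as functions keeps the ordering testable without network access; in a real deployment each stub would be replaced by the corresponding client call.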
For the parachain, first compile it using
cargo build --release
and then run
./target/release/node-template --dev
to start the Substrate node on 127.0.0.1:9944
To run the backend server, enter the backend directory and run
npm install
and then
node server.js
to start the server on localhost:3000
To run the frontend, enter the frontend directory and run
npm install
and then
npm run dev
to start the frontend on localhost:3001
- Develop a more sophisticated Merkle tree structure for efficient paper linking and verification.
- Implement Merkle Mountain Ranges (MMR) for dynamic dataset management, allowing efficient updates and proofs of inclusion.
- Develop a ZK-based reputation system for anonymous yet credible peer reviews.
- Create ZK proofs for citation verification without revealing full paper contents.
- Implement double-blind review processes using ZK proofs.
- Develop a reputation system for reviewers based on the quality and timeliness of their reviews.
- Develop cross-chain citation verification and tracking.
- Create a system for recognizing academic credentials and reputations across different blockchain networks.
- Implement versioning and provenance tracking of papers using OriginTrail's blockchain-agnostic protocol.
- Develop an AI-assisted discovery system leveraging OriginTrail's semantic data structure.
See MIT License