An implementation of the ERC-time-ordered-distributable-database (TODD) as a generic library. It can be used to make data TODD-compliant to facilitate peer-to-peer distribution.
Status: prototype
The goal is to test a new database design, in which user participation makes the entire database more available.
Questions for you:
- Do you have data that grows over time and that you would like users to host?
- Are you providing data as a public good and wondering how to hand hosting over to the community?
Min-know makes data into an append-only structure that anyone can publish to. Distribution happens like a print publication, where users obtain Volumes as they are released. A user becomes a distributor too.
Volumes contain Chapters that can be obtained separately. This effectively divides the database, making large databases manageable for resource-constrained users.
📘🔍🐟
To make any database TODD-compliant so that data-users become data-providers.
TODD-compliance is about:
- Delivering a user the minimum knowledge that is useful to them.
- Delivering a user some extra data.
- Making it easy for a user to become a data provider for the next user.
A minnow is a small fish 🐟 that can be part of a larger collective.
Data is published in Volumes.

📘 - A Volume
Volumes are added over time:
📘 📘 📘 📘 📘 ... 📘 <--- 📘 - All Volumes (published so far).
Volumes have Chapters for specific content. Chapters can be obtained individually.
- 📘 An example Volume with 256 Chapters
  - 📕 0x00 First Chapter (1st)
  - ...
  - ...
  - 📙 0xff Last Chapter (256th)
A Manifest 📜 exists that lists all Chapters for all Volumes. A manifest simply contains IPFS hashes for the data (see example manifests). A user can check the manifest and find which Chapter is right for them, ignoring the IPFS hashes that don't match their needs.
📜🔍🐟
The user starts with something they know (a key), for example, an address. For every key, only one Chapter will be important.
- User (🐟) key is an address: 0xf154...f00d.
- Data is divided into Chapters using the first two characters of the address (Chapter = 0xf1).
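As a sketch, the key-to-Chapter mapping described above could look like this (the function name is hypothetical, not part of min-know's API):

```rust
// Hypothetical sketch: derive the Chapter identifier for a user's key by
// taking the first two hex characters of the address.
fn chapter_for_address(address: &str) -> Option<String> {
    // Strip an optional "0x" prefix, then take the first two hex characters.
    let hex = address.strip_prefix("0x").unwrap_or(address);
    if hex.len() < 2 || !hex.chars().take(2).all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    Some(format!("0x{}", hex[..2].to_lowercase()))
}

fn main() {
    // The user's key from the example above.
    let chapter = chapter_for_address("0xf154...f00d").unwrap();
    println!("{chapter}"); // prints "0xf1"
}
```

With 2 hex characters there are 256 possible prefixes, which is why a Volume has 256 Chapters.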
Visually:
- 📕 0x00
- ...
- ...
- 📗 0xf1 <--- 🐟 0xf154...f00d (user only needs this Chapter)
- ...
- ...
- 📙 0xff
For every published Volume, the user only downloads the right Chapter for their needs. The Min-know library automates this by using the CIDs in the manifest to find files on IPFS. This means obtaining one Chapter from every Volume that has ever been published.
Hence, the user 🐟 only needs 1/256th of the entire database.
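Selecting one Chapter per Volume can be sketched as follows (the `Volume` struct and function name are illustrative assumptions, not the library's real types):

```rust
// Illustrative only: a manifest lists, for each Volume, one IPFS CID per
// Chapter. A user collects just the CID matching their Chapter from every
// Volume published so far.
struct Volume {
    // (Chapter identifier such as "0xf1", IPFS CID of that Chapter's file)
    chapters: Vec<(String, String)>,
}

/// Collect one CID per published Volume: the user's share is 1/256th.
fn cids_for_chapter(volumes: &[Volume], chapter: &str) -> Vec<String> {
    volumes
        .iter()
        .filter_map(|v| {
            v.chapters
                .iter()
                .find(|(id, _)| id.as_str() == chapter)
                .map(|(_, cid)| cid.clone())
        })
        .collect()
}

fn main() {
    let volumes = vec![
        Volume { chapters: vec![("0x00".into(), "cid-v0-00".into()), ("0xf1".into(), "cid-v0-f1".into())] },
        Volume { chapters: vec![("0x00".into(), "cid-v1-00".into()), ("0xf1".into(), "cid-v1-f1".into())] },
    ];
    // One CID per Volume, skipping all Chapters the user does not need.
    let wanted = cids_for_chapter(&volumes, "0xf1");
    assert_eq!(wanted, vec!["cid-v0-f1".to_string(), "cid-v1-f1".to_string()]);
}
```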
Once downloaded, the Chapters can be queried for useful information that the database contains. Optionally, users can also pin their Chapters to IPFS, which makes the data available from more sources.
Interaction with the library occurs through the Todd struct ([database::types::Todd]) via the following methods:
- For users:
  - obtain_relevant_data()
  - check_completeness()
  - find()
- For maintainers:
  - full_transformation()
  - extend()
  - repair_from_raw()
  - generate_manifest()
  - manifest()
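To illustrate the user-side flow only, here is a toy stand-in (the real signatures live in [database::types::Todd] and will differ; everything below is a mock, not the library's API):

```rust
// A toy mock of the user workflow: obtain the relevant Chapter from every
// Volume, check nothing is missing, then query the local data.
struct MockTodd {
    // Chapters already obtained: (Chapter id, one record per Volume).
    obtained: Vec<(String, Vec<String>)>,
    published_volumes: usize,
}

impl MockTodd {
    /// Pretend to fetch this Chapter from every published Volume.
    fn obtain_relevant_data(&mut self, chapter: &str) {
        let data = (0..self.published_volumes)
            .map(|v| format!("vol{v}:{chapter}:payload"))
            .collect();
        self.obtained.push((chapter.to_string(), data));
    }

    /// True if one record per published Volume is present for the Chapter.
    fn check_completeness(&self, chapter: &str) -> bool {
        self.obtained
            .iter()
            .any(|(id, data)| id.as_str() == chapter && data.len() == self.published_volumes)
    }

    /// Count records in the obtained Chapter that mention the user's key.
    fn find(&self, chapter: &str, key: &str) -> usize {
        self.obtained
            .iter()
            .filter(|(id, _)| id.as_str() == chapter)
            .flat_map(|(_, data)| data.iter())
            .filter(|record| record.contains(key))
            .count()
    }
}

fn main() {
    let mut todd = MockTodd { obtained: Vec::new(), published_volumes: 3 };
    todd.obtain_relevant_data("0xf1"); // one Chapter from each of 3 Volumes
    assert!(todd.check_completeness("0xf1"));
    assert_eq!(todd.find("0xf1", "0xf1"), 3);
}
```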
See ./ARCHITECTURE.md for how this library is structured.
All examples can be seen with the following command:
`cargo run --example`
See ./examples/README.md for more information.
See ./DATABASES.md for different databases that have been implemented in this library.
The maintainer methods in the examples are used to create and extend a TODD-compliant database. This requires having a local "raw" source, which will be different for every data type. The library will use the methods in the ./extraction module to convert the data.
For example:
- The address-appearance-index is created and maintained by having locally available Unchained Index chunk files (produced by trueblocks-core, https://github.com/TrueBlocks/trueblocks-core). They are parsed and reorganised to form the TODD-compliant format.
- The nametags database is created and maintained by having individual files (one per address) that contain JSON-encoded names and tags.
Other raw formats might be flat files containing data of various kinds.
See ./GETTING_STARTED.md for how to use min-know for a new database.
TODD-compliance is about coordination by default (e.g., having a Schelling point for a distributed database).
The manifest contains the CIDs of all the Chapters for a given database. A new manifest is created when a database is updated and new CIDs are added. Old CIDs remain unchanged.
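The append-only property of the manifest can be sketched as follows (struct and field names here are assumptions for illustration, not the real manifest schema):

```rust
// Sketch of the append-only manifest idea: publishing a new Volume appends
// new CIDs; the entries for previously published Volumes are untouched.
#[derive(Clone, Debug, PartialEq)]
struct ChapterEntry {
    volume: u32,
    chapter: String, // e.g. "0xf1"
    cid: String,     // IPFS CID of the Chapter file
}

struct Manifest {
    entries: Vec<ChapterEntry>,
}

impl Manifest {
    /// Create the next manifest: old CIDs are carried over unchanged and
    /// the freshly published Volume's CIDs are appended.
    fn extend(&self, new_entries: Vec<ChapterEntry>) -> Manifest {
        let mut entries = self.entries.clone();
        entries.extend(new_entries);
        Manifest { entries }
    }
}

fn main() {
    let old = Manifest {
        entries: vec![ChapterEntry { volume: 0, chapter: "0xf1".into(), cid: "cid-old".into() }],
    };
    let new = old.extend(vec![ChapterEntry { volume: 1, chapter: "0xf1".into(), cid: "cid-new".into() }]);
    // Old CIDs are untouched; the new Volume's CIDs are appended.
    assert_eq!(new.entries[0], old.entries[0]);
    assert_eq!(new.entries.len(), 2);
}
```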
After creating the manifest, that person can post it under their own IPNS. Anyone who knows this IPNS can watch for new manifests to be published there.
To broadcast that you are going to publish, you can perform a single transaction to a broadcasting contract (https://github.com/perama-v/GAMB), recording your IPNS name along with the topic you wish to publish (the name of the database you are publishing). An example GAMB-compliant contract might look like PublisherRegistry.sol, with main functions as follows:
```solidity
/// @notice Record the IPNS of a publisher who will publish for a topic.
/// @dev Maps the given IPNS to the specified topic, appending to existing submissions.
function registerPublisher(string memory topic, string memory ipns_of_publisher) public {
    topics.push(topic);
    publisherHashMap[topic].push(Publisher({submitted_by: msg.sender, ipns: ipns_of_publisher}));
    emit NewPublisher(msg.sender, topic, ipns_of_publisher);
}

/// @notice Gets all publishers for a topic.
/// @dev Gets the Publishers that are mapped to the given topic string.
/// @return Returns an array of Publishers.
function getPublishersForTopic(string memory topic) public view returns (Publisher[] memory) {
    return publisherHashMap[topic];
}
```
After this single transaction, you can update your IPNS to the latest manifest hash for free.
The purpose of the contract is two-fold:
- Discovery (anyone can find publishers for a topic from a single "meeting point").
- Censorship resistance (no one can stop you from posting your IPNS to a topic).
Anyone else can also submit their IPNS name to the contract and publish new Volumes for the database. While not yet implemented, the process of checking that contract, fetching manifests, comparing the CIDs they contain and coordinating to collaborate on publishing can all be automated.
While not implemented in this library, it is intended that end-users of a TODD-compliant database could automatically pin any Chapters they download. This could be an opt-out process and could result in most users contributing significantly to the long-term availability of data.
See ./FAQ.md
This is a very experimental library that is mostly an exploration for feasibility analysis.
The library is not currently being used to deliver data to real end users. However, it is designed so that new databases can be readily implemented (see ./GETTING_STARTED.md), all sharing the same core of the library.
Does the idea interest you? Is there a database you think it might be suited for?
- twitter: @eth_worm
- github: @perama-v
Raise an issue or say hi ❤