perama-v/min-know

Min-know

An implementation of the ERC-time-ordered-distributable-database (TODD) as a generic library. It can be used to make data TODD-compliant to facilitate peer-to-peer distribution.

Status: prototype

Why does this library exist?

To test out a new database design, where user participation makes the entire database more available.

Questions for you:

  • Do you have data that grows over time and that you would like users to host?
  • Are you providing data as a public good and wondering how to wean it onto the community?

Min-know makes data into an append-only structure that anyone can publish to. Distribution works like a print publication, where users obtain Volumes as they are released. A user becomes a distributor too.

Volumes contain Chapters that can be obtained separately. This effectively divides the database, making large databases manageable for resource-constrained users.

Principles

📘🔍🐟

To make any database TODD-compliant so that data-users become data-providers.

TODD-compliance is about:

  1. Delivering a user the minimum knowledge that is useful to them.
  2. Delivering a user some extra data.
  3. Making it easy for a user to become a data provider for the next user.

A minnow is a small fish 🐟 that can be part of a larger collective.

End Users

Data is published in Volumes.

📘 - A Volume

Volumes are added over time:

📘 📘 📘 📘 📘 ... 📘 <--- 📘 - All Volumes (published so far).

Volumes have Chapters for specific content. Chapters can be obtained individually.

  • 📘 An example volume with 256 Chapters
    • 📕 0x00 First Chapter (1st)
    • ...
    • ...
    • 📙 0xff Last Chapter (256th)

A Manifest 📜 exists that lists all Chapters for all Volumes. A manifest simply contains IPFS hashes for data (see example manifests). A user can check the manifest and find which Chapter is right for them. They can ignore the IPFS hashes that don't match their needs.
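A manifest can be pictured as a list of (Volume, Chapter, CID) entries. The sketch below is a minimal illustration using hypothetical `ManifestEntry`/`Manifest` types and placeholder CID strings; it is not the library's actual manifest format (the example manifests show that). It only shows how a user keeps the CIDs for the one Chapter they need and skips the rest.

```rust
/// One row of a hypothetical manifest: which Volume and Chapter a CID holds.
struct ManifestEntry {
    volume_id: String,
    chapter_id: String, // e.g. "0xf1"
    cid: String,        // IPFS content identifier (placeholder values below)
}

/// Hypothetical manifest: every Chapter of every Volume published so far.
struct Manifest {
    entries: Vec<ManifestEntry>,
}

impl Manifest {
    /// Return (volume_id, cid) for one Chapter across all Volumes, ignoring all other Chapters.
    fn cids_for_chapter(&self, chapter_id: &str) -> Vec<(&str, &str)> {
        self.entries
            .iter()
            .filter(|e| e.chapter_id == chapter_id)
            .map(|e| (e.volume_id.as_str(), e.cid.as_str()))
            .collect()
    }
}

/// Small constructor to keep the example readable.
fn entry(volume: &str, chapter: &str, cid: &str) -> ManifestEntry {
    ManifestEntry {
        volume_id: volume.to_string(),
        chapter_id: chapter.to_string(),
        cid: cid.to_string(),
    }
}

fn main() {
    let manifest = Manifest {
        entries: vec![
            entry("volume_0", "0x00", "cid-v0-00"),
            entry("volume_0", "0xf1", "cid-v0-f1"),
            entry("volume_1", "0x00", "cid-v1-00"),
            entry("volume_1", "0xf1", "cid-v1-f1"),
        ],
    };
    // The user only cares about Chapter 0xf1.
    println!("{:?}", manifest.cids_for_chapter("0xf1"));
}
```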

📜🔍🐟

The user starts with something they know (a key), for example, an address. For every key, only one Chapter will be important.

  • User (🐟) key is an address: 0xf154...f00d.
  • Data is divided into Chapters using the first two hex characters of the address after the 0x prefix (Chapter = 0xf1).

Visually:

  • 📕 0x00
  • ...
  • ...
  • 📗 0xf1 <--- 🐟 0xf154...f00d (user only needs this Chapter)
  • ...
  • ...
  • 📙 0xff
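The Chapter lookup above can be sketched as reading the first byte of the address. This assumes, as in the example, 256 Chapters named by the first two hex characters after `0x`; `chapter_for_address` is a hypothetical helper, not part of the library's API.

```rust
/// Hypothetical helper: map an address-like key to its Chapter ID ("0x" + first two hex chars).
/// Returns None unless the key is "0x" followed by at least two hex digits.
fn chapter_for_address(address: &str) -> Option<String> {
    let hex = address.strip_prefix("0x")?;
    let prefix = hex.get(0..2)?;
    if prefix.chars().all(|c| c.is_ascii_hexdigit()) {
        Some(format!("0x{}", prefix.to_ascii_lowercase()))
    } else {
        None
    }
}

fn main() {
    // Placeholder zeros stand in for the elided middle of the example address.
    let key = format!("0xf154{}f00d", "0".repeat(32));
    println!("{:?}", chapter_for_address(&key)); // Some("0xf1")
}
```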

For every published Volume, the user only downloads the right Chapter for their needs. The Min-know library automates this by using the CIDs in the manifest to find files on IPFS.

This means obtaining one Chapter from every Volume that has ever been published. Hence, the user 🐟 only needs 1/256th of the entire database.
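The 1/256 figure follows from simple counting, assuming every Volume has 256 Chapters of roughly equal size (which holds approximately when keys such as addresses have uniformly distributed leading bytes). With V Volumes published:

```latex
\text{fraction downloaded}
  = \frac{V \ \text{Chapters (one per Volume)}}{256\,V \ \text{Chapters in total}}
  = \frac{1}{256} \approx 0.39\%
```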

Once downloaded, the Chapters can be queried for useful information that the database contains.
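Because a key's data is spread across one Chapter per Volume, a query has to look in each downloaded Chapter and combine the results. A minimal sketch, assuming a hypothetical in-memory `Chapter` that maps a key to a list of records (here `u64` values, such as block numbers); the real on-disk format and the library's `find()` may well differ.

```rust
use std::collections::HashMap;

/// Hypothetical decoded Chapter: key (e.g. an address) -> records found in that Volume.
type Chapter = HashMap<String, Vec<u64>>;

/// Search the user's Chapter from every Volume (in publication order) and merge the hits.
fn find_in_chapters(chapters_by_volume: &[Chapter], key: &str) -> Vec<u64> {
    chapters_by_volume
        .iter()
        .filter_map(|chapter| chapter.get(key))
        .flat_map(|records| records.iter().copied())
        .collect()
}

fn main() {
    let key = "0xf154...f00d"; // the (elided) example key, used here only as a lookup string
    let mut vol0 = Chapter::new();
    vol0.insert(key.to_string(), vec![100, 250]);
    let vol1 = Chapter::new(); // this Volume has nothing for the key
    let mut vol2 = Chapter::new();
    vol2.insert(key.to_string(), vec![9_000]);

    println!("{:?}", find_in_chapters(&[vol0, vol1, vol2], key)); // [100, 250, 9000]
}
```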

Optionally, they can also pin their Chapters to IPFS, which makes the data available from more sources.

Interface

Interaction with the library occurs through the Todd struct (database::types::Todd) via the following methods:

  • For users:
    • obtain_relevant_data()
    • check_completeness()
    • find()
  • For maintainers:
    • full_transformation()
    • extend()
    • repair_from_raw()
    • generate_manifest()
    • manifest()

Architecture

See ./ARCHITECTURE.md for how this library is structured.

Examples

All examples can be seen with the following command:

cargo run --example

See ./examples/README.md for more information.

Databases

See ./DATABASES.md for the different databases that have been implemented in this library.

Database Maintainers

The maintainer methods in the examples are used to create and extend a TODD-compliant database.

This requires having a local "raw" source, which will be different for every data type. The library will use the methods in the ./extraction module to convert the data.

For example:

  • The address-appearance-index is created and maintained by having locally available Unchained Index chunk files (produced by trueblocks-core, https://github.com/TrueBlocks/trueblocks-core). They are parsed and reorganised to form the TODD-compliant format.
  • The nametags database is created and maintained by having individual files (one per address) that contain JSON-encoded names and tags.

Other raw formats might be flat files containing data of various kinds.
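Whatever the raw format, the core of the conversion is regrouping records by Chapter. A minimal sketch, assuming hypothetical raw records of `(address, name)` pairs like those in the nametags example (already parsed, so no JSON handling is shown) and 256 Chapters keyed by the address's first byte. The library's ./extraction module does the real work and will differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical raw record, e.g. one parsed nametag file.
struct RawRecord {
    address: String, // "0x"-prefixed hex address
    name: String,
}

/// Group raw records into Chapters keyed by "0x" + first two hex chars of the address.
/// Records whose address is too short or not hex are skipped.
fn group_into_chapters(records: Vec<RawRecord>) -> BTreeMap<String, Vec<RawRecord>> {
    let mut chapters: BTreeMap<String, Vec<RawRecord>> = BTreeMap::new();
    for record in records {
        let prefix = match record.address.strip_prefix("0x").and_then(|h| h.get(0..2)) {
            Some(p) if p.chars().all(|c| c.is_ascii_hexdigit()) => p.to_ascii_lowercase(),
            _ => continue,
        };
        chapters.entry(format!("0x{prefix}")).or_default().push(record);
    }
    chapters
}

fn main() {
    // Illustrative records only.
    let raw = vec![
        RawRecord { address: "0xf154aa".to_string(), name: "alice".to_string() },
        RawRecord { address: "0x00ab".to_string(), name: "bob".to_string() },
        RawRecord { address: "0xF1ee".to_string(), name: "carol".to_string() },
    ];
    // BTreeMap iterates Chapters in sorted order: 0x00, then 0xf1.
    for (chapter, records) in group_into_chapters(raw) {
        let names: Vec<&str> = records.iter().map(|r| r.name.as_str()).collect();
        println!("{chapter}: {names:?}");
    }
}
```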

Extend the library for your data

See ./GETTING_STARTED.md for how to use min-know for a new database.

Manifest coordination using a smart contract

TODD-compliance is about coordination by default (e.g., having a Schelling point for a distributed database).

The manifest contains the CIDs of all the Chapters for a given database. A new manifest is created when a database is updated and new CIDs are added. Old CIDs remain unchanged.
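The append-only rule can be pictured as: the new manifest is the old one plus the CIDs for the newly published Volume, with every existing entry left as it was. A minimal sketch, assuming a hypothetical manifest modelled as an ordered map from (Volume, Chapter) to CID; `extend_manifest` is illustrative and is not the library's `generate_manifest()`.

```rust
use std::collections::BTreeMap;

/// Hypothetical manifest: (volume_id, chapter_id) -> CID.
type Manifest = BTreeMap<(String, String), String>;

/// Build the next manifest by adding a new Volume's Chapter CIDs.
/// Refuses to overwrite any existing entry, so old CIDs stay unchanged.
fn extend_manifest(
    old: &Manifest,
    volume_id: &str,
    new_chapters: &[(&str, &str)], // (chapter_id, cid)
) -> Result<Manifest, String> {
    let mut next = old.clone();
    for (chapter_id, cid) in new_chapters {
        let key = (volume_id.to_string(), chapter_id.to_string());
        if next.contains_key(&key) {
            return Err(format!("{volume_id}/{chapter_id} already published"));
        }
        next.insert(key, cid.to_string());
    }
    Ok(next)
}

fn main() {
    let v0 = extend_manifest(&Manifest::new(), "volume_0", &[("0x00", "cid-a"), ("0xf1", "cid-b")])
        .unwrap();
    let v1 = extend_manifest(&v0, "volume_1", &[("0x00", "cid-c"), ("0xf1", "cid-d")]).unwrap();

    // Every entry of the old manifest is still present, unchanged, in the new one.
    assert!(v0.iter().all(|(k, cid)| v1.get(k) == Some(cid)));
    println!("{} -> {} entries", v0.len(), v1.len()); // 2 -> 4 entries

    // Re-publishing an existing Volume/Chapter is rejected.
    assert!(extend_manifest(&v1, "volume_0", &[("0x00", "cid-x")]).is_err());
}
```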

After creating the manifest, the publisher can post it under their own IPNS name. Anyone who knows this IPNS name can watch for new manifests to be published there.

To broadcast that you are going to publish, you can perform a single transaction to a broadcasting contract (https://github.com/perama-v/GAMB) to record your IPNS name with the topic you wish to publish (the name of the database you are publishing). An example GAMB-compliant contract might look like: PublisherRegistry.sol, with main functions as follows:

pragma solidity ^0.8.0;

// Declarations inferred from how the two functions below use them.
contract PublisherRegistry {
    struct Publisher { address submitted_by; string ipns; }
    string[] public topics;
    mapping(string => Publisher[]) private publisherHashMap;
    event NewPublisher(address submitted_by, string topic, string ipns);

    /// @notice Record the IPNS of a publisher who will publish for a topic.
    /// @dev Maps the given IPNS to the specified topic, appending to existing submissions.
    function registerPublisher(string memory topic, string memory ipns_of_publisher) public {
        topics.push(topic);
        publisherHashMap[topic].push(Publisher({submitted_by: msg.sender, ipns: ipns_of_publisher}));
        emit NewPublisher(msg.sender, topic, ipns_of_publisher);
    }

    /// @notice Gets all publishers for a topic.
    /// @dev Gets the Publishers that are mapped to the given topic string.
    /// @return Returns an array of Publishers.
    function getPublishersForTopic(string memory topic) public view returns (Publisher[] memory) {
        return publisherHashMap[topic];
    }
}

After this single transaction, you can update your IPNS to the latest manifest hash for free.

The purpose of the contract is two-fold:

  1. Discovery (anyone can find publishers for a topic from a single "meeting point".)
  2. Censorship resistance (no one can stop you from posting your IPNS to a topic.)

Anyone else can also submit their IPNS name to the contract and publish new volumes for the database. While not yet implemented, the process of checking that contract, fetching manifests, comparing the CIDs they contain and coordinating to collaborate on publishing can all be automated.
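The comparison step could start as simply as checking, for each (Volume, Chapter), whether independent publishers list the same CID. A minimal sketch, assuming each publisher's manifest has already been fetched via their IPNS name and decoded into a hypothetical map; `Comparison` and `compare` are illustrative names, and none of this exists in the library yet.

```rust
use std::collections::BTreeMap;

/// Hypothetical decoded manifest: (volume_id, chapter_id) -> CID.
type Manifest = BTreeMap<(String, String), String>;

/// Result of comparing two publishers' manifests.
#[derive(Debug, Default, PartialEq)]
struct Comparison {
    agree: usize,                     // same CID for the same Volume/Chapter
    conflicts: Vec<(String, String)>, // same Volume/Chapter, different CID
    only_in_a: usize,                 // entries publisher B does not list
    only_in_b: usize,                 // entries publisher A does not list
}

fn compare(a: &Manifest, b: &Manifest) -> Comparison {
    let mut result = Comparison::default();
    for (key, cid_a) in a {
        match b.get(key) {
            Some(cid_b) if cid_b == cid_a => result.agree += 1,
            Some(_) => result.conflicts.push(key.clone()),
            None => result.only_in_a += 1,
        }
    }
    result.only_in_b = b.keys().filter(|k| !a.contains_key(*k)).count();
    result
}

/// Build a manifest from (volume_id, chapter_id, cid) triples.
fn manifest(entries: &[(&str, &str, &str)]) -> Manifest {
    entries
        .iter()
        .map(|(v, c, cid)| ((v.to_string(), c.to_string()), cid.to_string()))
        .collect()
}

fn main() {
    let publisher_a = manifest(&[("volume_0", "0x00", "cid-a"), ("volume_0", "0xf1", "cid-b")]);
    let publisher_b = manifest(&[
        ("volume_0", "0x00", "cid-a"),
        ("volume_0", "0xf1", "cid-z"),
        ("volume_1", "0x00", "cid-c"),
    ]);
    println!("{:?}", compare(&publisher_a, &publisher_b));
}
```

A conflict is the interesting case for coordination: it means two publishers produced different data for the same Chapter and one of them needs checking.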

Pin by default to IPFS

While not implemented in this library, the intention is that end-users of a TODD-compliant database automatically pin any Chapters they download. This could be an opt-out process, which could result in most users contributing significantly to the long-term availability of the data.
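An opt-out default could be as simple as a setting that starts as `true` and, for each downloaded Chapter, queues a pin request. The sketch below assumes a local IPFS node with the Kubo `ipfs` CLI and only builds (does not run) `ipfs pin add <cid>` commands; `Settings` and `pin_commands` are hypothetical names.

```rust
use std::process::Command;

/// Hypothetical user settings: pinning is on unless the user opts out.
struct Settings {
    pin_downloads: bool,
}

impl Default for Settings {
    fn default() -> Self {
        Settings { pin_downloads: true } // opt-out: enabled by default
    }
}

/// Build one `ipfs pin add <cid>` command per downloaded Chapter (none if opted out).
fn pin_commands(settings: &Settings, downloaded_cids: &[&str]) -> Vec<Command> {
    if !settings.pin_downloads {
        return Vec::new();
    }
    downloaded_cids
        .iter()
        .map(|&cid| {
            let mut cmd = Command::new("ipfs");
            cmd.args(["pin", "add", cid]);
            cmd
        })
        .collect()
}

fn main() {
    let cids = ["cid-v0-f1", "cid-v1-f1"]; // placeholders for real CIDs
    for cmd in pin_commands(&Settings::default(), &cids) {
        // To actually pin, call cmd.status() with an IPFS node running.
        println!("{:?}", cmd);
    }
}
```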

Frequently Asked Questions

See ./FAQ.md

Contributing

This is a very experimental library that is mostly an exploration for feasibility analysis.

The library is not currently being used to deliver data to real end users, though it is designed to be readily implemented for new databases (see ./GETTING_STARTED.md) that can all share the same core of the library.

Does the idea interest you? Do you have data it might be suited for?

  • twitter: @eth_worm
  • github: @perama-v

Raise an issue or say hi ❤
