Merkle Tree Filter Module

Miguel Guimarães edited this page Apr 6, 2022 · 4 revisions

In cryptography and computer science, a hash tree or Merkle tree is a tree in which every leaf node is labelled with the cryptographic hash of a data block, and every non-leaf node is labelled with the hash of the labels of its child nodes. Hash trees allow efficient and secure verification of the contents of large data structures. Hash trees are a generalization of hash lists and hash chains. (Source: Wikipedia.)

Use Case

This filter module generates a single hash that represents the content of a large data structure, such as a whole database. It supports the completeness and correctness requirements of the archival process, i.e. it helps ensure that no message is lost (not archived, or archived incorrectly).

Implementation

A hash is calculated for every cell from its content, with the cell order well defined. Because hashes always have the same length, the row hash is obtained by concatenating all cell hashes of the row and hashing the result. To hash a table, the subset of rows and their order must be well defined: ensure the rows are ordered by the primary key or another unique, non-null column, then concatenate the hashes of all rows and hash the result. To hash a schema, concatenate the hashes of all its tables and hash the result. The same process, applied to the schema hashes, yields the database hash.
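The bottom-up calculation described above can be sketched in a few lines of Python. This is an illustration only, not the module's actual code: the function names, the in-memory layout of schemas and tables, the UTF-8 text serialization of cell values, the treatment of NULL as an empty string, the concatenation of raw digest bytes, and the ordering of tables and schemas by name are all assumptions made for the example.

```python
import hashlib

ALGORITHM = "sha256"  # assumption: hashlib name for the "SHA-256" digest


def digest(data: bytes) -> bytes:
    """Hash raw bytes with the configured algorithm."""
    return hashlib.new(ALGORITHM, data).digest()


def combine(child_hashes) -> bytes:
    """Label of a non-leaf node: hash of its concatenated child hashes."""
    return digest(b"".join(child_hashes))


def cell_hash(value) -> bytes:
    # Assumption: cells are serialized as UTF-8 text, NULL as an empty string.
    text = "" if value is None else str(value)
    return digest(text.encode("utf-8"))


def row_hash(row: dict, columns: list) -> bytes:
    # The column list fixes the cell order within the row.
    return combine(cell_hash(row[c]) for c in columns)


def table_hash(rows: list, columns: list, key: str) -> bytes:
    # Rows are ordered by a unique, non-null column before hashing.
    ordered = sorted(rows, key=lambda r: r[key])
    return combine(row_hash(r, columns) for r in ordered)


def schema_hash(tables: dict) -> bytes:
    # Assumption: tables are combined in name order.
    return combine(
        table_hash(tables[n]["rows"], tables[n]["columns"], tables[n]["key"])
        for n in sorted(tables)
    )


def database_hash(schemas: dict) -> bytes:
    # Assumption: schemas are combined in name order.
    return combine(schema_hash(schemas[n]) for n in sorted(schemas))


# Illustrative data only; the rows are deliberately out of key order.
actor = {
    "columns": ["actor_id", "first_name", "last_name"],
    "key": "actor_id",
    "rows": [
        {"actor_id": 2, "first_name": "NICK", "last_name": "WAHLBERG"},
        {"actor_id": 1, "first_name": "PENELOPE", "last_name": "GUINESS"},
    ],
}
top_hash = database_hash({"sakila": {"actor": actor}}).hex().upper()
print(top_hash)
```

Because the rows are sorted before hashing, the order in which they are read does not affect the result, whereas changing any single cell, or the set of columns included, changes the top hash.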

A simplified representation of the Merkle tree calculation is shown in the image below.

Result

Produces a JSON file with the following information:

  • The cryptographic algorithm used to digest the data;
  • Information about the schemas, tables, and columns used for the calculation;
  • The top-hash value.

Optionally, it can also contain every leaf node used in the calculation, for debugging purposes.

Below is an example of the file generated by this filter module.

{
  "merkle": {
    "algorithm": "SHA-256",
    "schemas": [
      {
        "sakila": {
          "tables": [
            {
              "actor": {
                "columns": [
                  "actor_id",
                  "first_name",
                  "last_name",
                  "last_update"
                ]
              }
            }
          ]
        }
      }
    ],
    "topHash": "EC2DF58B4EEF15E3CF23E5415F75F69AFCB2D566230AB8AF1F2412E56C97549D"
  }
}
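A report like this one can be checked programmatically, for instance to compare the top hash produced from the source database with the one produced from the archive. The following Python sketch assumes only the JSON layout shown in the example; the function names are hypothetical, and the mapping from algorithm names such as "SHA-256" to Python's hashlib names (dropping the hyphen) is an assumption.

```python
import hashlib
import json


def read_top_hash(report_text: str):
    """Return (algorithm, topHash) from a report in the layout shown above."""
    merkle = json.loads(report_text)["merkle"]
    return merkle["algorithm"], merkle["topHash"]


def same_content(report_a: str, report_b: str) -> bool:
    """True when both reports use the same algorithm and top hash."""
    algo_a, hash_a = read_top_hash(report_a)
    algo_b, hash_b = read_top_hash(report_b)
    # Hex digits are case-insensitive, so normalize before comparing.
    return algo_a == algo_b and hash_a.lower() == hash_b.lower()


def has_expected_length(algorithm: str, top_hash: str) -> bool:
    """Sanity check: two hex characters per digest byte."""
    # Assumption: "SHA-256" corresponds to hashlib's "sha256".
    name = algorithm.replace("-", "").lower()
    return len(top_hash) == 2 * hashlib.new(name).digest_size


# The report from the example above, reduced to the fields used here.
TOP = "EC2DF58B4EEF15E3CF23E5415F75F69AFCB2D566230AB8AF1F2412E56C97549D"
report = json.dumps({"merkle": {"algorithm": "SHA-256", "schemas": [], "topHash": TOP}})
lowercase_report = report.replace(TOP, TOP.lower())

print(same_content(report, lowercase_report))
print(has_expected_length(*read_top_hash(report)))
```

Two top hashes are only comparable when both runs covered the same schemas, tables, and columns, so a stricter check would also compare the "schemas" section of the two reports.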

How to use

dbptk migrate -i <import-module> [args] -e [export-module] [args] -f merkle-tree -f1f <path>

Advanced configurations

Change the cryptographic algorithm

dbptk migrate -i <import-module> [args] -e [export-module] [args] -f merkle-tree -f1f <path> -f1d SHA-256

Change the letter case of the hash

dbptk migrate -i <import-module> [args] -e [export-module] [args] -f merkle-tree -f1f <path> -f1fc lowercase

Activate the debug mode

dbptk migrate -i <import-module> [args] -e [export-module] [args] -f merkle-tree -f1f <path> -f1e

Integration with the import-config module

By default, every column is used for the Merkle tree calculation. It is possible to explicitly choose which columns to include. This is done with the import-config module, which is described in its own documentation.