Ganjine - Database Library

As Albert Einstein said, "No problem can be solved from the same level of consciousness that created it." So in the big-data era we can't solve the data-persistence problem with old concepts like the RDBMS! A lot of people say "we want a scalable relational database", but they don't know exactly what they want, and we can almost say they don't want that at all! That is only because they are trying to treat "big data" the same way they treated data with an RDBMS: "by conflating data and views and relying on incremental algorithms".

Ganjine is built on this concept: "No one can go back in time to change the truthfulness of a piece of data."

Ganjine is not a standalone application; it is a highly customizable embedded library for building multi-model database applications that adopt the stream data model in a three-layer architecture, or even in a one-layer architecture! It knows nothing about data structures and works like a key/value data-store engine with support for transactions, secondary indexes and subscriptions (notifications about a record or index)!
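
To make this concrete, here is a minimal sketch in Go of what such an embedded key/value engine interface could look like; every identifier here (Engine, Set, Get, Delete, Subscribe, Event) is an illustrative assumption, not Ganjine's actual API:

```go
// Sketch of an embedded key/value engine interface; all names are
// hypothetical and only illustrate the feature set described above.
package ganjinesketch

// RecordID uniquely identifies one immutable version of a record.
type RecordID [32]byte

// Event is a notification about a record or secondary-index change.
type Event struct {
	ID   RecordID
	Kind string // e.g. "set", "delete", "index-update"
}

// Engine is the hypothetical embedded data-store engine. Transaction and
// secondary-index registration calls are omitted for brevity.
type Engine interface {
	Set(id RecordID, record []byte) error // store an immutable record
	Get(id RecordID) ([]byte, error)      // fetch a record by its ID
	Delete(id RecordID) error             // remove it; there is no Update
	Subscribe(id RecordID, events chan<- Event) error
}
```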

Like other SabzCity projects, this project will also be developed to operate automatically rather than by administrator rule! In this case it does not just automate as the amount of data grows or shrinks, but also when the data-store cluster needs more or less computing power to handle requests!

Ganjine needs PersiaOS, as a standalone OS or the transition version, for its storage engine and networking! So it can't run, and throws a panic, if it can't connect to PersiaOS! We made this rule because existing storage engines, even the newest ones, have fundamental problems: they are built on top of existing file systems by default, or they don't respect the cluster size of storage devices, which requires more logic to be handled by processors!

If you look at any database you can see that all of them want to do everything about data by themselves, and because of that choice it is very hard to scale out the data layer of a platform! So we decided on some rules that move most of the logic out of the database application, except storing and retrieving data, and handle it in the logic layer of the platform. So Ganjine can scale with your app's scalability architecture!

Ganjine can be used for any purpose, such as a multi-model data store, data mining, machine learning and AI (artificial intelligence).

Architecture

  • Ganjine has a fully distributed architecture, not a decentralized one!
  • It is architected with the minimum number of layers and logic to gain maximum performance!
  • Ganjine doesn't use a coordinator (Cassandra, ...) or lease (CockroachDB, ...) architecture! These types of architecture consume unneeded computing and network capacity! Instead, the Ganjine SDK in the platform's business-logic-layer app indicates the exact node to read or write records!
  • Unlike other distributed DBs, Ganjine doesn't use a follower (CockroachDB, YugabyteDB, ...) or slave architecture everywhere! Choosing a leader for a replication cluster drops the efficiency that can be gained by connecting the business-logic-layer app to the closest data-layer app. The first-class data centers used for data stores almost always have very stable physical connections to other data centers of the same class, unlike ISP servers. So each node in each cluster is the master (leader) for its given ranges of the primary index!
  • Only in the transaction find-and-write service does the closest node to the business-logic-layer app receive the read/write, coordinate with the master of the secondary-index range, and execute the request. If the master does not respond then, depending on the type of error, all (or at least 2/3 of) the nodes of the related range agree on a new master node. This decision is an intelligent choice driven by many factors, such as the daily read or write requests for that range, not a random pick (see the election sketch after this list)!
  • In a normal write request, once the write to disk has finished and the record has been sent asynchronously to the replicated nodes, the business-logic layer gets the response that the record has been written successfully!
  • See a simple architecture diagram here
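
As a rough illustration of the master re-election described above, here is a sketch assuming each node of a range votes for the candidate that already serves the most daily traffic for that range, and a candidate wins only with at least 2/3 agreement; the structs, the single scoring factor and the helper names are assumptions, not Ganjine's actual election protocol:

```go
package sketch

// candidate is one node of the range that could become the new master.
type candidate struct {
	nodeID        string
	dailyRequests uint64 // reads+writes served for this range per day
}

// bestCandidate is one node's local vote: the candidate that already handles
// the most daily traffic for the range (one of the "many factors" above).
func bestCandidate(cands []candidate) string {
	if len(cands) == 0 {
		return ""
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.dailyRequests > best.dailyRequests {
			best = c
		}
	}
	return best.nodeID
}

// electMaster counts the votes of the range's nodes and returns the new
// master only if at least 2/3 of them agreed; otherwise the election fails.
func electMaster(votes []string) (string, bool) {
	counts := make(map[string]int)
	for _, v := range votes {
		counts[v]++
	}
	for node, n := range counts {
		if 3*n >= 2*len(votes) {
			return node, true
		}
	}
	return "", false
}
```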

CRUD

There is no updating of data, just set, get and delete in the storage-engine layer. Any secondary hash index that belongs to the time series of a record can be used as version control for that record, so each record at each time (version) has a unique hash as its RecordID. With this rule, at every layer we can cache a record by its RecordID and it is guaranteed to always be the same data!
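
A minimal sketch of this idea, assuming purely for illustration that a version's RecordID is the SHA-256 hash of its bytes (the real hash and ID layout may differ; see the RecordID section below):

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type RecordID [sha256.Size]byte

// recordID derives a version's ID from its content, so the same bytes always
// map to the same ID and a cached entry can never serve stale data.
func recordID(record []byte) RecordID {
	return RecordID(sha256.Sum256(record))
}

func main() {
	v1 := recordID([]byte(`{"user":1,"balance":100}`))
	v2 := recordID([]byte(`{"user":1,"balance":90}`)) // "update" = new record, new ID
	fmt.Printf("v1=%x\nv2=%x\n", v1[:4], v2[:4])
}
```

Because the ID changes whenever the bytes change, caching by RecordID at any layer is always safe.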

Transaction

Because transactions have many different scenarios in each situation in distributed computing, the developer can send multiple requests in one connection, but a failed process must be handled in the logic layer.
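
A sketch of how the logic layer might batch two writes on one connection and compensate on failure itself; the Conn interface and its method names are hypothetical stand-ins for the Ganjine SDK:

```go
package sketch

// Conn is a hypothetical SDK connection; Set and Delete stand in for whatever
// the real Ganjine SDK exposes.
type Conn interface {
	Set(id, record []byte) error
	Delete(id []byte) error
}

// transferBalance sends two writes over one connection; if the second write
// fails, the compensation (undoing the first write) is done here in the
// business-logic layer, not inside the engine.
func transferBalance(conn Conn, fromID, fromNew, toID, toNew []byte) error {
	if err := conn.Set(fromID, fromNew); err != nil {
		return err // nothing to undo yet
	}
	if err := conn.Set(toID, toNew); err != nil {
		_ = conn.Delete(fromID) // compensate in the logic layer
		return err
	}
	return nil
}
```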

Secondary Index

Any number of secondary indexes can be implemented, but some considerations exist because Ganjine is a distributed database! The simplest secondary index, like the primary one, is a hash index, which must be implemented without any question: it is the basis of the transaction feature!
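
A minimal sketch of a hash-based secondary index, assuming an index record simply maps the hash of an indexed field (for example an e-mail address) to the RecordIDs of matching records; the names and in-memory layout are illustrative only:

```go
package sketch

import "crypto/sha256"

type IndexKey [sha256.Size]byte

// HashIndex maps the hash of an indexed field to the RecordIDs that carry it.
type HashIndex struct {
	entries map[IndexKey][][]byte
}

func NewHashIndex() *HashIndex {
	return &HashIndex{entries: make(map[IndexKey][][]byte)}
}

// Add registers a record under the hash of one of its fields.
func (ix *HashIndex) Add(field, recordID []byte) {
	k := IndexKey(sha256.Sum256(field))
	ix.entries[k] = append(ix.entries[k], recordID)
}

// Lookup returns every RecordID whose indexed field hashes to the same key.
func (ix *HashIndex) Lookup(field []byte) [][]byte {
	return ix.entries[IndexKey(sha256.Sum256(field))]
}
```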

Considerations

Considerations are about choosing between available solutions. Here we discuss the logic behind each decision.

Replication Strategy

If more than one replication zone exists (we strongly suggest having three replicas), replicating a record can be done:

  • In the business-logic app! But in this layer we lose performance, because the app must connect to all replicated nodes in all zones, and because the first-class data centers used for data stores usually have more stable connections than ISP data centers!
  • In each data-layer node, replicating node to node!

We implement the first choice in Ganjine!
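
A sketch of this first choice, assuming the SDK in the business-logic layer writes the same record to the responsible node of every replication zone concurrently (three zones, as suggested above); the zoneConn interface is a hypothetical stand-in for a per-zone SDK connection:

```go
package sketch

import "sync"

// zoneConn is a hypothetical connection to one replication zone.
type zoneConn interface {
	Set(id, record []byte) error
}

// replicate fans the same write out to every zone from the logic layer and
// collects one error slot per zone.
func replicate(zones []zoneConn, id, record []byte) []error {
	errs := make([]error, len(zones))
	var wg sync.WaitGroup
	for i, z := range zones {
		wg.Add(1)
		go func(i int, z zoneConn) {
			defer wg.Done()
			errs[i] = z.Set(id, record) // one write per replication zone
		}(i, z)
	}
	wg.Wait()
	return errs
}
```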

Partition or sharding selection

Selecting the right shard node to send a get or set request to can be implemented in three ways:

  • Client side! The client SDK selects the node related to the ID range!
  • Proxy assisted! A third-party app selects the node related to the ID range!
  • Server-side partitioning! Any DB node receives the request and selects the node related to the ID range!

We implement the first choice in Ganjine!
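
A sketch of client-side selection, assuming the SDK keeps a sorted table mapping range start keys to node addresses and resolves the node from the first bytes of the RecordID; the table layout and lookup are assumptions, not the real SDK's data structure:

```go
package sketch

import (
	"encoding/binary"
	"sort"
)

// rangeEntry says which node owns the range starting at this key.
type rangeEntry struct {
	start uint64 // first RecordID prefix owned by this node
	node  string // node address
}

// rangeTable is kept sorted ascending by start, beginning at 0.
type rangeTable []rangeEntry

// nodeFor returns the node owning the range that contains the given RecordID,
// using the ID's first 8 bytes as the range key.
func (t rangeTable) nodeFor(recordID []byte) string {
	key := binary.BigEndian.Uint64(recordID[:8])
	// Find the last entry whose start is <= key.
	i := sort.Search(len(t), func(i int) bool { return t[i].start > key })
	if i == 0 {
		return t[0].node // defensive fallback; the table should start at 0
	}
	return t[i-1].node
}
```

The SDK would refresh this table whenever nodes are added, which ties into the node-splitting consideration below.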

Add new node

When existing nodes reach capacity and a new node is needed in each replication zone, we can choose between:

  • Splitting the data range of the node with the capacity problem and copying all records in the desired range to the new node. This way we end up with two nodes, each with more free capacity!
  • Splitting the data range of the node with the capacity problem and adding the new node to the node list, but keeping the old node as the responder for the range! In this situation, every get or set of a record outside the real range of the old node needs two more network round trips!

We implement the first choice in Ganjine!
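
A sketch of this first choice, assuming the range is simply split at its midpoint and every record in the upper half is copied to the new node and removed from the old one; the store interface and midpoint split are illustrative assumptions:

```go
package sketch

// store is a hypothetical view of one node's records for a key range.
type store interface {
	Scan(from, to uint64) map[uint64][]byte // records keyed by ID prefix
	Set(id uint64, record []byte) error
	Delete(id uint64) error
}

// splitRange moves every record with key >= mid from the old node to the new
// node; afterwards the old node owns [from, mid) and the new node [mid, to).
func splitRange(oldNode, newNode store, from, to uint64) error {
	mid := from + (to-from)/2
	for id, rec := range oldNode.Scan(mid, to) {
		if err := newNode.Set(id, rec); err != nil {
			return err
		}
		if err := oldNode.Delete(id); err != nil {
			return err
		}
	}
	return nil
}
```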

Secondary Index Records

We have two choices for these types of records:

  • When building the cluster, create the maximum number of records for the secondary index up front, which takes a lot of space!
  • On node splitting, create two empty secondary-index records, read the entire existing record, and transfer each index entry to the proper new record!

We implement the second choice in Ganjine!

RecordID

A RecordID can be built in one of the ways below, each with its own drawback:

  • A real 128-bit universally unique ID generated with a random algorithm.
    • Adding a new node to some ranges forces the database engine to move some records to the new node!
  • 64 bits of time in seconds plus a 64-bit hash of the record data.
    • All writes go to just one node, even on petabyte platforms with billions of users!

We implement both options in Ganjine and the developer can choose between them!
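
A sketch of both options, assuming a 128-bit RecordID, crypto/rand for the random variant, and FNV-64 as a stand-in for whatever hash the second variant really uses:

```go
package sketch

import (
	"crypto/rand"
	"encoding/binary"
	"hash/fnv"
	"time"
)

type RecordID [16]byte

// randomID: a real 128-bit universally unique ID from a random source.
func randomID() (RecordID, error) {
	var id RecordID
	_, err := rand.Read(id[:])
	return id, err
}

// timeHashID: 64 bits of time in seconds plus a 64-bit hash of the record data.
func timeHashID(record []byte) RecordID {
	var id RecordID
	binary.BigEndian.PutUint64(id[:8], uint64(time.Now().Unix()))
	h := fnv.New64a()
	h.Write(record)
	binary.BigEndian.PutUint64(id[8:], h.Sum64())
	return id
}
```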

Limiting transaction number per index

Because of network latency and a transaction architecture that allows just one action at a time, if every transaction takes just 100 ms, then only 600 transactions per minute, 36,000 per hour and just 864,000 per day can be done on a specific record, such as a user's financial balance! To beat this limitation, developers must change the way they think about the problem, e.g. only run transactions on decreasing operations on the balance, not increasing ones, check the balance's health in the UI app or cron jobs, and have a service that can recalculate the user's balance if something goes wrong!
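
A sketch of the suggested pattern, where only the decreasing operation (withdrawal) runs as a per-record transaction; the Txn interface is a hypothetical stand-in for the SDK's transaction API:

```go
package sketch

import "errors"

// Txn is a hypothetical per-record transaction handle from the SDK.
type Txn interface {
	GetBalance(user uint64) (int64, error)
	SetBalance(user uint64, balance int64) error
}

var ErrInsufficient = errors.New("insufficient balance")

// withdraw is the only path that needs the per-record transaction, so the
// ~100 ms-per-transaction limit applies only to spending, not to deposits.
func withdraw(tx Txn, user uint64, amount int64) error {
	balance, err := tx.GetBalance(user)
	if err != nil {
		return err
	}
	if balance < amount {
		return ErrInsufficient
	}
	return tx.SetBalance(user, balance-amount)
}
```

Deposits would then be appended as plain records that a cron job or a recalculation service folds into the balance, as described above.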

Non-Goals

You can always do what you want at your app level, but we don't support everything you might need in the engine; some of it might be supported in the Ganjine SDK!

SQL - Structured Query Language

We don't support SQL because it is a DSL, which means developers need to learn a lot of new things for almost nothing beyond what simple RPC gives them! We also believe SQL is not a good language in terms of respecting DevOps culture! But for developers who don't agree with us, we plan to add an SQL parser and converter to the SDK!

Schema Changing

It is a huge amount of work when you don't store the data structure alongside the data, unlike JSON! It is not our goal to handle easy schema changes for old data in this engine! We are thinking about supporting this in the SDK!

B-Tree index

In order to have a distributed, high-performance index engine, and because Ganjine doesn't know anything about the data in a record, we decided not to support range queries like notEquals (!=, <>), BETWEEN, LARGER, ORDER, ..., so we don't support a B-tree index! We suggest changing the data model and moving range queries to the UI, not here! We believe languages need search engines that understand languages, not just simple byte equality!

Table, Namespace, Sets, ...

We don't offer any splitting of data into named logical blocks! Developers can achieve this by making a multi-subdomain app for each desired usage, e.g. db1.sabz.city, db2.sabz.city, ..., or even better names like big.db.sabz.city for big data such as videos, and archive.db.sabz.city for old data that is no longer used daily, like data about people who died 100 years ago!

Implementations

Inspired by

Articles

Meaning of the word "Ganjine"

"Ganjine" (Persian: گنجینه) is an archaeological term for a collection of valuable objects or artifacts, sometimes purposely buried in the ground, in which case it is sometimes also known as a cache.