Releases: dingodb/dingo
Release Notes v0.9.0
1. New Features
1) License Management Mechanism
Introduced a License management feature to protect DingoDB's intellectual property. With the License activation and management tools, users can easily manage and monitor software usage, ensuring legal and compliant use.
2) Single Machine Lite Version of DingoDB
Implemented a Single Machine Lite version of DingoDB, lowering the usage threshold for users. This version can run on a single machine without complex distributed deployment, making it ideal for development, testing, and small-scale application scenarios, helping users quickly get started and validate DingoDB's capabilities.
3) New C++ SDK
Provided a new C++ SDK to facilitate secondary development and integration for developers. The SDK offers a rich set of APIs that support efficient data operations and management, enhancing development efficiency.
2. Feature Optimizations
2.1 Storage Layer Optimization
1) Braft Modification
Added support for controlling election priority among peers, addressing the vector index Leader balancing issue. This helps improve cluster stability and performance, avoiding single-point overloads.
2) Prefilter Performance Enhancement
Adjusted the data structure of ScalarData to improve pre-filtering rates. Through data structure optimization, DingoDB can process data filtering faster, reducing query latency.
3) Instruction Set Expansion
In addition to the default SSE instruction set used by the FAISS and HNSWLIB vector distance calculation functions, added support for the AVX2 and AVX512 instruction sets. Expanding instruction set support improves vector computation efficiency, particularly in high-performance computing environments.
4) Vector Distance Calculation Performance Improvement
Implemented runtime CPU instruction set acceleration switching, significantly enhancing vector distance calculation performance. The system automatically switches instruction sets (e.g., SSE, AVX2, AVX512) as needed, especially effective in handling large-scale datasets.
5) Leader Balance Rate Improvement
Optimized algorithms to improve the balance rate of Leaders in the cluster. The improved election algorithm ensures a more balanced distribution of Leaders, enhancing the overall system performance.
6) Vector Index Data Insertion Performance Improvement
Optimized the insertion performance of IVF_FLAT and IVF_PQ vector indexes. The improved insertion algorithm increases data insertion efficiency, reducing index build time.
7) Synchronization Operation Performance Optimization
Optimized the synchronization performance of BThread and PThread. By reducing the overhead of thread synchronization, the performance in multi-threaded environments is enhanced.
8) Vector Search Performance Improvement
Adjusted parameters such as Region size, Region count, and number of threads to effectively improve vector search performance. Optimized resource allocation significantly boosts search response speed and efficiency.
2.2 Computation Layer Optimization
1) Log System Optimization
Revamped the Executor layer log system to provide full-link log information, enhancing log traceability. The improved log system records more detailed information, helping users comprehensively monitor and analyze various events during task execution.
2) Observable Metrics
Introduced a new Metric information statistics feature to monitor job metrics at various stages, enhancing task observability. By monitoring performance metrics in real time, users can better understand task execution and performance bottlenecks, improving system operation and maintenance efficiency.
Release Notes v0.8.0
Major New Features
1. Distributed Transaction
Added distributed transaction capability, meeting the database's core ACID properties, ensuring data integrity and reliability, and expanding the range of applications.
- Transaction-related interfaces are added to the Store layer/Index layer/Executor layer.
- Provides the ability for garbage collection of distributed transaction data, cleaning up completed and no longer needed transaction data, freeing up storage space, and reducing storage space occupancy.
- Transaction table creation: When creating a table, specify ENGINE=LSM_TXN to complete the creation.
- Transaction commit methods (a hedged SQL sketch follows this list):
  - Explicit commit: use the COMMIT command.
  - Implicit commit: certain SQL commands (BEGIN, START TRANSACTION, etc.) indirectly commit the current transaction.
  - Auto commit: the system commits automatically after INSERT/UPDATE/DELETE execution.
- Two transaction isolation levels: Read Committed and Repeatable Read.
- Two transaction modes: Optimistic and Pessimistic.
- Transaction locking mechanism: provides table-level and row-level lock management. By locking tables/rows, it ensures transaction consistency and isolation, effectively avoiding data conflicts between concurrent transactions.
- Deadlock detection mechanism: Supports periodic checking of lock resources and waiting relationships in the system to identify potential deadlock situations.
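A minimal SQL sketch of the transaction flow described above. ENGINE=LSM_TXN and the commit commands come from these notes; the table, columns, and values are illustrative assumptions:

```sql
-- Create a transactional table by specifying the LSM_TXN engine.
CREATE TABLE account (
    id INT PRIMARY KEY,
    balance BIGINT
) ENGINE=LSM_TXN;

-- Explicit commit: group statements into one atomic transaction.
START TRANSACTION;
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;

-- Auto commit: a standalone DML statement commits automatically.
INSERT INTO account VALUES (3, 500);
```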
2. Compute Pushdown
- Refactoring of compute pushdown, optimizing code execution logic, and improving data query performance.
- Supports expression compute pushdown, handling execution with expression syntax to improve computational efficiency.
- Supports Vector ScalarData operator pushdown: When performing vector approximate nearest neighbor search, filters scalar data to further select data that meets specific conditions.
- The Python SDK introduces the Self Query feature, providing filtering on vector ScalarData and covering scenarios that query vector data under specific conditions.
Product Feature Enhancements
1. Data Storage Layer
1.1 Architecture optimization
- Added encapsulation for google::protobuf::Closure to facilitate request statistics and log tracking.
- Refactored the RawRocksEngine class, splitting it into multiple files by functionality and supporting multi-column family mode, to address the bloat of the previous RawRocksEngine.
- Refactored the StoreService/IndexService modules to unify the logic inside and outside the queue.
- Refactored the Storage class by extracting the execution queue logic and placing it in the traffic control module.
1.2 Region Management (Merge & Split):
- Optimized the region split strategy by introducing backward region splitting in addition to the existing strategy.
- Added region merge functionality to the Store layer/Coordinator layer to dynamically adjust data and optimize storage space utilization.
- Supported splitting in multi-column family mode, greatly improving scalability, performance, and reliability. Adopted a unified encoding format compatible with key encoding formats for distributed transactions.
1.3 Vector Indexing
- For retrieval speed: added IVF_FLAT, a vector index based on inverted indexes, suited to high-dimensional sparse vector data; it provides fast retrieval and good retrieval performance.
- For memory efficiency: added IVF_PQ, a vector index based on inverted indexes and product quantization, suited to high-dimensional dense vector data; it offers good search speed and low storage overhead.
- For accuracy: added a BruteForce index, suited to small vector datasets or scenarios requiring high search accuracy.
1.4 Storage Engines
- Added B+Tree engine to optimize database query performance.
- Added XDP engine to achieve high-performance data processing.
- Diversified storage engine support, allowing users to specify specific storage engines based on their actual business needs.
1.5 Snapshot Capability Upgrade
- Upgraded VectorIndex to support multi-column family storage.
- Snapshot supports multi-column family storage mode and is compatible with key encoding formats for distributed transactions.
- Implemented Fake Snapshot to reduce I/O burden.
- Supported BaikalDB-style save/load snapshot.
2. Executor Execution Layer
2.1 Data Types
- Added Blob data type for storing binary data such as images, audio, videos, etc.
2.2 SQL Syntax
- SQL layer provides batch data import and export.
- Added vector distance calculation functions (a hedged example follows this list):
- Inner product distance: ipDistance
- Euclidean distance: l2Distance
- Cosine distance: cosineDistance
- Support vector queries without functions, allowing vector queries even without vector indexes.
- Tables support Chinese in table creation, insertion, querying, updating, and deletion.
- Distributed transaction-related parameters:
- Support transaction parameter settings at different levels: Global/Session.
- Timeout settings, supporting retry or blocking timeouts, with automatic rollback after timeout:
  - Lock_wait_timeout
  - SET [session | global] statement_timeout = timeout
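A hedged sketch of the syntax above. The function names (ipDistance, l2Distance, cosineDistance) and the parameters Lock_wait_timeout and statement_timeout come from these notes; the table, columns, vector-literal syntax, and timeout units are illustrative assumptions:

```sql
-- Rank rows by Euclidean distance between a stored vector column
-- and a query vector (the vector literal syntax is assumed).
SELECT id, l2Distance(feature, ARRAY[0.1, 0.2, 0.3]) AS dist
FROM items
ORDER BY dist
LIMIT 10;

-- Session-level transaction timeouts with automatic rollback
-- (assuming lock_wait_timeout is set the same way).
SET SESSION statement_timeout = 5000;
SET SESSION lock_wait_timeout = 3000;
```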
2.3 Module Refactoring
- Refactored existing modules such as Store/Task/Job/Calcite/Client for distributed transactions, based on the new features in this version.
- Integrate the client-side with the SQL execution layer to optimize the system architecture and reduce code redundancy.
3. SDK Layer
- Added C++ SDK, enabling independent integration testing execution with Dingo-store based on the C++ SDK.
4. Operations and Monitoring
- Visual web monitoring interface to monitor the real-time health status of Store, Executor, and Coordinator components, providing cluster-wide monitoring information.
Release Notes v0.7.0
1. Store Storage Layer
1.1 Distributed Storage
- Provide the ability to manage IndexRegions, supporting dynamic creation and deletion of IndexRegions.
- Add functionality for Raft Snapshot creation and installation for IndexRegions, which helps generate and load snapshot data for IndexRegions, enhancing system reliability and recovery capabilities.
- Introduce the Build, Rebuild, and Load functions for VectorIndex to enable efficient creation, reconstruction, and loading of vector indexes, facilitating similarity search of vector data.
- Enhance the management capability of IndexRegion for capacity expansion and contraction, enabling dynamic adjustment of index size to accommodate changes in data scale.
- Support automatic splitting of VectorIndex/ScalarIndex Regions for region partitioning based on data load and distribution.
- Introduce a mechanism to load indexes only on the leader (saving memory), by concentrating index loading and maintenance tasks on the leader node to reduce memory consumption on other nodes.
1.2 Vector Index
- Provide the ability to manage vector indexes, including operations such as creation, deletion, and querying of vector indexes.
- Offer diverse types of vector indexes, including HNSW, FLAT, IVF_FLAT, and IVF_PQ.
- Support read and write operations for scalar data, enabling mixed storage and fusion analysis of multimodal data.
- Enable top-N similarity search capability.
- Allow precise lookup based on ID.
- Provide the ability to perform batch queries based on specified offsets.
- Support pre-filtering in vector search by passing a scalar key during VectorSearch operation.
- Support post-filtering in vector search by passing a scalar key during VectorSearch operation.
1.3 Scalar Index
- Support the creation of indexes on non-vector columns, providing more efficient query and retrieval capabilities for non-vector data.
- Provide the ability to manage scalar indexes, including operations such as creation, deletion, and querying of scalar indexes.
- Support LSM Tree-type ScalarIndex, using LSM Tree as the underlying storage structure to build ScalarIndex.
1.4 Distributed Lock
- Implement the Lease mechanism for distributed locks, allowing clients to acquire, release, and maintain distributed locks by managing the lifecycle and renewal of leases.
- Support MVCC (Multi-Version Concurrency Control) for key-value storage: The Coordinator stores all change records for each key-value pair and generates a globally unique revision for each change.
- Provide a simple and efficient OneTimeWatch mechanism for event notification scenarios that only require triggering once.
2. Executor Execution Layer
2.1 Data Types
- Extend the Float data type to support high-dimensional data storage and processing, supporting vector database workloads.
2.2 SQL Syntax
- Extend the CREATE TABLE statement to support creating scalar tables and vector tables (a hypothetical sketch follows this list).
- Add new vector index query functions for retrieving vector data.
- Introduce functions for text and image vectorization, converting text and images into vector representations.
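A hypothetical sketch of a vector table definition. These notes confirm that CREATE TABLE was extended and that the Float type supports high-dimensional data, but the column syntax shown here is an assumption and may differ from the actual grammar:

```sql
-- Hypothetical vector table: an embedding column stored as a
-- high-dimensional FLOAT array (column syntax assumed).
CREATE TABLE doc_embedding (
    id BIGINT PRIMARY KEY,
    title VARCHAR(255),
    embedding FLOAT ARRAY
);
```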
2.3 SQL Optimizer
- Support mapping statistics to Calcite selectivity calculations to accurately estimate query costs and select the optimal execution plan, thereby improving query performance and efficiency.
- Support different types of statistics: general statistics (e.g., Integer, Double, Float), cm_sketch, histograms, and Calcite's default calculation for all types.
- Introduce the ANALYZE TABLE command to collect statistics, notifying the optimizer to collect and update statistics for specified tables (a minimal example follows this list).
- Provide a custom CostFactory implementing RelOptCost, redefining interfaces such as isLe, isLt, multiply, plus, and minus.
- Rewrite the Dingo TableScan cost calculation.
- Modify the DingoLikeScan selectivity and estimateRowCount calculation.
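A minimal example of the ANALYZE TABLE command named above; the table name is illustrative:

```sql
-- Collect and refresh statistics (e.g., cm_sketch, histograms) for a
-- table so Calcite's selectivity estimates reflect the current data.
ANALYZE TABLE items;
```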
2.4 Pushdown computation
- Optimize the C++ layer serialization and deserialization logic by reducing the number of deserialized columns, shortening the deserialization time.
- Add serialization and deserialization for List data type.
- Optimize the C++ expressions to improve computation efficiency.
- Support pushdown execution plan with a prefix selection to apply the query conditions to the data source as early as possible, reducing the number of rows that need to be read and processed.
2.5 Partitioning strategy
- Add a Hash-range partitioning strategy, whose hashing properties reduce data skew and achieve an even distribution of data.
3. SDK Layer
3.1 Python SDK
- Add a Python SDK client for communication with the server.
- Provide Python SDK functionality for Index operations.
- Support join operations in Python SDK.
- Use the pip package management tool to publish the Python SDK, improving its usability, maintainability, and portability.
- DingoDB-Python supports data serialization and deserialization using Proto.
3.2 Java SDK
- Provide Java SDK functionality for Index operations.
- Provide distance measurement API for vector modules.
- Offer partitioning strategy based on Index, distributing data to different partitions based on the range of data index values, facilitating the proper configuration of partitioned data.
- DingoClient provides the ability to merge multiple partitions into one, simplifying the merging process and improving data management efficiency.
- Provide an index encoding mechanism based on AutoIncrement, automatically assigning a unique identifier to each new record to ensure that each record has a unique identifier.
4. Knowledge Assistant Support
- Successfully integrated with the LangChain framework.
- Added support for cosine similarity queries, expanding the vector index query capabilities to include cosine similarity queries. This is useful for retrieving data such as text and images.
- Added a count interface to calculate the number of records in a data collection.
- Added a scan interface for scanning data collections while also satisfying scalar-based data filtering operations.
Release Notes v0.6.0
1 Architecture Layer
1.1 Separation of Storage and Compute
1. Compute engine (Executor): receives SQL over the MySQL protocol and DingoDB's own protocol, performs SQL parsing and logical/execution plan generation, and connects to the underlying Store layer.
2. Distributed storage engine (Store): efficient distributed storage implemented in C++. The storage layer is divided into metadata storage and data storage, and is designed for flexible extension with multiple storage engines such as RocksDB, memory, and xdp-rocks.
3. Compute pushdown: to make aggregation and filtering more efficient, the storage layer implements compute pushdown, supporting operations such as filter, count, sum, min, and max (see the example below).
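A simple illustration of a query whose work can be pushed down to the storage layer; the table and column names are illustrative:

```sql
-- The filter and the aggregates (count, sum, min, max) can be
-- evaluated inside the Store layer instead of in the Executor.
SELECT count(*), sum(amount), min(amount), max(amount)
FROM orders
WHERE amount > 100;
```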
1.2 Raft Upgrade
1. Provides a Leader election mechanism supporting multi-node elections.
2. Provides log replication, ensuring system reliability and effectively preventing data loss.
3. Provides a high-performance Raft implementation using multi-threading and asynchronous I/O, improving system throughput and response speed.
4. Provides a Snapshot mechanism for restoring the state machine. Snapshots reduce log size, improving performance, and can be used to quickly restore the state machine after a node failure.
5. Provides cluster scaling and migration capabilities, making it easier to add or remove nodes without compromising the stability and consistency of the overall system.
1.3 MySQL Protocol Support in the Protocol Layer
1. Provides MySQL Shell, an interactive command-line tool for efficiently managing and operating MySQL databases, with SSL encryption to keep the database secure.
2. Provides the MySQL JDBC Driver, allowing Java applications to connect to and operate MySQL databases through the JDBC API.
1.4 Cluster Operations and Monitoring
1. Provides visual monitoring via Grafana and HTTP, covering cluster nodes (disk, CPU, I/O, etc.), tables (partitions, Regions), Region monitoring, and Raft group monitoring.
2. Provides multiple deployment options: single node, docker-compose, and multi-node deployment with Ansible.
3. Provides online cluster scale-out and scale-in.
2 Function Layer
2.1 Common and Base Modules
1. Supports manually adjusting the log level, allowing flexible control of log verbosity for the scenario at hand and reducing log file size and storage cost.
2. Optimized error codes on the Store and Dingo Client sides.
3. Supports C++ data serialization: serialized data is parsed according to the format used at serialization time and restored to the original data.
2.2 Raft Management and Distributed Storage
1. Provides a Snapshot mechanism for restoring the state machine.
2. Supports Region Split: when a Region exceeds the maximum size limit, the system automatically splits it into multiple Regions, keeping Region sizes close and aiding scheduling decisions.
3. Supports Region Merge: when a Region shrinks because of many deletions, the system merges the two smaller adjacent Regions.
4. Optimized the Range validation rules, improving code execution efficiency and shortening data query time, a large performance gain.
5. Supports configuring the number of dingo-store service threads, so users can adjust the thread count to their scenario.
6. Supports specifying failpoints at runtime, making it easy to test corner cases.
7. Supports Metric information management for Store & Region.
8. Supports operator pushdown: the storage layer provides basic operators, with DingoClient acting as the bridge in between, supporting both SDK and SQL scenarios. The operation types are:
- SUM
- SUM0
- COUNT
- MAX
- MIN
- COUNTWITHNULL
9. Supports auto-increment IDs (Auto Increment ID): when a table with an auto-increment column is created, DingoDB automatically assigns a unique integer value to each row inserted into the table, using a distributed sequence generator to ensure that auto-increment values are unique across the whole cluster (a hedged sketch follows).
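A hedged sketch of an auto-increment column; AUTO_INCREMENT is assumed to follow the MySQL convention that DingoDB's protocol layer is compatible with, and the table is illustrative:

```sql
-- Each inserted row receives a cluster-wide unique integer id
-- from the distributed sequence generator.
CREATE TABLE event_log (
    id INT AUTO_INCREMENT PRIMARY KEY,
    message VARCHAR(255)
);
INSERT INTO event_log (message) VALUES ('node started');
```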
2.3 SQL Protocol Layer
1. Refactored the Executor architecture; the Executor is responsible for computation, parsing and responding to SQL requests and other management requests from clients.
2. Compatible with the MySQL protocol.
3. Completed the Calcite upgrade, improving SQL execution efficiency.
4. Table-level Metric information collection.
5. Added a task response mechanism (STOP/READY/QUIT) to the network transport layer.
2.4 SQL Syntax Extensions
1. Extended table creation to specify replica count and partitioning information, with related auxiliary metadata.
2. Extended SQL to trigger Region splits, making data distribution management more flexible and easier to use.
3. Extended MySQL-protocol-related syntax:
- View global/user/session variables
- Set global/user/session variables
- View table structure and information about specified columns
- View table/user creation statements
- Set the mysql-driver idle timeout
- SQL prepared statements
2.5 Java SDK Layer
The SDK is a set of software tools for developers that operates the database through specific APIs, letting developers execute database operations more flexibly and efficiently, lowering the learning curve and greatly improving development efficiency. The DingoDB SDK layer supports the following features:
1. Connect to a cluster through DingoDB's own API
2. Table operations (create/drop)
3. Single-record operations (get/insert/delete/update)
4. Batch data operations (get/insert/delete/update)
5. Aggregation after range filtering:
- SUM
- SUM0
- COUNT
- MAX
- MIN
- COUNTWITHNULL
2.6 DevOps Layer
Visual system monitoring:
- Node information monitoring, helping users observe server node state changes more effectively
- System process monitoring, helping users detect abnormal processes and respond promptly
New system operations tools:
- Multi-node deployment using the Ansible automation tool, with batch system configuration, program deployment, and command execution for batch rollout.
New DBA-level system management tools:
- Leader migration: switch the Leader of a group to another follower node, for load balancing or for restarting a machine in an emergency.
- Region split/merge: the system automatically splits/merges Regions to keep Region sizes close and achieve load balancing.
- Node scale-out/scale-in: users can add or remove nodes according to the actual data distribution to balance load.
- Visual Schema/Table/Region management: monitor Schema/Table/Region information effectively through a visual tool.
Release Notes v0.5.0
1. SQL Features
- Support fuzzy matching with the LIKE keyword (see the example after this list)
- Support user authentication: creating, deleting, updating, and querying users
- Support granting user privileges
- Support cluster authentication
- Support SQL batch insert
- Optimized the Calcite function validation mechanism
- Refactored error code information
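A minimal example of LIKE fuzzy matching; the table and column names are illustrative:

```sql
-- Match all users whose name starts with 'Dingo'.
SELECT * FROM user_info WHERE name LIKE 'Dingo%';
```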
2. Metadata Management
- Moved cluster table-granularity management into the executor
- Deprecated the original Dingo-jraft module
- Migrated the original Dingo-jraft in the Coordinator to Dingo-mpu
- Support SQL-based metadata table queries
3. Indexing
- Support creating, deleting, updating, and querying indexes, improving query performance
- Support multiple index types: non-primary-key indexes and composite indexes
4. SDK Features
- Support chained-expression computation, enabling aggregation, updates, and more after various range lookups
- Support scanning and filter computation on non-primary-key columns
- List of metric computation features:
No | Function | Description |
---|---|---|
1 | Scan | Scan data in a table |
2 | Get | Read data from a table |
3 | Filter | Filter data by condition |
4 | Add | Numerically add a value to a column |
5 | Put | Write data into a table |
6 | Update | Modify data in a table |
7 | Delete | Delete data from a table |
8 | DeleteRange | Delete a range of data from a table |
9 | Max | Maximum of a column and the input |
10 | Min | Minimum of a column and the input |
11 | Avg | Average of a column and the input |
12 | Sum | Sum of a column and the input |
13 | Count | Count the number of records |
14 | SortList | Sort the input values together with the stored values, ascending by default |
15 | DistinctList | Deduplicate the input values together with the stored values, recording duplicates only once |
16 | List | Return a List built from the input values and stored values according to a condition |
17 | IncreaseCount | Count the adjacent increasing pairs in a sequence |
18 | DecreaseCount | Count the adjacent decreasing pairs in a sequence |
19 | maxIncreaseCount | The maximum count produced by any consecutive increasing run in a sequence |
20 | maxDecreaseCount | The maximum count produced by any consecutive decreasing run in a sequence |
5. Columnar Storage
- Support Merge-Tree-based columnar storage
6. Distributed Storage
- Fixed slow disk-space release after RocksDB update/delete
- Optimized Prefix Scan
- Upgraded the RocksDB version
- Optimized RocksDB's I/O flow
- Release disk space after DeleteRange execution
- Made fixed RocksDB parameters configurable
Release Notes v0.4.0
1. SQL Features and Optimizations
1.1 SQL Features
1.1.1 Extended SQL Syntax
- Support TTL via table options when creating a table (a hypothetical sketch follows this list)
- Support assigning partitions when creating a table
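A hypothetical sketch of these table options. The notes confirm that TTL and partition assignment can be set at table creation, but the option names and WITH syntax here are assumptions:

```sql
-- Hypothetical: set a TTL and an explicit partition count as
-- table options; the actual option syntax may differ.
CREATE TABLE session_cache (
    id VARCHAR(64) PRIMARY KEY,
    payload VARCHAR(1024)
) WITH (ttl = 3600, partitions = 4);
```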
1.1.2 Complex Data Type Features
- Support operations on MAP
- Support operations on MultiSet
- Support operations on Array
1.1.3 Support using variables in SQL statements such as INSERT, SELECT, and DELETE
1.1.4 Support a strategy to control messages transmitted between operators in the execution plan
1.1.5 New SQL functions (an example follows the table)
No | Function Name | Description |
---|---|---|
1 | pow(x,y) | Returns the value of a number raised to the power of another number |
2 | round(x,y) | Rounds a number to a specified number of decimal places |
3 | ceiling(x) | Returns the smallest integer value greater than or equal to a number |
4 | floor(x) | Returns the largest integer value less than or equal to a number |
5 | mod(x,y) | Returns the remainder of a number divided by another number |
6 | abs(x) | Returns the absolute (positive) value of a number |
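The functions above in use; the results follow standard SQL semantics:

```sql
SELECT pow(2, 10),        -- 1024
       round(3.14159, 2), -- 3.14
       ceiling(4.2),      -- 5
       floor(4.8),        -- 4
       mod(10, 3),        -- 1
       abs(-7);           -- 7
```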
1.2 SQL Optimizations
- Optimize queries using range filters
- Optimize range scan queries
- Optimize Dingo's internal type system
- Optimize SQL date/time/timestamp functions
2. Key-Value Operations
2.1 Equivalence of Key-Value and SQL operations
- Support table operations, such as create table and drop table, using the Key-Value API
- Support inserting, updating, and deleting records in a table using the Key-Value API
- Support table operations using the Annotation API
- Table and record operations are equivalent between the Key-Value API and SQL
2.2 Key-Value operation list
2.2.1 Basic Key-Value Operation
No | Function Name | Description |
---|---|---|
1 | put | insert or update records in a table |
2 | get | query records by user key |
3 | delete | delete records by user key |
2.2.2 Numerical operations
No | Function Name | Description |
---|---|---|
1 | add | add values of the same data type |
2 | sum | calculate the sum of columns filtered by keys |
3 | max | calculate the max of columns filtered by keys |
4 | min | calculate the min of columns filtered by keys |
2.2.3 Compound operation
No | Function Name | Description |
---|---|---|
1 | Operate | perform multiple operations on a single record; each operation may be a numerical or basic operation |
2 | OperateList | perform multiple operations on a single record |
3 | UDF | user-defined functions implemented with LUA scripts |
2.2.4 Collection operations
No | Type | Function Name | Description |
---|---|---|---|
1 | read | size | get the number of elements |
2 | read | get_all | get all elements of the collection |
3 | read | get_by_key | get elements of the collection by input key |
4 | read | get_by_value | get elements of the collection by input value |
5 | read | get_by_index_range | get elements of the collection by index range |
6 | write | put | append an element to the end |
7 | write | clear | clear all elements of the collection |
8 | write | remove_by_key | remove the key from the collection |
9 | write | remove_all_by_value | remove all records matching the value |
10 | write | remove_by_index | remove a record by index |
2.2.5 Filter operations
- DateFilter: query records using a range filter with the Date type.
- NumberRange: query records using a range filter with numeric types.
- StringRange: query records using a range filter with the String type.
- ValueEquals: query records with a specified record value.
3. Storage Optimizations
3.1 Distributed Consistency Protocol
- Refactored the Raft protocol implementation to replace sofa-jraft
- Refactored the log replication and leader election implementation
- Support new serialization for keys and values
3.2 RocksDB Improvements
- RocksDB can load configuration from files
- Support TTL features using user timestamps
- Upgraded the RocksDB version and released the io.dingodb packages on Maven Central
4. Other features
- Support JDBC connection parameters such as timeout
- Support explain to view the plan of a Dingo SQL statement
- Support releasing related packages to Maven Central:
No | Module | Description |
---|---|---|
1 | dingo-driver-client | the JDBC driver client used by SQL |
2 | dingo-sdk | the key-value SDK client for key-value operations |
3 | dingo-rocksdb | extended features on RocksDB |
Release Notes v0.3.0
1. Semantics and Functions of SQL
1.1 New data types
- Boolean
- Date: default format yyyy-MM-dd
- Time: default format HH:mm:ss
- Timestamp: default format yyyy-MM-dd HH:mm:ss.SSS
1.2 Allow assigning a default value to a column, either a constant or an internal function
1.3 Support Join operation
- Inner Join
- Left Join
- Right Join
- Full Join
- Cross Join
1.4 String function list (an example follows the table)
No | Function Names | Notes about Function |
---|---|---|
1 | Concat | Adds two or more expressions together |
2 | Format | Formats a number to a format like "#,###,###.##", rounded to a specified number of decimal places |
3 | Locate | Returns the position of the first occurrence of a substring in a string |
4 | Lower | Converts a string to lower-case |
5 | Lcase | Converts a string to lower-case |
6 | Upper | Converts a string to upper-case |
7 | Ucase | Converts a string to upper-case |
8 | Left | Extracts a number of characters from a string (starting from left) |
9 | Right | Extracts a number of characters from a string (starting from right) |
10 | Repeat | Repeats a string as many times as specified |
11 | Replace | Replaces all occurrences of a substring within a string, with a new substring |
12 | Trim | Removes leading and trailing spaces from a string |
13 | Ltrim | Removes leading spaces from a string |
14 | Rtrim | Removes trailing spaces from a string |
15 | Mid | Extracts a substring from a string (starting at any position) |
16 | Substring | Extracts a substring from a string (starting at any position) |
17 | Reverse | Reverses a string and returns the result |
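A few of the string functions above in use; the results follow standard semantics:

```sql
SELECT concat('Dingo', 'DB'),      -- 'DingoDB'
       upper('dingo'),             -- 'DINGO'
       replace('a-b-c', '-', '.'), -- 'a.b.c'
       trim('  dingo  '),          -- 'dingo'
       substring('DingoDB', 1, 5); -- 'Dingo'
```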
1.5 Date and time function list (an example follows the table)
No | Function Names | Notes about Function |
---|---|---|
1 | Now | Return current date and time |
2 | CurrentDate | Return the current date |
3 | Current_date | Return the current date |
4 | CurTime | Return the current time |
5 | Current_time | Return the current time |
6 | Current_timestamp | Return the current date and time |
7 | From_UnixTime | Convert unix time to timestamp |
8 | Unix_Timestamp | Convert a time to a Unix timestamp |
9 | Date_Format | Formats a date |
10 | DateDiff | Returns the number of days between two date values |
11 | Time_Format | Formats a time by a specified format |
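A few of the date and time functions above in use; the format strings are assumed to follow MySQL conventions, and the actual output depends on the current clock:

```sql
SELECT Now(),                                 -- current date and time
       Unix_Timestamp('2023-01-01 00:00:00'), -- seconds since the epoch
       Date_Format('2023-01-01', '%Y/%m/%d'), -- '2023/01/01'
       DateDiff('2023-01-10', '2023-01-01');  -- 9
```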
2. Replicator Management
2.1 Metadata management
- A physical table can be split into N partitions based on data size
- Management of physical table attributes such as creation time, status, partition strategy, split conditions, etc.
2.2 Partition replicator scheduling
- Support multiple partition modes, such as one table with one partition or one table with multiple partitions
- Support multiple split strategies, such as auto-split or manually split by API
- Support resource isolation between physical tables
2.3 Partition management tools
- Support viewing partition status, such as leader, followers, etc.
- Support migrating and splitting partitions through internal APIs
- Support viewing partition metrics, such as write/read latency, size, and record count
3. Data access methods for DingoDB
3.1 JDBC mode
- Support connecting to Dingo via JDBC
3.2 SDK client mode
- Support putting, getting, and deleting records in Dingo tables
- Support batch writes of records to Dingo tables
3.3 Import data from external sources
- Support importing data from local files in CSV and JSON formats
- Support importing data from Kafka in JSON and Avro formats
4. Tools and Monitoring
- Support monitoring the Dingo cluster with Grafana and Prometheus
- Support managing the cluster's partitions by API
- Support adjusting the log level dynamically with tools
- Support deploying the cluster with Ansible or docker-compose
- Added more than 1,300 new autotests
Release Notes v0.2.0
- Architecture
  - Refactored the DingoDB architecture, abandoning Zookeeper, Kafka, and Helix.
  - Use Raft as the consensus protocol to reach agreement across multiple nodes on membership selection and data replication.
  - Region is proposed as the unit of data replication; it can be scheduled, split, and managed by the coordinator.
  - The distributed file system is replaced by a distributed key-value store implemented with Raft and RocksDB.
- Distributed Storage
  - Support replicating regions across multiple nodes.
  - Support splitting regions based on policies such as key count or region size.
  - Support periodic region snapshots.
- SQL
  - Support more aggregation functions, such as min, max, avg, etc.
  - Support insert into ... select (a minimal example follows).
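A minimal example of insert into ... select; the table names are illustrative:

```sql
-- Populate one table from a query over another.
INSERT INTO recent_orders SELECT * FROM orders WHERE amount > 100;
```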
- Client Tools
  - Thin JDBC driver
Release Notes v0.1.0
- Cluster
  - Distributed computing. Cluster nodes are classified into coordinator and executor roles.
  - Distributed metadata storage. Support creating and dropping table metadata.
  - Coordinators support SQL parsing and optimizing, job creation and distribution, and result collection.
  - Executors support task execution.
- Data store
  - Using RocksDB storage.
  - Encoding and decoding in Apache Avro format.
  - Table partitioning by hash of primary columns.
- SQL parsing and executing
  - Create (CREATE TABLE) and drop (DROP TABLE) tables.
  - Support common SQL data types: TINYINT, INT, BIGINT, CHAR, VARCHAR, FLOAT, DOUBLE, BOOLEAN.
  - Insert into (INSERT INTO TABLE) and delete from (DELETE FROM TABLE) tables.
  - Query tables (SELECT).
  - Support filtering and projecting in queries.
  - Support expressions in filter conditions and projected columns.
  - Support point queries.
- User interface
  - Command-line interface (CLI)
  - Support SQL input and execution in the CLI.
  - Output query results in table format in the CLI.
  - Output the time consumed by a query in the CLI.