Releases: chdb-io/chdb
v1.0.0rc3
v1.0.0rc2
v1.0.0rc1
v0.16.0rc2
v0.16.0rc1
chdb Release Summary
chdb 0.16 is based on ClickHouse 23.10.
Query Enhancements
- Vector Addition:
  python3 -m chdb "SELECT [1, 2, 3] + [4, 5, 6]"
- Omit file() Function:
  python3 -m chdb "SELECT * from '/home/Clickhouse/bench/hits_0.parquet' limit 10"
- NumPy as Input Format:
  - Support for NumPy as an input format with the query SELECT * FROM 'data.npy'.
- Parquet Optimizations:
  - Writing Parquet files is 10x faster: it is now multi-threaded and almost as fast as reading.
  - Parquet filter pushdown: when reading Parquet files, row groups (chunks of the file) are skipped based on the WHERE condition and the min/max values in each column.
  - Reading small row groups is optimized by batching them together.
- Condition Pushdown for ORC:
  - Using data skipping indices in ORC, similarly to Parquet.
- PRQL Support:
  - Added support for PRQL as a query language.
- urlCluster Function:
  - Added the urlCluster table function.
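The Parquet filter pushdown item above can be pictured with a small pure-Python sketch. It is only an illustration of the idea (skip a row group when its min/max statistics prove the WHERE condition cannot match), not the ClickHouse or chdb implementation. `RowGroup`, `make_row_group`, and `scan_greater_than` are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """Hypothetical stand-in for a Parquet row group with column statistics."""
    rows: list       # values of a single column stored in this row group
    min_value: int   # minimum kept in the row group metadata
    max_value: int   # maximum kept in the row group metadata


def make_row_group(values):
    # Statistics are computed once, at write time, and stored with the chunk.
    rows = list(values)
    return RowGroup(rows=rows, min_value=min(rows), max_value=max(rows))


def scan_greater_than(row_groups, threshold):
    """Evaluate `WHERE col > threshold`, skipping row groups that cannot match."""
    matches, groups_read = [], 0
    for group in row_groups:
        if group.max_value <= threshold:
            # No value in this row group can satisfy the condition,
            # so its rows are never touched.
            continue
        groups_read += 1
        matches.extend(v for v in group.rows if v > threshold)
    return matches, groups_read


groups = [
    make_row_group(range(0, 100)),
    make_row_group(range(100, 200)),
    make_row_group(range(200, 300)),
]
matches, groups_read = scan_greater_than(groups, 250)
print(len(matches), groups_read)  # 49 1
```

Only the last of the three row groups is read here, because the first two have a maximum at or below the threshold.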
New Features
- Introduced arrayFold for applying a lambda function to multiple arrays.
- Extended support for asynchronous inserts with external data via the native protocol.
- Introduced function jsonMergePatch for merging JSON strings.
- Continued support for Kusto Query Language dialect with Phase 1 implementation.
- Introduced a new SQL function arrayRandomSample for sampling elements from an input array.
- Added support for dropping cache for Protobuf format with SYSTEM DROP SCHEMA FORMAT CACHE [FOR Protobuf].
- Conditions on arguments of a table with a space-filling curve in its key can now be used for indexing.
- New setting force_optimize_projection_name checks that a projection is used in the query.
- Added aggregation function lttb using the Largest-Triangle-Three-Buckets algorithm for downsampling data.
- CHECK TABLE query has better performance and usability, supporting checking particular parts.
- Introduced function byteSwap for reversing the bytes of unsigned integers.
- Added functions formatQuery and formatQuerySingleLine for formatted SQL query output.
- Introduced DWARF input format for reading debug symbols from an ELF file.
- Introduced SHOW SETTING setting_name as a simpler version of SHOW SETTINGS.
- Added fields substreams and filenames to the system.parts_columns table.
- Introduced a setting create_table_empty_primary_key_by_default for default ORDER BY ().
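To make the byteSwap item above concrete, here is a minimal pure-Python sketch of what reversing the bytes of an unsigned integer means. It does not call chdb. `byte_swap` and its `width_bytes` parameter are hypothetical names for this illustration, and the explicit width is an assumption of the sketch (in SQL the width would presumably follow from the integer type of the argument).

```python
def byte_swap(value: int, width_bytes: int) -> int:
    """Reverse the byte order of an unsigned integer of the given width."""
    if not 0 <= value < 1 << (8 * width_bytes):
        raise ValueError("value does not fit in the given width")
    # Serialize as big-endian, then reinterpret the same bytes as little-endian.
    return int.from_bytes(value.to_bytes(width_bytes, "big"), "little")


print(hex(byte_swap(0x1234, 2)))      # 0x3412
print(hex(byte_swap(0x12345678, 4)))  # 0x78563412
```

Applying the swap twice with the same width returns the original value, which is a quick sanity check for any byte-order conversion.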
Performance Improvements
- Fixed contention on Context lock, significantly improving performance for short-running concurrent queries.
- Improved the performance of inverted index creation by 30%.
- Optimized memory consumption for external aggregation with many temporary files.
- Added option query_plan_preserve_num_streams_after_window_functions to preserve the number of streams after evaluating window functions.
- Released more streams if data is small, optimizing resource usage.
- Optimized RoaringBitmaps before serialization.
- Optimized inverted index posting lists to use the smallest possible representation.
- Set a reasonable size for the marks cache for secondary indices by default.
- Avoided unnecessary reconstruction of index granules when reading skip indexes.
- Cached CAST function in set during execution to improve the performance of function IN when set element type doesn't match column type.
- Improved write performance to EmbeddedRocksDB tables.
- Improved overall resilience for ClickHouse in case of many parts within a partition.
- Reduced memory consumption during loading of hierarchical dictionaries.
- All dictionaries now support the setting dictionary_use_async_executor.
- Prevented excessive memory usage when deserializing AggregateFunctionTopKGenericData.
- Reduced CPU consumption for AsyncMetrics threads on a Keeper with lots of watches.
- Experimental inverted indexes now do not store tokens with too many matches, saving space.
- Improved write performance to hierarchical dictionaries.
v0.15.0
What's Changed
- Enable hdfs, avro and rapidJson/simdJson by @nmreadelf in #123
Full Changelog: v0.14.2...v0.15.0
v0.14.2
v0.14.1
What's Changed
- Add CHDB_VERSION for cmake by @nmreadelf and @lmangani in #96
- Enable output_format_arrow_string_as_string for default by @nmreadelf in #98
- Strip library/wheels by @lmangani in #99
- Support UDF in Python by @auxten in #100
- Fix minor test coverage by @auxten
Full Changelog: v0.13.0...v0.14.0
v0.14.0
What's Changed
- Add CHDB_VERSION for cmake by @nmreadelf in #96
- Enable output_format_arrow_string_as_string for default by @nmreadelf in #98
- Strip library/wheels by @lmangani in #99
- Support UDF in Python by @auxten in #100
Full Changelog: v0.13.0...v0.14.0