Key highlights of this release
There have been some key performance improvements in this release:
- New and much faster recursive path finding algorithms that implement relationship patterns with the Kleene star [*].
- Data spilling to disk during copy, which enables copying very large graphs on machines with limited RAM.
- Zone maps, which enable much faster scans of node/rel properties when there is a filter on numeric properties.
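To give a rough intuition for why zone maps speed up filtered scans, here is a minimal sketch in Python. It is not Kùzu's implementation: the `Chunk` class, the chunk size, and the function names are hypothetical and chosen only for illustration. The idea is that each chunk of a column stores its minimum and maximum value, so a scan with a numeric filter can skip any chunk whose range cannot contain a match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """A block of column values plus its zone map (min and max)."""
    values: List[int]
    min_value: int
    max_value: int


def build_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Split a column into fixed-size chunks and record min/max per chunk."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        block = values[start:start + chunk_size]
        chunks.append(Chunk(block, min(block), max(block)))
    return chunks


def scan_greater_than(chunks: List[Chunk], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of chunks actually read."""
    matches: List[int] = []
    scanned = 0
    for chunk in chunks:
        # Zone map check: if even the largest value fails the filter,
        # no value in this chunk can match, so skip it entirely.
        if chunk.max_value <= threshold:
            continue
        scanned += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, scanned


# Hypothetical data: 1000 sorted values in 10 chunks of 100.
values = list(range(1000))
chunks = build_chunks(values, 100)
matches, scanned = scan_greater_than(chunks, 949)
print(f"read {scanned} of {len(chunks)} chunks, found {len(matches)} matches")
```

In this toy example only the last chunk is read. The benefit depends on how well the data is clustered: if values are randomly distributed, most chunk ranges overlap the filter and few chunks can be skipped.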
From the usability perspective, we have the following improvements:
- CSV auto detection to automatically detect several CSV configurations during data ingest.
- Improved UX during CSV import that can report to users about skipping erroneous CSV lines.
- New JSON data type that you can use to store JSON blobs as node/relationship properties.
- New official Golang API so that you can build applications on top of Kùzu using Go!
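As an illustration of what CSV auto detection involves, the sketch below guesses a delimiter and whether a header row is present from a small sample. This is a simplified stand-in written with Python's standard `csv` module, not Kùzu's sniffer; the candidate delimiter list, the function names, and the heuristics are assumptions made for this example.

```python
import csv
import io
from typing import List, Optional

# Hypothetical candidate set; a real sniffer may consider more options.
CANDIDATE_DELIMITERS = [",", ";", "|", "\t"]


def parse(sample: str, delimiter: str) -> List[List[str]]:
    """Parse the sample with the given delimiter, dropping empty rows."""
    reader = csv.reader(io.StringIO(sample), delimiter=delimiter)
    return [row for row in reader if row]


def detect_delimiter(sample: str) -> Optional[str]:
    """Pick the delimiter that gives every row the same width, preferring more columns."""
    best, best_cols = None, 1
    for delim in CANDIDATE_DELIMITERS:
        widths = {len(row) for row in parse(sample, delim)}
        if len(widths) == 1:
            cols = widths.pop()
            if cols > best_cols:
                best, best_cols = delim, cols
    return best


def is_number(text: str) -> bool:
    try:
        float(text)
        return True
    except ValueError:
        return False


def detect_header(sample: str, delimiter: str) -> bool:
    """Guess a header if some column is non-numeric in row 1 but numeric below."""
    rows = parse(sample, delimiter)
    if len(rows) < 2:
        return False
    first, rest = rows[0], rows[1:]
    for col, cell in enumerate(first):
        if not is_number(cell) and all(
            len(r) > col and is_number(r[col]) for r in rest
        ):
            return True
    return False


sample = "id;name;age\n1;Alice;30\n2;Bob;25\n"
delimiter = detect_delimiter(sample)
has_header = detect_header(sample, delimiter) if delimiter else False
print(repr(delimiter), has_header)
```

Heuristics like these can misfire on small or unusual samples, which is why explicitly specifying the configuration remains useful when the detected settings are wrong.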
What's Changed
- Make C API header available in the kuzu target's include directories by @benjaminwinger in #4034
- Add option force_checkpoint_on_close by @ray6080 in #4032
- Fix delete on empty database by @andyfengHKU in #4038
- Fix list_contains implicit casting by @andyfengHKU in #4044
- Force cast when scanning by @mxwli in #4041
- In memory mode by @ray6080 in #4012
- Implement attach remote duckdb database by @acquamarin in #4040
- Only commit memory on windows when initially claiming the frame by @benjaminwinger in #4047
- Fix list contains casting on empty list by @andyfengHKU in #4049
- Fix consecutive merge by @andyfengHKU in #4046
- Fix semi mask on scan node table by @ray6080 in #4050
- Add msvc in mem CI workflow; remove build steps in CI workflow by @ray6080 in #4051
- Fix resize and append of chunkedNodeGroup by @ray6080 in #4054
- Refactor list auxiliary buffer by @andyfengHKU in #4052
- Rework checkpoint memory usage estimation by @ray6080 in #4055
- Optimize catalog access by @andyfengHKU in #4058
- Fix updates to same row leading to incorrect write-write conflict errors by @ray6080 in #4063
- Added new output modes for shell by @MSebanc in #4053
- Vacuum dropped columns during checkpoint by @ray6080 in #4074
- Refactor generation of grammar files. by @mxwli in #4073
- Auto checkpoint when closing db by @ray6080 in #4075
- Csv sniffing by @mxwli in #3932
- Fix platform issues with md5sum utility by @mxwli in #4077
- Make property case insensitive by @andyfengHKU in #4039
- Add More Json tests by @mxwli in #4061
- Fixed cli modes to escape characters by @MSebanc in #4085
- Added some missing algorithm includes by @benjaminwinger in #4086
- Implemented cli start up commands ability by @MSebanc in #4078
- Fix bug in CorrelatedSubqueryUnnestSolver by @andyfengHKU in #4088
- Fix a bug when intersecting an empty build side table by @andyfengHKU in #4089
- Enabled progress bar by default for shell by @MSebanc in #4065
- Add default empty str for database path for Python and Node.js API by @mewim in #4090
- Rework iterateCatalogEntries and catalog entry oid by @ray6080 in #4057
- Add in_memory constructor for rust database and API docs for in-memory mode by @benjaminwinger in #4094
- Update api doc for java, nodejs, python and add pytest for in-mem mode by @ray6080 in #4095
- Fix detection of when rust builds don't need to re-build the bundled C++ library by @benjaminwinger in #4093
- Update number of threads in 32-bit build by @mewim in #4096
- bump version to 0.6.0 by @ray6080 in #4076
- Fix constant lambda expression evaluation by @andyfengHKU in #4098
- Refactor function&function expression by @andyfengHKU in #4102
- Csv reader progress fix by @MSebanc in #4099
- Fix user defined types in nested type by @acquamarin in #4081
- change occurrences of std::regex uses to RE2 library calls by @mxwli in #4100
- Csv reading improvement by @mxwli in #4079
- Fix failed tests when compression is disabled by @ray6080 in #4104
- Automatically upload binary dataset to s3 by @mewim in #4114
- Make Connection::query in rust call the C++ query function instead of using prepare+execute by @benjaminwinger in #4117
- Disabled progress bar by default due to performance issues by @MSebanc in #4115
- Add Pyarrow Skip and Limit by @mxwli in #4112
- Add projection push down to csv and parquet by @andyfengHKU in #4113
- Schedule lsqb to run every day by @ray6080 in #4122
- Unify getColumnID of node and rel TableCatalogEntry by @ray6080 in #4123
- Keys function by @acquamarin in #4125
- Fix serialize tinysnb dataset on windows by @acquamarin in #4124
- Fix JSON null handling by @mxwli in #4118
- Implement projection push down in scan relation tables. by @acquamarin in #4126
- Add option to ignore errors in CSV parsing + improve error messages by @royi-luo in #4067
- Random Split for Multi Copy Testing by @yiyun-sj in #3989
- Extension installation rework by @acquamarin in #4066
- Enable scheduled run for interactive v1 by @ray6080 in #4133
- Implement Predicate functions by @acquamarin in #4109
- Add filter push down to relational table by @andyfengHKU in #4120
- Added cli support for single line comments and added multiline mode by @MSebanc in #4134
- add projection to pyarrow, numpy, and json by @mxwli in #4119
- Fix progress bar threading issue by @royi-luo in #4140
- Schedule auto run of SNB BI by @ray6080 in #4139
- ALP implementation by @royi-luo in #3994
- Added shell development guide by @MSebanc in #4144
- Schedule auto run of fintech bench by @ray6080 in #4143
- Fix export-import rel group by @andyfengHKU in #4147
- Fix undirected edge projection by @andyfengHKU in #4151
- Merge BMFileHandle and FileHandle by @ray6080 in #4045
- Fix casting assertion errors in getNodeTableEntries and getRelTableEntries by @ray6080 in #4154
- Replace Linux open file flag with kuzu open file flag by @acquamarin in #2931
- Fix import legacy exported database by @acquamarin in #4157
- Fix user defined type casting by @acquamarin in #4156
- Http file cache improve by @acquamarin in #4159
- Ignore node batch insert failures by @royi-luo in #4158
- Deprecate ARM64 macOS CI job by @mewim in #4164
- Make page size be flexible configurable by @ray6080 in #4162
- Fix Windows extension test PostgreSQL host by @mewim in #4169
- Fix Windows build tool path by @mewim in #4170
- Fix indentation in build script by @mewim in #4171
- Fix multiplatform build and test workflow by @mewim in #4174
- Add compile flag to reduce size of static library on windows by @benjaminwinger in #3644
- Fix attach kuzu in in-mem mode by @acquamarin in #4177
- Generate LDBC-01 from CI by @mewim in #4180
- Minor fix for multiplatform build and test by @mewim in #4179
- Fix dataset paths for generated datasets on S3 by @mewim in #4182
- Restructure extension build pipeline (test) by @mewim in #4191
- Fix the number of threads used in the CSV converter tests by @benjaminwinger in #4190
- Deploy extensions to Dockerhub; add draft purge script by @mewim in #4192
- Add Go API installation command by @prrao87 in #4193
- Add extension purging workflow by @mewim in #4194
- Refactor page size test by @mewim in #4195
- Add missing json creation functions by @acquamarin in #4185
- Fix race condition causing an infinite loop in the eviction queue by @benjaminwinger in #4187
- Make vector size be more flexible by @ray6080 in #4173
- Fix double-initialization of the NullChunkData buffer by @benjaminwinger in #4186
- Fix CI LDBC interactive v1 benchmark by @mewim in #4201
- Re-enabling LDBC Shortest Path Queries by @WWW0030 in #4209
- Update multiplatform-build-test.yml to include Ubuntu 24.10 by @mewim in #4218
- Rework checkpoint of version info by @ray6080 in #4210
- Fixing issue 4129: Unexpected error message when copying from csv files with mismatched num of columns by @SterlingT3485 in #4202
- Publish nightly build for extensions by @mewim in #4228
- Fix 2973: Remove "Kuzu" prefix from all Java API classes by @SterlingT3485 in #4229
- Fix compilation warnings and enable Werror on gcc and clang by @ray6080 in #4220
- Nested alias by @andyfengHKU in #4223
- Json reader improvement by @acquamarin in #4225
- Optimize extension cmake files by @acquamarin in #4198
- Fix buffer manager failure false positive by @benjaminwinger in #4221
- list lambda body support use parameter more than once by @excaliburwyj in #4163
- Track ColumnChunk allocations through the BufferManager by @benjaminwinger in #3743
- Skip extension test from the main pipeline by @mewim in #4232
- Fix return same column with different alias by @andyfengHKU in #4233
- Improved cli syntax highlighting by @MSebanc in #4236
- Fix windows open file flag by @acquamarin in #4238
- Optimize version vector array by @ray6080 in #4196
- Fixed cli prompt retaining previous highlighting in singleline mode by @MSebanc in #4242
- Implement json auto-detect options by @acquamarin in #4237
- Feature: show_tables() and table_info() by @WWW0030 in #4234
- Refactor with projection list rewrite by @andyfengHKU in #4244
- Newline delimited JSON by @acquamarin in #4245
- Populate error source info (e.g. line number) for node batch insert if the source is a CSV file by @royi-luo in #4184
- Providing better error message in CLI by @WWW0030 in #4248
- Fix undefined behaviour in the Buffer Manager after failure by @benjaminwinger in #4246
- Add function call to clear cached warnings by @royi-luo in #4249
- Fix order by scope by @andyfengHKU in #4254
- Added ability to turn off cli highlighting by @MSebanc in #4252
- Changed cli default mode to multiline by @MSebanc in #4253
- Fix subquery planning by @andyfengHKU in #4255
- Fix nested aggregate by @andyfengHKU in #4259
- Remove the unnecessary pass of nodeIDVector by @ray6080 in #4260
- Skip scan of rel id when possible by @ray6080 in #4261
- Enable read only DB tests on windows by @benjaminwinger in #4200
- Migrate Java build process to use gradle build system by @mewim in #4263
- Remove local storage import from transaction.h by @benjaminwinger in #4178
- Expose rel internal id and use it for de-duplication for NetworkX conversion by @mewim in #4241
- Fix rust warnings and update arrow-rs dependency by @benjaminwinger in #4265
- Fixing 2439: Improve Copy Option Usability by @SterlingT3485 in #4230
- Remove unnecessary scan of nbrNodeID from rel tables by @ray6080 in #4264
- Add warnings + error ignoring for index lookup failures by @royi-luo in #4256
- Add Maven publish workflow by @mewim in #4270
- Simplify and scan rel operator and rel table scan interface by @ray6080 in #4258
- Add Sonatype OSSRH to Java deploy workflow by @mewim in #4272
- Update Java docs, example, and download link by @mewim in #4276
- Fix rel checkpoint due to incorrect set null and misaligned gaps due to empty src node by @ray6080 in #4274
- Avoid unnecessary allocation of CSR header chunks when scanning from local rel tables by @ray6080 in #4277
- Adding Progress Report for REL_BATCH_INSERT by @WWW0030 in #4266
- Fix OPTIONAL MATCH null value handling for NetworkX conversion by @mewim in #4282
- Fixing pattern naming by @WWW0030 in #4283
- Backend for new recursive joins by @semihsalihoglu-uw in #4278
- Add clang tidy checks on init variables by @ray6080 in #4268
- Fix re-opening databases after a copy failure by @benjaminwinger in #4279
- Remove RJOutputType by @andyfengHKU in #4290
- Following modification on copy option usability by @SterlingT3485 in #4292
- Adding regexp for split_to_array by @WWW0030 in #4288
- Add remove RJInputType by @andyfengHKU in #4291
- Support more types in top-k by @acquamarin in #4304
- Add gds object manager by @andyfengHKU in #4298
- Fix memory leak in JSON parsing by @royi-luo in #4302
- Swap order of type arguments to ku_dynamic_cast by @benjaminwinger in #4307
- Refactor Validate Property Name by @WWW0030 in #4303
- Populate warning data for batch insert warnings with JSON source + populate line number for JSON scan errors by @royi-luo in #4294
- Track edge ID in gds by @andyfengHKU in #4310
- Refactor DataChunk::getValueVector() to return reference to ValueVector instead of std::shared_ptr<ValueVector> by @royi-luo in #4316
- Deprecate single shortest path path tracking by @andyfengHKU in #4317
- Make CI job fail on benchmark error by @mewim in #4322
- Switch to trusted publisher for PyPI publication by @mewim in #4323
- Skip failed benchmark by @mewim in #4324
- Remove outputProperty from gds function parameter list by @andyfengHKU in #4325
- Support decimal datatype when scanning from duckdb by @acquamarin in #4326
- Removing ENCODED_JOIN and fixing benchmark tests by @WWW0030 in #4329
- Batched scan from persistent storage for rel tables by @ray6080 in #4318
- Add test with ASAN to linux extension test by @royi-luo in #4311
- Allowing PRIMARY KEY specification after Column Definition by @WWW0030 in #4314
- Fixing call test by @WWW0030 in #4335
- Use an iterator instead of collecting Graph scan results into an std::vector by @benjaminwinger in #4327
- Add bwd/both direction scan to gds by @andyfengHKU in #4333
- Add edge direction to gds by @andyfengHKU in #4336
- Add semi mask point lookup by @andyfengHKU in #4340
- Cli multiline mode whitespace fix by @MSebanc in #4342
- CLI auto-complete highlighting and initial improvements by @MSebanc in #4341
- Make default buffer pool size in Java API same as others by @royi-luo in #4343
- Spill to disk by @benjaminwinger in #4188
- Fix distinct order by not preserving order by @andyfengHKU in #4345
- Add CODEOWNERS file by @benjaminwinger in #4231
- Add support for MAP data type for C API by @WWW0030 in #4339
- Add Dialect Detection as the First Step of CSV Sniffing by @SterlingT3485 in #4309
- SelectionVector cleanup by @benjaminwinger in #4346
- Add missing numOutputTuples in physical operators by @ray6080 in #4351
- Enable constant compression for nulls by @ray6080 in #4166
- Execute recursive pattern as gds by @andyfengHKU in #4347
- Logical explain by @MSebanc in #3987
- Add support for MAP data type for JAVA API by @WWW0030 in #4349
- Vectorize EdgeCompute by @benjaminwinger in #4352
- Bug fixes + optimizations on Fintech benchmark by @royi-luo in #4348
- Optimize short query and avoid scheduling overhead by @ted-wq-x in #4358
- Optimize rel label prune under both direction by @ted-wq-x in #4357
- Fix CI after GitHub environment change by @mewim in #4363
- Adding DECIMAL Support for C_API by @WWW0030 in #4359
- Some more ALP-related bug fixes by @royi-luo in #4364
- Add gds edge predicate by @andyfengHKU in #4361
- Add very basic table stats for node tables by @ray6080 in #4366
- java api query result by @WWW0030 in #4367
- fix path semantic no work by @excaliburwyj in #4365
- Fix clang format on CI by @mewim in #4372
- Temp disable SemanticTest by @mewim in #4377
- Enable testing for WebAssembly by @mewim in #4376
- Move --break-system-packages settings to env variables by @mewim in #4378
- Fix failed WASM tests by @mewim in #4380
- Decrease the test concurrency for WASM on CI by @mewim in #4382
- Remove out-dated semantic test by @andyfengHKU in #4383
- fix the execution plan containing NodeLabelFiler always results in empty by @ted-wq-x in #4370
- Add option to regexp_replace function by @acquamarin in #4375
- Disable import and export tests in WASM by @mewim in #4384
- Enable zone map for scan node/rel tables by @royi-luo in #4373
- Implement Header Detection as the third step of CSV Sniffing by @SterlingT3485 in #4360
- Optimize column&ChunkedCSRHeader memory initialize by @ted-wq-x in #4362
- Standalone call function by @acquamarin in #4390
- Increase wasm stack size by @royi-luo in #4391
- Replacing Deprecated Finalize() with autoCloseable from JAVA_API by @WWW0030 in #4388
- Rework logical op print info interface by @ray6080 in #4379
- Added cli autocompletion for table columns by @MSebanc in #4355
- Added cli command to toggle completion highlighting by @MSebanc in #4394
- Zone map enable follow-up by @royi-luo in #4393
- Implement single thread mode for task scheduler by @mewim in #4399
- Added function names to cli autocomplete by @MSebanc in #4395
- GDS node filter by @andyfengHKU in #4374
- Support different path semantic for gds by @andyfengHKU in #4387
- Add rel multiplicity check for insertions by @royi-luo in #4401
- Make scan functions const by @benjaminwinger in #4406
- Update rust dependencies by @benjaminwinger in #4410
- Add operator id to explain output to make semiMasker output clearer by @ted-wq-x in #4411
- Always update null mask in vector comparison functions by @royi-luo in #4407
- Improve progress bar performance by @royi-luo in #4409
- Migrate recursive join to new GDS framework by @andyfengHKU in #4404
- isVisible returns false if chunkedGroup is nullptr (instead of crashing) by @royi-luo in #4412
- Updating Mini Parquet by @WWW0030 in #4400
- Properly enable progress bar by @royi-luo in #4415
- Cleanup and address warnings in storage by @ray6080 in #4427
- Removing PathSemiMasker unnecessary mask node by @ted-wq-x in #4418
- Gds no path track optimization by @andyfengHKU in #4419
- Change uncommitted local node offsets to be consecutive with committed ones by @ray6080 in #4420
- Enable scan of uncommitted transaction local data inside OnDiskGraph by @ray6080 in #4421
- Revert unintentional change to maxThreads by @benjaminwinger in #4435
- Added test for cli autocompletion by @MSebanc in #4434
- Add alias to the explain output of scan_table operator by @ted-wq-x in #4437
- Add support for creating LIST from C API by @mewim in #4436
- GDS dst node pruning optimization by @andyfengHKU in #4430
- Refactor GDS var length function by @andyfengHKU in #4438
- Using croaring library instead of MaskCollection by @ted-wq-x in #4356
- Add support for creating STRUCT from C API by @mewim in #4439
- Add support for creating MAP from C API and fix Java util functions for MAP by @mewim in #4442
- Fix clang tidy and msvc compilation from #4356 by @ray6080 in #4441
- set canParallel for clear_warnings() to false by @royi-luo in #4443
- Clean up unused constants by @ray6080 in #4449
- Fix PathSingleTableSemiMasker not working as expected by @ted-wq-x in #4451
- Bug in Testing framework OK followed by EOF raise seg fault by @WWW0030 in #4445
- Allow scanning edge properties in GDS edgeCompute by @benjaminwinger in #4386
- Add shortest path early stop by @andyfengHKU in #4448
- Remove RDF by @andyfengHKU in #4344
- Modify BM so behaviour can be overridden during tests + some robustness improvements by @royi-luo in #4450
- Fixed ctrl c not interrupting shell bug by @MSebanc in #4446
- Fix primary key scan not working by @ted-wq-x in #4454
- Add limit push down optimization by @ted-wq-x in #4428
- Fix clang format by @mewim in #4464
- remove unused header by @SterlingT3485 in #4465
- Fix aggregation overflow bug by @acquamarin in #4466
- Clean up after limit push down in #4428 by @ray6080 in #4463
- Remove runQuery interface by @ray6080 in #4469
- Rework limit push down optimizer by @andyfengHKU in #4470
- Some Bug fix in JSON type casting (Date, Interval, Timestamp), Improvement (literal NULL/NAN/INF) in CSV and new tests on datatype casting for JSON by @SterlingT3485 in #4447
- Remove clang-format as dependency for jobs by @royi-luo in #4478
- Fix duplicate quote char in JSON CASTING by @SterlingT3485 in #4473
- Push down dst table id filter to GDS by @andyfengHKU in #4474
- Gds limit push down by @andyfengHKU in #4471
- Fix path printing by @andyfengHKU in #4482
- Fix attaching duckdb database hosted on authenticated s3 by @acquamarin in #4480
- Store just one OnDiskGraph scan state for both fwd and backward by @benjaminwinger in #4424
- Fixed shell unicode table name printing bug by @MSebanc in #4484
- Allow loading extension for multiple times by @acquamarin in #4483
- CSV parallel parsing fixes by @royi-luo in #4479
- Revert "Store just one OnDiskGraph scan state for both fwd and backward" by @andyfengHKU in #4487
- Gds progress and interrupt by @andyfengHKU in #4488
- Lazily create the copy.tmp file when used by @benjaminwinger in #4472
- Fix invalid error handler reference when calling ParallelCSVScanSharedState::constructPopulateFunc by @royi-luo in #4491
- Fix extension test failure on windows by @acquamarin in #4490
- Refactor semi masker by @andyfengHKU in #4494
- Fix JSON_CONTAIN and Adding More JSON test cases by @SterlingT3485 in #4489
- Apply semi mask to path property by @andyfengHKU in #4497
- Optimize gds limit semi mask planning by @andyfengHKU in #4498
- Disallow JSON type as primary key by @acquamarin in #4500
- Fix import db deadlock by @ray6080 in #4502
- Fix udt serialization by @acquamarin in #4508
- Fix Linux extension upload and deprecate pre-compiled 32-bit Linux extension by @mewim in #4511
- More ALP bug fixes by @royi-luo in #4505
- Acquire a lock when destroying/copying a python udf function by @acquamarin in #4507
- Fix json function return type and null handling by @acquamarin in #4518
- fix string cast in nested json object by @SterlingT3485 in #4520
- Fix sample_size is zero when COPY FROM by @SterlingT3485 in #4522
- Fix json contains by @acquamarin in #4521
- Fix issue 4527 by @andyfengHKU in #4528
- Remove spill_to_disk_temp_file, instead add spill_to_disk by @ray6080 in #4532
- Bump project version to 0.7.0 by @mewim in #4525
- Fix ci by @acquamarin in #4536
New Contributors
- @SterlingT3485 made their first contribution in #4202
- @excaliburwyj made their first contribution in #4163
Full Changelog: v0.5.0...v0.7.0