11.0.0 (2022-08-16)
Breaking changes:
Implemented enhancements:
- Make RowAccumulator public #3138
- docs: proposal for consolidating docs into a Contributor Guide #3127
- feat: support Timestamp +/- Interval #3103
- An arrow_typeof function like PostgreSQL's pg_typeof #3095
- Add DataFrame section to user guide #3066
- Document all scalar SQL functions in user guide #3065
- Simplify implementation of approx_median so that it can be exposed in Python #3063
- Support double quoted literal strings for dialects (such as mysql, bigquery) #3055
- Simplify / speed up implementation of character_length to unicode points #3049
- Follow-up on Clickbench benchmark #3048
- Why is the PhysicalPlanner an async trait? #3032
- Optimize file stream metrics. #3024
- Proposal: Enable typed strings expressions for VALUES clause #3017
- Proposal: Add date_bin function #3015
- The upcoming release of Arrow (20?) breaks datafusion #3006
- Can I select some files for query based on the filtering rules in the directory? #2993
- Rename FormatReader to FileOpener #2990
- Derive Hash trait for JoinType #2971
- CAST from Utf8 to Boolean #2967
- Add baseline_metrics for FileStream to record metrics like elapsed time, record output, etc #2961
- Example to show how to convert query result into rust struct #2959
- simplify not clause #2957
- Implement Debug for ColumnarValue #2950
- Parallel fetching of column chunks when reading parquet files #2949
- Extension mechanism for SessionConfig #2939
- Streaming CSV/JSON Object Store Read #2935
- Support CSV Limit Pushdown to Object Storage #2930
- Add support for pow scalar function #2926
- Add support for exact median aggregate function #2925
- Support mean as synonym for avg #2922
- Rename a column name #2919
- Move ScalarValue tests alongside implementation, move from_slice to core #2913
- Fail gracefully if optimization rule fails #2908
- Make ObjectStoreRegistry a trait so that Ballista can introduce its own ObjectStoreRegistry #2905
- Remove datafusion-data-access crate #2903
- Improve formatting of logical plans containing subquery expressions #2898
- Atan2 added to built-in functions #2897
- The explain statements only print logical plans for debug/other purposes #2894
- JSON version of display_indent() #2889
- It would be nice to have a way to generate unique IDs in optimizer rules #2886
- Add support for TIME literal values #2883
- Add h2o benchmark #2879
- Implement from_unixtime function #2871
- Add cast function for creating logical cast expression #2870
- Release DataFusion 10.0.0 #2862
- Implement information_schema.views #2857
- Migrate from avro_rs to apache_avro #2783
- Add optimizer rule to remove OFFSET 0 #2584
- Preserve Element Name in ScalarValue::List #2450
- Add EXISTS subquery support to Ballista #2338
- Add documentation on supported functions to datafusion website #1487
- Documentation for datafusion-cli can be consolidated a bit more #1352
- Optimizer: Predicate Rewrite pass for TPCH Q19 #217
- feat: add optimize rule rewrite_disjunctive_predicate #2858 (xudong963)
Fixed bugs:
- Regression in SQL support for ORDER BY and aliased expressions #3160
- Panic when dealing with @ operator #3137
- Incorrect type coercion rule for date + interval #3093
- Casting a string to timestamp crashes for times before 1970 with fractional seconds #3082
- INTEGER type doesn't work when importing CSV #3059
- Cannot GROUP BY Binary #3050
- incorrect i32 coercion for to_timestamp #3046
- Error pruning IsNull expressions: Column 'instance_null_count' is declared as non-nullable but contains null values #3042
- I want to query some files in a directory. Is there any way? #3013
- The expression to get an indexed field is only valid for List types (common_sub_expression_eliminate) #3002
- Double to_timestamp_seconds produces abnormal result #2998
- External parquet table fails when schema contains differing key / value metadata #2982
- SELECT on column with uppercase column name fails with FieldNotFound error #2978
- panic reading AWS-generated parquet file #2963
- Can't filter rowgroup for parquet prune for some data type #2962
- CI test is failing with final link failed: No space left on device #2947
- bug: new ObjectStore breaks backward compatibility with contrib plugins #2931
- bug: file types handled wrong #2929
- bug: changing the number of partitions does not increase concurrency #2928
- csv_explain fails on RC verifier #2916
- index out of range error from datafusion_row::write::write_field #2910
- Optimization rule CommonSubexprEliminate creates invalid projections #2907
- serde_json requires that either std (default) or alloc feature is enabled #2896
- Inconsistent type coercion rules with comparison expressions #2890
- Doc Error: the test directory link in CONTRIBUTING.md returns 404 #2880
- Round trips through ScalarValues sometimes don't preserve types (e.g. change types from DictionaryArray) #2874
- Error with CASE and DictionaryArrays: ArrowError(InvalidArgumentError("arguments need to have the same data type")) #2873
- window functions not supported in expressions #2869
- Unable to work with month intervals #2796
- Discord invite link in communication page has expired #2743
- Test (path normalization) failures while verifying release candidate 9.0.0 RC1 #2719
- Reading parquet with (pre-release) arrow fails with "out of order projection is not supported" #2543
- Fix SQL planner bug when resolving columns with same name as a relation #3003 [sql] (andygrove)
- fix RowWriter index out of bounds error #2968 (comphead)
- fix: support decimal statistic for row group prune #2966 (liukun4515)
- Fix invalid projection in CommonSubexprEliminate #2915 (andygrove)
Documentation updates:
- MINOR: Fix broken links in contrib guide #3135 (andygrove)
- MINOR: User Guide: Move expressions to top-level page #3134 (andygrove)
- User Guide: Combine CLI pages #3133 (andygrove)
- User Guide: Add documentation for JOIN syntax #3130 (andygrove)
- separate contributors guide #3128 (kmitchener)
- minor: remove python docs, now they're in another project #3119 (kmitchener)
- minor: doc fixes: fix link to datafusion-python project and add link to slides for rece… #3118 (kmitchener)
- Add all scalar SQL functions to user guide #3090 (andygrove)
- Add DataFrame reference to the user guide #3067 (andygrove)
- MINOR: Add CeresDB to list of products using DataFusion #3060 (andygrove)
- Minor: improve some docstrings about pruning #3041 (alamb)
- doc: add a new video link about datafusion #3025 (xudong963)
- Update README.md to add CnosDB into the Known Uses #2933 (cnoshb)
Performance improvements:
Closed issues:
- Rename do_data_time_math() to do_date_time_math() #3172
- Automatic version updates for github actions with dependabot #3106
- [EPIC] Proposal for Date/Time enhancement #3100
- Upgrade prost/tonic everywhere #3028
- [Question] interested in helping with documentation #2866
- Introducing a new optimizer framework for datafusion. #2633
- Enable discussion tab? #2350
- Add support for AVG(Timestamp) types #200
- TPC-H Query 22 #175
- TPC-H Query 21 #172
- TPC-H Query 20 #171
- TPC-H Query 17 #168
- TPC-H Query 11 #163
- TPC-H Query 4 #160
- TPC-H Query 2 #159
- [Datafusion] Optimize literal expression evaluation #106
Merged pull requests:
- Rename do_data_time_math() to do_date_time_math() #3173 (JasonLi-cn)
- [Minor] Remove some redundant code #3169 (alamb)
- Support INTEGER again in addition to INT in CREATE TABLE and CAST statements #3167 [sql] (alamb)
- Fix regression in SQL parser related to resolution of aliased expressions #3165 [sql] (andygrove)
- update cargo lock #3164 (waitingkuo)
- add test case for cast_timestamp_before_1970 #3163 (waitingkuo)
- Return proper error message for ill formed variable reference #3162 (alamb)
- Remove outdated license text left over from arrow repo #3154 (alamb)
- Expose RowAccumulator in physical_plan #3151 (iajoiner)
- Rename DateIntervalExpr to DateTimeIntervalExpr #3150 (alamb)
- Bump actions/labeler from 4.0.0 to 4.0.1 #3144 (dependabot[bot])
- User Guide: Add documentation for subquery syntax #3132 (andygrove)
- MINOR: User Guide: Move Data Types and Information Schema to their own pages #3131 (andygrove)
- Minor: Clean up array test #3121 (alamb)
- add arrow_typeof #3120 (waitingkuo)
- Bump actions/labeler from 2.2.0 to 4.0.0 #3114 (dependabot[bot])
- Bump actions/checkout from 2 to 3 #3113 (dependabot[bot])
- Bump actions/setup-node from 2 to 3 #3112 (dependabot[bot])
- Bump actions/setup-python from 3 to 4 #3111 (dependabot[bot])
- Feature/support timestamp plus minus interval #3110 (JasonLi-cn)
- docs: fix typo #3109 (dzvon)
- Remove offset if it's zero #3102 (turbo1912)
- Hash binary values #3098 [sql] (Dandandan)
- Update to object_store 0.4 #3089 (tustvold)
- Add cast function for creating cast expression #3084 (turbo1912)
- Upgrade to arrow 20.0.0 (but no change to object_store), including prost, and tonic #3083 [sql] (avantgardnerio)
- impl Debug for ColumnarValue, add some docs #3076 (alamb)
- [Minor] run cargo update in datafusion-cli directory #3075 (alamb)
- update cargo.lock in datafusion-cli #3074 (waitingkuo)
- Update sql parser to v0.20.0 #3072 [sql] (waitingkuo)
- Add opening, scanning, processing metrics in file stream #3070 (Ted-Jiang)
- Simplify approx_median implementation, expose via DataFrame API #3064 [sql] (andygrove)
- docs: fix PruningStatistics example and some typos #3062 (roeap)
- feat: support double quoted literal strings for dialects (such as mysql, bigquery, spark) #3056 [sql] (Rachelint)
- Allow Overriding AsyncFileReader used by ParquetExec #3051 (Cheappie)
- to_timestamp i32 coerced to i64 #3047 (waitingkuo)
- Fix IsNull pruning expression generation without null_count statistics #3044 (alamb)
- feat: Support week, decade, century for Interval literal #3038 [sql] (ovr)
- feat: Support Binary bitwise shift operators (<< and >>) #3037 [sql] (ovr)
- Use concat_elements_utf8 from arrow rather than custom kernel #3036 (alamb)
- minor: update minimal rust version to 1.62, matching arrow-rs #3035 [sql] (kmitchener)
- feat: Add date_bin built-in function #3034 (stuartcarnie)
- Split binary_expr.rs into smaller modules #3026 (alamb)
- feat: Enable typed strings expressions for VALUES clause #3018 [sql] (stuartcarnie)
- fix typo for PR3003 #3011 (waitingkuo)
- feat: Add support for TIME literal values #3010 [sql] (stuartcarnie)
- add TimeUnit::Second as signature for ToTimestampSeconds #3004 (waitingkuo)
- Rename FileReader to FileOpener (#2990) #2991 (tustvold)
- minor: collation the prune test #2986 (liukun4515)
- Optionally skip metadata from schema when merging parquet files #2985 (alamb)
- [Minor] Extract interval parsing logic, add unit tests #2984 [sql] (alamb)
- Update sqlparser to 0.19 #2981 [sql] (alamb)
- test: add file/SQL level test for pruning parquet row group with decimal data type. #2977 (liukun4515)
- Derive Hash for JoinType #2972 (liurenjie1024)
- Example that shows how to convert query result into rust struct #2959 #2969 (thomas-k-cameron)
- Add baseline_metrics for FileStream to record metrics like elapsed ti… #2965 (Ted-Jiang)
- test: add test for decimal and pruning for decimal column #2960 (liukun4515)
- Simplify expressions with NOT clause #2958 (AssHero)
- chore: update jit-related dependencies #2956 (xudong963)
- Update to arrow 19.0.0 #2955 [sql] (alamb)
- Remove CI Caching to preserve diskspace #2948 (alamb)
- Add metadata_size_hint for optimistic fetching of parquet metadata #2946 (thinkharderdev)
- Minor: Remove left over debugging statement #2944 (alamb)
- add Atan2 #2942 (waitingkuo)
- Use Arc<ObjectStoreRegistry> and remove ObjectStoreRegistry::clone #2941 (tustvold)
- add extension system to SessionConfig #2940 (crepererum)
- Update prost-build requirement from 0.7 to 0.10 #2937 (dependabot[bot])
- Add streaming JSON and CSV reading, NewlineDelimitedStream (#2935) #2936 (tustvold)
- feat(catalog): Implement information_schema.views #2934 [sql] (BaymaxHWY)
- Support window functions in expressions by rewriting projection after building window plan #2932 [sql] (AssHero)
- Add pow as synonym for power #2927 (andygrove)
- Add from_unixtime function #2924 (waitingkuo)
- fix(aggregate): support mean as synonym for avg #2923 (BaymaxHWY)
- Add DataFrame::with_column_renamed #2920 (andygrove)
- Run clippy with optional features #2918 (tustvold)
- Fix release verification script by not overriding ARROW_TEST_DATA or PARQUET_TEST_DATA #2917 (alamb)
- Move ScalarValue tests alongside implementation, move from_slice to datafusion_core #2914 (alamb)
- Optimizer should have option to skip failing rules #2909 (andygrove)
- Introduce ObjectStoreProvider to create an object store based on the url #2906 (yahoNanJing)
- Remove datafusion-data-access crate #2904 (yahoNanJing)
- Combine all comparison coercion rules #2901 (andygrove)
- Add Projection::try_new and Projection::try_new_with_schema #2900 (andygrove)
- Improve formatting of logical plans containing subqueries #2899 [sql] (andygrove)
- add session option 'datafusion.explain.logical_plan'; when set to true, the explain statement will only print logical plans #2895 (AssHero)
- Preserve field name in ScalarValue::List #2893 [sql] (comphead)
- Adds optional serde support to datafusion-proto #2892 (tustvold)
- Implement ScalarValue::Dictionary and preserve type through conversion back/forth to Array #2891 (alamb)
- Add an ID generator in preparation for PR 2885 #2887 (avantgardnerio)
- Add support for correlated subqueries & fix all related TPC-H benchmark issues #2885 (avantgardnerio)
- fix(doc): update test directory link in CONTRIBUTING.md #2882 (BaymaxHWY)
- Add h2o bench groupby queries #2881 (andygrove)
- Add support for month & year intervals #2797 (avantgardnerio)
- Migrate from avro_rs (0.13) to apache_avro (0.14) #2784 (martin-g)
10.0.0-rc1 (2022-07-12)
10.0.0 (2022-07-12)
Breaking changes:
- Convert batch_size to config option #2771 (andygrove)
- MINOR: Remove Offset struct #2734 (andygrove)
- feat: async extension planner #2713 (waynexia)
- Switch to object_store crate (#2489) #2677 (tustvold)
Implemented enhancements:
- update documentation, fix styling to match main Arrow project #2864
- Update top-level README #2850
- [Question] How to call an async function in ExecutionPlan::exec method? #2847
- Add DataFrame::with_column #2844
- Improve ergonomics of physical expr lit #2827
- Add Python examples for reading CSV and query by SQL in Doc #2824
- eliminate multi limit-offset nodes to EmptyRelation if possible #2822
- Make LogicalPlan::Union be consistent with other plans #2816
- Use coerced data type from value and list expressions during planning inlist expression #2793
- Add configuration option to enable/disable CoalesceBatchesExec #2790
- Simplify FilterNullJoinKeys rule #2780
- Allow configuration settings to be specified with environment variables #2776
- Automatically update configs.md in user guide #2770
- Support multiple paths for ListingTableScanNode #2768
- Reduce outer joins #2757
- support data type coerced and decimal in INLIST expr #2755
- Change ExtensionPlanner::plan_extension() to an async function #2749
- Add IsNotNull filter to join inputs if one side of join condition does not allow null #2739
- Sort preserving MergeJoin #2698
- Improve readability of table scan projections in query plans #2697
- DataFusion 9.0.0 Release #2676
- Improve UX for UNION vs UNION ALL (introduce a LogicalPlan::Distinct) #2573 [sql]
- Implement some way to show the sql used to create a view #2529
- Consider adopting IOx ObjectStore abstraction #2489
- Support sum0 as a built-in agg function #2067
- implement grouping sets, cubes, and rollups #1327
- Ruby bindings #1114
- Support dates in hash join #2746 (andygrove)
Fixed bugs:
- Docker Error #2851
- Anti join ignores join filters #2842
- Can't test or compile sub-model code after upgrade to arrow-rs 17.0.0 #2835
- Not evaluate the set expr in the InList for the optimization #2820
- CASE When: result type should be coercible to a common type #2818
- IN/NOT IN List: NULL is not equal to NULL #2817
- panic when case statement returns null #2798
- InList: Can't cast the list expr data type to value expr data type directly #2774
- InList Expr: expr and list values must be convertible to the same data type #2759
- tpchgen docker syntax change prevents volume from binding #2751
- Cannot join on date columns (Unsupported data type in hasher: Date32) #2744
- rewrite_expression does not properly handle Exists and ScalarSubquery #2736
- LocalFileSystem not sorted by file name; as a result, the data lines queried in multiple files are out of order #2730
- Filter push down need consider alias columns #2725
- Recent API change in GlobalLimitExec breaks compatibility with Ballista #2720
- Common Subexpression Elimination pass errors if run twice on some plans: Schema contains duplicate unqualified field name 'IsNull-Column-sys.host' #2712
- The data type is not compatible with other systems, for example spark or PG database #1379
Documentation updates:
- Fix docs styling #2865 (kmitchener)
- Various updates to top-level README #2854 (andygrove)
- MINOR: Add documentation for running integration tests #2839 (andygrove)
- add csv registration and sql query to examples #2825 (waitingkuo)
- [minor] refine doc #2753 (Ted-Jiang)
Closed issues:
- Consider adding a prominent note in the readme about ballista #2853
- support decimal in (NULL) #2800
- InList: Don't treat Null as UTF8(None) #2782
- InList: don't need to treat Null as UTF8 data type #2773
- Implement extensible configuration mechanism #138
Merged pull requests:
- Update CONTRIBUTING.md #2876 (waitingkuo)
- Make LogicalPlan::Union be consistent with other plans #2868 (comphead)
- minor: remove unneeded files from project root #2863 (kmitchener)
- chore: make cargo clippy happy in nightly #2860 [sql] (xudong963)
- Update to arrow 18.0.0 #2856 [sql] (alamb)
- chore: remove ballista-related docker-compose file #2852 (xudong963)
- Adding dataframe with_column function #2849 (comphead)
- anti joins now respect join filters #2843 (andygrove)
- MINOR: make name meaningful and clean up code #2841 (liukun4515)
- Make lit implementation more concise #2838 (alamb)
- InList: set/list value must be evaluated to get the values #2834 (liukun4515)
- Add SHOW CREATE TABLE with initial support for views #2830 [sql] (mrob95)
- Improve ergonomics of physical expr lit #2828 (alamb)
- Eliminate multi limit-offset nodes to emptyRelation #2823 (AssHero)
- Fix the ci #2821 (liukun4515)
- CaseWhen: coerce all THEN and ELSE data types to a common data type #2819 (liukun4515)
- Fix ScalarValue::isNull calculation #2815 (alamb)
- Fix nullability calculation for CASE expressions #2814 (alamb)
- Bump numpy from 1.21.3 to 1.22.0 in /integration-tests #2811 (xudong963)
- Fix data type calculation for CaseExprs with NULLs #2810 (AssHero)
- InList: fix bug for comparing with Null in the list using the set optimization #2809 (liukun4515)
- Use specialized dictionary kernels (#1178) #2808 (tustvold)
- fix schema nullability for information_schema schema #2804 (alamb)
- fix: correctly calculate join output schema nullability #2803 (alamb)
- Correct schema nullability declaration in tests #2802 (alamb)
- Don't treat Null as UTF8(None) and change error info. #2801 (liukun4515)
- MINOR: Remove reference to docker image that is no longer available #2795 (andygrove)
- Use coerced type in inlist expr planning #2794 (viirya)
- Add LogicalPlan::Distinct #2792 [sql] (mrob95)
- Add config option for coalesce_batches physical optimization rule, make optional #2791 (andygrove)
- Improve readability of table scan projections in query plans (remove Some and None) #2789 [sql] (comphead)
- Simplify FilterNullJoinKeys rule #2781 (andygrove)
- MINOR: re-export sqlparser from datafusion-sql crate #2779 [sql] (andygrove)
- Update to arrow 17.0.0 #2778 [sql] (alamb)
- Support multiple paths for ListingTableScanNode #2775 (Ted-Jiang)
- Remove expr_sub_expressions and rewrite_expression functions #2772 (mrob95)
- minor: update cranelift related dependencies #2769 (xudong963)
- minor: panic rather than fail silently on bad dictionary in hash join #2767 (alamb)
- MINOR: make prettier use consistent between CI and contributing guide #2766 (andygrove)
- Rewrite subexpressions of InSubquery in rewrite_expression #2765 (mrob95)
- Support DataType::Decimal for IN and NOT IN expressions #2764 (liukun4515)
- Implement extensible configuration mechanism #2754 (andygrove)
- Remove redundant docker argument #2752 (avantgardnerio)
- Add optimizer pass to reduce left/right/full joins to inner join if possible #2750 [sql] (AssHero)
- MINOR: Remove legacy CLI context enum #2748 (andygrove)
- CSE unit test for duplicate fields #2747 (waynexia)
- MINOR: Improve unsupported data type error message #2745 (andygrove)
- Add optimizer rule to filter out null keys before a join #2740 (andygrove)
- Sort file names in a directory #2730 #2735 (yourenawo)
- fix: filter push down with InList expressions #2729 (Ted-Jiang)
- [Minor] add debug info in optimizer.rs #2726 (Ted-Jiang)
- Add public API for GlobalLimitExec and LocalLimitExec #2722 (andygrove)
- Add additional data types supported in hash join #2721 (AssHero)
- Upgrade to arrow 16.0.0 #2718 [sql] (alamb)
- Fix clippy warnings with toolchain 1.63 #2717 [sql] (waynexia)
- Support for GROUPING SETS/CUBE/ROLLUP #2716 (thinkharderdev)
- fix: check redundant fields while building projection plan #2715 (waynexia)
- Sort preserving SortMergeJoin #2699 (korowa)
- fix: union schema fix #2688 [sql] (gandronchik)
- Support default precision and scale for CAST <EXPR> AS DECIMAL #2680 [sql] (gandronchik)
9.0.0 (2022-06-10)
Breaking changes:
- MINOR: Move simplify_expression rule to datafusion-optimizer crate #2686 (andygrove)
- Move physical expression planning to datafusion-physical-expr crate #2682 (andygrove)
- Create new datafusion-optimizer crate for logical optimizer rules #2675 (andygrove)
- Remove ExecutionProps dependency from OptimizerRule #2666 (andygrove)
- Remove ObjectStoreSchemaProvider (#2656) #2665 (tustvold)
- Move LogicalPlanBuilder to datafusion-expr crate #2576 (andygrove)
- LogicalPlanBuilder now uses TableSource instead of TableProvider #2569 (andygrove)
- Remove scan_empty method from LogicalPlanBuilder #2568 (andygrove)
- MINOR: Move expression utils from sql module to expr crate #2553 (andygrove)
- Remove scan_json methods from LogicalPlanBuilder #2541 (andygrove)
- Remove scan_avro methods from LogicalPlanBuilder #2540 (andygrove)
- Remove scan_parquet methods from LogicalPlanBuilder #2539 (andygrove)
- MINOR: Move ExprVisitable and exprlist_to_columns to datafusion-expr crate #2538 (andygrove)
- Remove scan_csv methods from LogicalPlanBuilder #2537 (andygrove)
- Fix Redundant ScalarValue Boxed Collection #2523 (comphead)
- Support for OFFSET in LogicalPlan #2521 (jdye64)
Implemented enhancements:
- [EPIC] JIT support for DataFusion #2703
- Show column names instead of column indices in query plans #2689
- Proposal: remove automated ballista CI checks from DataFusion #2679
- Pass SessionState to TableProvider #2658
- Is ObjectStoreSchemaProvider Still Needed? #2656
- Add logical plan support to datafusion-proto #2630
- Like, NotLike expressions work with literal NULL #2626
- Move JOIN ON predicates push down logic from planner to optimizer #2619
- Remove ExecutionProps from OptimizerRule trait #2614
- Add, Minus, Multiply, divide, Modulo operator work with literal NULL #2609
- Support DESCRIBE <table> to show table schemas #2606
- Support CREATE OR REPLACE TABLE #2605
- filter_push_down tests should not rely on TableProvider and ExecutionPlan #2600
- Move logical optimizer rules out of the core datafusion crate #2599
- Push Limit through outer Join #2579
- datafusion_proto crate should have exhaustive match statements for handling Expr #2565
- String representation of Expr variant #2563
- File URI Scheme Interpretation #2562
- Implement physical plan for OFFSET #2551
- Update limit pushdown rule to support offsets #2550
- Move LogicalPlanBuilder to datafusion-expr crate #2536
- Logical optimizer rule "simplify expressions" should not depend on the core datafusion crate #2535
- Support optional filter in Join #2509
- Improve SQL planner & logical plan support for JOIN conditions #2496
- Numeric, String, Boolean comparisons with literal NULL #2482
- Redundant ScalarValue Boxed Collection #2449
- ObjectStore Directory Semantics #2445
- Add support for OFFSET in SQL query planner + logical plan #2377
- SQL planner should use TableSource not TableProvider #2346
- Move SQL query planning to new crate #2345
- Update LogicalPlan rustdoc code to not use LogicalPlanBuilder #2308
- [Optimizer] Refactor convert join #2256
- [Optimizer] Infer is not null predicate from where clause #2254
- Support ArrayIndex for ScalarValue(List) #2207
- [Ballista] Fill functional gaps between datafusion and ballista #2062
- [Ballista] support datafusion built_in UDAF work in ballista cluster #1985
- Export C API #1113
Fixed bugs:
- Fix Typos in Docs #2695
- Unable to build a docker image #2691
- Optimization pass AggregateStatistics changes type of output from Int64 to UInt64 #2673
- ViewTable Circular Reference #2657
- ScalarValue::to_array_of_size panics computing statistics for nested parquet file #2653
- The result type of count/count_distinct #2635
- limit_push_down is not working properly with OFFSET #2624
- Avro Tests Fail To Compile #2570
- Unused Window functions expression is wrongly removed from LogicalPlan during optimization #2542
- Bug: ObjectStoreRegistry get_by_uri does not return correct path when "scheme" is provided #2525
- There are duplicate and inconsistent copies of datafusion.proto #2514
- Projection pushdown produces incorrect results when column names are reused #2462
- Incorrect Parquet Projection For Nested Types #2453
- LogicalPlanBuilder::scan_csv creates scans with invalid table names #2278
- Inner join incorrectly pushdown predicate with OR operation #2271
- Ignored alias for columns with aggregate function and incorrect results when collecting statistics is enabled #2176
- Join on path partitioned columns fails with error #2145
Documentation updates:
- Fix Ballista link #2654 (dsaxton)
- MINOR: Add Blaze as a project using DataFusion #2618 (yjshen)
- [MINOR] remove datafusion-cli's ballista feature from docs #2612 (Ted-Jiang)
- chore(doc) remove ballista from datafusion-cli readme #2604 (ming535)
Closed issues:
- [Question] Converting TableSource to custom TableProvider #2644
- [Question] Why DataFusion is shipped with arrow version 9.1.0 on crates.io ? #2474
Merged pull requests:
- Test optional features in CI #2708 (tustvold)
- support indexed fields proto #2707 (nl5887)
- Update sqlparser-rs to 0.18.0 #2705 (alamb)
- [MINOR]: Add documentation to datafusion-row modules #2704 (alamb)
- Make sure that the data types are supported in hashjoin before genera… #2702 (AssHero)
- Move remaining code out of legacy core/logical_plan module #2701 (andygrove)
- Move some tests from core to expr #2700 (andygrove)
- MINOR: Improve Docs Readability #2696 (ryanrussell)
- Combine limit and offset to fetch and skip and implement physical plan support #2694 (ming535)
- MINOR: Add datafusion-sql example #2693 (andygrove)
- Remove Ballista related lines from Dockerfile #2692 (mocknen)
- Show column names instead of indices in query plans #2690 (andygrove)
- MINOR: Remove uses of TryClone for Parquet #2681 (tustvold)
- Fix AggregateStatistics optimization so it doesn't change output type #2674 (alamb)
- If statistics of column Max/Min value do not exist in parquet file, set Min/Max to None #2671 (AssHero)
- MINOR: Move more expression code to datafusion-expr crate #2669 (andygrove)
- MINOR: Rewrite imports in optimizer module #2667 (andygrove)
- Update snmalloc-rs requirement from 0.2 to 0.3 #2663 (dependabot[bot])
- Add module doc for RuntimeEnv, SessionContext, TaskContext, etc... #2655 (tustvold)
- Prune unused dependencies from datafusion-proto #2651 (tustvold)
- MINOR: Implement serde for join filter #2649 (andygrove)
- pushdown support for predicates in ON clause of joins #2647 (korowa)
- Move SortKeyCursor and RowIndex into modules, add sort_key_cursor test #2645 (alamb)
- Implement DESCRIBE <table> #2642 (LiuYuHui)
- Implement LogicalPlan serde in datafusion-proto #2639 (andygrove)
- Fix limit + offset pushdown #2638 (ming535)
- change result type of count/count_distinct from uint64 to int64 #2636 (liukun4515)
- if no columns in window expr are needed, remove the window exprs #2634 (AssHero)
- Like, NotLike expressions work with literal NULL #2627 (WinkerDu)
- MINOR: Refactor datafusion-proto dependencies and imports #2623 (andygrove)
- MINOR: add optimizer struct #2616 (jackwener)
- Remove FilterPushDown dependency on physical plan #2615 (andygrove)
- Support CREATE OR REPLACE TABLE #2613 (AssHero)
- Support binary mathematical operators working with NULL literals #2610 (WinkerDu)
- chore: try fix CI coverage #2608 (Ted-Jiang)
- MINOR: Rename benchmark crate #2607 (andygrove)
- chore(dep): bump cranelift to 0.84.0 #2598 (waynexia)
- fix some typos #2597 (ming535)
- Support limit pushdown through left right outer join #2596 (Ted-Jiang)
- Unignore rustdoc code examples in datafusion-expr crate #2590 (andygrove)
- Evaluate JIT'd expression over arrays #2587 (waynexia)
- [minor]Fix ci clippy for unused import #2586 (Ted-Jiang)
- [Doc] add doc noting that enabling SIMD requires cargo nightly #2577 (Ted-Jiang)
- Add DataFrame union_distinct and fix documentation for distinct #2574 (andygrove)
- Fix avro tests (#2570) #2571 (tustvold)
- Make datafusion-proto match exhaustive #2567 (andygrove)
- Support limit push down for offset_plan #2566 (Ted-Jiang)
- Introduce Expr.variant_name() function #2564 (jdye64)
- Fix some 404 links in the contribution guide #2561 (hi-rustin)
- Update datafusion-cli readme cli version #2559 (hi-rustin)
- MINOR: Move expr_rewriter.rs to datafusion-expr crate #2552 (andygrove)
- Fix JOINs with complex predicates in ON (split ON expressions only by AND operator) #2534 (korowa)
- Reduce duplication in file scan tests #2533 (tustvold)
- Fix size_of_scalar test #2531 (alamb)
- Update to arrow-rs 14.0.0 #2528 (alamb)
- ObjectStoreRegistry get_by_uri now returns correct path when "scheme" is provided #2526 (timvw)
- MINOR: Add ORDER BY clause to test #2524 (andygrove)
- Remove unused binary_array_op_scalar! in binary.rs #2512 (alamb)
- fix NULL <op> column evaluation, tests for same #2510 (alamb)
- Fix projection pushdown produces incorrect results when column names are reused #2463 (jonmmease)
- Benchmark for sort preserving merge #2431 (alamb)
- Support GetIndexedFieldExpr for ScalarValue #2196 (ovr)
8.0.0 (2022-05-12)
Breaking changes:
- Add SQL planner support for ROLLUP and CUBE grouping set expressions #2446 (andygrove)
- Make ExecutionPlan::execute Sync #2434 (tustvold)
- Introduce new DataFusionError::SchemaError type #2371 (andygrove)
- Add Expr::InSubquery and Expr::ScalarSubquery #2342 (andygrove)
- Add Expr::Exists to represent EXISTS subquery expression #2339 (andygrove)
- Move LogicalPlan enum to datafusion-expr crate #2294 (andygrove)
- Remove dependency from LogicalPlan::TableScan to ExecutionPlan #2284 (andygrove)
- Move logical expression type-coercion code from physical-expr crate to expr crate #2257 (andygrove)
- feat: 2061 create external table ddl table partition cols #2099 [sql] (jychen7)
- Reorganize the project folders #2081 (yahoNanJing)
- Support more ScalarFunction in Ballista #2008 (Ted-Jiang)
- Merge dataframe and dataframe imp #1998 (vchag)
- Rename ExecutionContext to SessionContext, ExecutionContextState to SessionState, add TaskContext to support multi-tenancy configurations - Part 1 #1987 (mingmwang)
- Add Coalesce function #1969 (msathis)
- Add Create Schema functionality in SQL #1959 [sql] (matthewmturner)
- omit some clone when converting sql to logical plan #1945 [sql] (doki23)
- [split/16] move physical plan expressions folder to datafusion-physical-expr crate #1889 (Jimexist)
- remove sync constraint of SendableRecordBatchStream #1884 (doki23)
- [split/15] move built in window expr and partition evaluator #1865 (Jimexist)
Implemented enhancements:
- Include Expr to datafusion::prelude #2347
- Implement Serialization API for DataFusion #2340
- Implement power function #1493
- allow lit python function to support boolean and other types #1136
- Automate dependency updates #37
- Add CREATE VIEW #2279 (matthewmturner)
- [Ballista] Support Union in ballista. #2098 (Ted-Jiang)
- Change the DataFusion explain plans to make it clearer in the predicate/filter #2063 (Ted-Jiang)
- Add write_json, read_json, register_json, and JsonFormat to CREATE EXTERNAL TABLE functionality #2023 (matthewmturner)
- Qualified wildcard #2012 [sql] (doki23)
- support bitwise or/'|' operation #1876 [sql] (liukun4515)
- Introduce JIT code generation #1849 (yjshen)
Fixed bugs:
- CASE expr with NULL literals panics 'WHEN expression did not return a BooleanArray' #1189
- Function calls with NULL literals do not work #1188
- Add SQL planner support for calling round function with two arguments #2503 (andygrove)
- nested query fix #2402 (comphead)
- fix issue#2058 file_format/json.rs attempt to subtract with overflow #2066 (silence-coding)
- fix bug the optimizer rule filter push down #2039 (jackwener)
- fix: replace ExecutionContext and ExecutionConfig with SessionContext and SessionConfig #2030 (xudong963)
- Fixed parquet path partitioning when only selecting partitioned columns #2000 (pjmore)
- Fix ambiguous reference error in filter plan #1925 (jonmmease)
- platform aware partition parsing #1867 (korowa)
- Fix incorrect aggregation in case that GROUP BY contains duplicate column names #1855 (alex-natzka)
Documentation updates:
- MINOR: Make crate READMEs consistent #2437 (andygrove)
- minor: Improve documentation for DFSchema join and merge functions #2367 (andygrove)
- Change the code location and add annotation #2037 [sql] (jackwener)
- Fix typos (Datafusion -> DataFusion) #1993 (andygrove)
- Add examples to use MemTable and TableProvider (#1864) #1946 (PierreZ)
- Add doc for building datafusion-cli when connect the ballista #1866 (liukun4515)
- Add benchmarks section to DEVELOPERS.md #1838 (tustvold)
Performance improvements:
- Avoid an Arc::clone per row in benchmark #1975 (jhorstmann)
- Update datafusion-cli allocator #1878 (matthewmturner)
Closed issues:
- Make expected result string in unit tests more readable #2412
- remove duplicated fn aggregate() in aggregate expression tests #2399
- split distinct_expression.rs into count_distinct.rs and array_agg_distinct.rs #2385
- move sql tests in context.rs to corresponding test files in datafusion/core/tests/sql #2328
- Date32/Date64 as join keys for merge join #2314
- Error precision and scale for decimal coercion in logic comparison #2232
- Support Multiple row layout #2188
- TPC-H Query 18 #169
- TPC-H Query 16 #167
- Implement Sort-Merge Join #141
- Split logical expressions out into separate source files #114
Merged pull requests:
- Minor: remove code that is now included in arrow-rs #2511 (alamb)
- MINOR: Enable multi-statement benchmark queries #2507 (andygrove)
- MINOR: Add ignored tests for all remaining benchmark queries #2506 (andygrove)
- Update to sqlparser 0.17.0 #2500 (alamb)
- Add metrics for ParquetExec #2499 (Ted-Jiang)
- Limit cpu cores used when generating changelog #2494 (andygrove)
- Optimize MergeJoin by storing joined indices instead of creating small record batches for each match #2492 (richox)
- Add SQL planner support for grouping() aggregate expressions #2486 (andygrove)
- MINOR: Parameterize changelog script #2484 (jychen7)
- Numeric, String, Boolean comparisons with literal NULL #2481 (WinkerDu)
- Adds unit test cases of mathematical expressions working with null literal #2478 (WinkerDu)
- Minor: Move test code from context.rs into sql_integration #2473 (alamb)
- Minor: Use ExprVisitor to find columns referenced by expr #2471 (alamb)
- minor: remove expr dependency from the row crate, update crate-deps.dot/svg #2470 (yjshen)
- Fix read_from_registered_table_with_glob_path fails if path contains // #2465 #2468 (timvw)
- Add support for list_dir() on local fs #2467 (wjones127)
- MINOR: Partial fix for SQL aggregate queries with aliases #2464 (andygrove)
- minor: move struct definition out of aggregate/mod.rs, etc #2458 (WinkerDu)
- Fix bugs in SQL planner with GROUP BY scalar function and alias #2457 (andygrove)
- feat: Support CompoundIdentifier as GetIndexedField access #2454 (ovr)
- Table provider error propagation #2438 (jdye64)
- MINOR: Improve error messages for GROUP BY / HAVING queries #2435 (andygrove)
- minor: remove redundant code #2432 (jackwener)
- minor: update versions and paths in changelog scripts #2429 (andygrove)
- Fix Ballista executing during plan #2428 (tustvold)
- minor: format table result vec & remove some unnecessary semicolons #2425 (WinkerDu)
- Basic support for IN and NOT IN Subqueries by rewriting them to SEMI/ANTI Join #2421 (korowa)
- Allow subqueries without aliases #2418 (andygrove)
- Fix bug in subquery join filters referencing outer query #2416 (andygrove)
- MINOR: remove duplicated function format_state_name() #2414 (WinkerDu)
- Make expected result string in unit tests more readable #2413 (WinkerDu)
- sum(distinct) support #2405 (WinkerDu)
- Update ordered-float requirement from 2.10 to 3.0 #2403 (dependabot[bot])
- remove duplicated fn aggregate() in aggregate expression tests #2400 (WinkerDu)
- Support type-coercion from Decimal to Float64 #2396 (comphead)
- minor: SchemaError code cleanup and improvements #2391 (andygrove)
- Support struct_expr generate struct in sql #2389 (Ted-Jiang)
- Re-organize and rename aggregates physical plan #2388 (yjshen)
- refactor distinct_expressions.rs and split into count_distinct.rs and array_agg_distinct.rs #2386 (WinkerDu)
- Allow CTEs to be referenced from subquery expressions #2384 (andygrove)
- Upgrade to arrow 13 #2382 (alamb)
- Grouped Aggregate in row format #2375 (yjshen)
- Fix bugs with CTE aliasing and normalize all identifiers in the SQL planner #2373 (andygrove)
- Stop optimizing queries twice #2369 (andygrove)
- feat: Support casting to arrays to primitive type #2366 (ovr)
- Add proper support for null literal by introducing ScalarValue::Null #2364 (WinkerDu)
- minor: fix duplicate column bug in subquery support #2362 (andygrove)
- Normalize subquery aliases #2359 (andygrove)
- Implement physical planner support for DATE +/- INTERVAL #2357 (andygrove)
- Add SQL query planner support for Scalar Subqueries #2354 (andygrove)
- Add SQL query planner support for IN subqueries #2352 (andygrove)
- Add Expr to prelude #2348 (alamb)
- Add SQL planner support for EXISTS subqueries #2344 (andygrove)
- Add public Serialization/Deserialization API for Expr to/from bytes #2341 (alamb)
- Support for date32 and date64 in sort merge join #2336 (hntd187)
- [physical-expr] move aggregate exprs and window exprs to their own modules #2335 (yjshen)
- fix: union schema #2334 (gandronchik)
- Improve sql integration test organization #2333 (alamb)
- Support scalar values for func Array #2332 (Ted-Jiang)
- move sql tests from context.rs to corresponding test files in tests/sql #2329 (WinkerDu)
- deprecate index_of and make index_of_column_by_name public #2320 (jdye64)
- Fix HashJoin evaluating during plan #2317 (tustvold)
- minor: remove two source files that only had re-exports #2313 (andygrove)
- Don't sort batches during plan #2312 (tustvold)
- Move case/when expressions to datafusion-expr crate #2311 (andygrove)
- Fix CrossJoinExec evaluating during plan #2310 (tustvold)
- Make SortPreservingMerge Usable Outside Tokio (#2201) #2305 (tustvold)
- chore: update cranelift to 0.83.0 #2304 (yjshen)
- Always increment timer on record #2298 (tustvold)
- Remove unnecessary env var for parquet_sql example #2297 (sergey-melnychuk)
- Simplify sort streams #2296 (tustvold)
- MINOR: beautify code with neat idents #2295 (WinkerDu)
- Move FileType enum from sql module to logical_plan module #2290 (andygrove)
- Remove Parquet Empty Projection Workaround #2289 (tustvold)
- Add BatchPartitioner (#2285) #2287 (tustvold)
- Make row its crate to make it accessible from physical-expr #2283 (yjshen)
- Enable filter pushdown when using In_list on parquet #2282 (Ted-Jiang)
- Update uuid requirement from 0.8 to 1.0 #2280 (dependabot[bot])
- Add bytes scanned metric to ParquetExec #2273 (thinkharderdev)
- Fix outer join output with all-null indices on empty batch #2272 (yjshen)
- Re-export DataFusion crates #2264 (andygrove)
- rewrite approx_median to approx_percentile_cont while planning phase #2262 (korowa)
- Introduce RowLayout to represent rows for different purposes #2261 (yjshen)
- fix string coercion missing in Eq/NotEq operator #2258 (WinkerDu)
- Update to Arrow 12.0.0, update tonic and prost #2253 (alamb)
- minor: move field_util from physical-expr crate to expr crate #2250 (andygrove)
- Move identifier case tests to sql_integ, add negative cases, Debug for DataFrame #2243 (alamb)
- Implement sort-merge join #2242 (richox)
- fix: find the right wider decimal datatype for comparison operation #2241 (liukun4515)
- Fix join without constraints #2240 (Dandandan)
- Add type coercion rule for date + interval #2235 (andygrove)
- support array with scalar arithmetic operation for decimal data type #2233 (liukun4515)
- chore: add debug! log in some execution operators #2231 (NGA-TRAN)
- Introduce new optional scheduler, using Morsel-driven Parallelism + rayon (#2199) #2226 (tustvold)
- minor: add editor config file #2224 (jackwener)
- minor: Refactor to avoid repeated code in replace_qualifier #2222 (andygrove)
- update cli readme #2220 (liukun4515)
- Use filter (filter_record_batch) instead of take to avoid using indices #2218 (Dandandan)
- Add single line description of ExecutionPlan (#2216) #2217 (tustvold)
- Remove tokio::spawn from HashAggregateExec (#2201) #2215 (tustvold)
- Remove tokio::spawn from WindowAggExec (#2201) #2203 (tustvold)
- Make ParquetExec usable outside of a tokio runtime (#2201) #2202 (tustvold)
- add sql level test for decimal data type #2200 (liukun4515)
- case when supports NULL constant #2197 (WinkerDu)
- feat: Support simple Arrays with Literals #2194 (ovr)
- [Ballista] Enable ApproxPercentileWithWeight in Ballista and fill UT #2192 (Ted-Jiang)
- refactor: simplify prepare_select_exprs #2190 (jackwener)
- Multiple row-layout support, part-1: Restructure code for clearness #2189 (yjshen)
- make nightly clippy happy #2186 (xudong963)
- [Ballista]Make PhysicalAggregateExprNode has repeated PhysicalExprNode #2184 (Ted-Jiang)
- MINOR: handle NULL in advance to avoid value copy in string_concat #2183 (WinkerDu)
- fix: Sort with a lot of repetition values #2182 (yjshen)
- cli: update lockfile #2178 (happysalada)
- Add LogicalPlan::SubqueryAlias #2172 (andygrove)
- minor: Avoid per cell evaluation in Coalesce, use zip in CaseWhen #2171 (yjshen)
- Handle merged schemas in parquet pruning #2170 (thinkharderdev)
- Implement fast path of with_new_children() in ExecutionPlan #2168 (mingmwang)
- enable explain for ballista #2163 (doki23)
- Add delimiter for create external table #2162 (matthewmturner)
- [MINOR] enable EXTRACT week and add test (after sqlparser update to 0.16) #2157 (Ted-Jiang)
- Optimize the evaluation of IN for large lists using InSet #2156 (Ted-Jiang)
- Update sqlparser requirement from 0.15 to 0.16 #2152 (dependabot[bot])
- fix not(null) with constant null #2144 (WinkerDu)
- Add IF NOT EXISTS to CREATE TABLE and CREATE EXTERNAL TABLE #2143 (matthewmturner)
- implement 'StringConcat' operator to support sql like "select 'aa' || 'b' " #2142 (WinkerDu)
- #2109 By default, use only 1000 rows to infer the schema #2139 (jychen7)
- [CLI] Add show tables in ballista for datafusion-cli #2137 (gaojun2048)
- fix: incorrect memory usage track for sort #2135 (yjshen)
- Update quarterly roadmap for Q2 #2133 (matthewmturner)
- Reduce SortExec memory usage by avoiding constructing a single huge batch #2132 (yjshen)
- MINOR: fix concat_ws corner bug #2128 (WinkerDu)
- Minor add clarifying comment in parquet #2127 (alamb)
- Minor: make disk_manager public #2126 (yjshen)
- JIT-compile DataFusion expression with column name #2124 (Dandandan)
- minor: replace array_equals in case evaluation with eq_dyn from arrow-rs #2121 (alamb)
- Serialize timezone in timestamp scalar values #2120 (thinkharderdev)
- minor: fix some clippy warnings from nightly rust #2119 (alamb)
- Fix case evaluation with NULLs #2118 (alamb)
- issue#1967 ignore channel close #2113 (silence-coding)
- cli: add cargo.lock #2112 (happysalada)
- doc: update release schedule #2110 (jychen7)
- fix df union all bug #2108 [sql] (WinkerDu)
- Reduce repetition in Decimal binary kernels, upgrade to arrow 11.1 #2107 (alamb)
- update zlib version to 1.2.12 #2106 (waitingkuo)
- Create jit-expression from datafusion expression #2103 (Dandandan)
- Add CREATE DATABASE command to SQL #2094 [sql] (matthewmturner)
- Refactor SessionContext, BallistaContext to support multi-tenancy configurations - Part 3 #2091 (mingmwang)
- minor: remove duplicate test #2089 (jackwener)
- minor: remove repeated test #2085 (jackwener)
- Fix lost filters and projections in ParquetExec, CSVExec etc #2077 (Ted-Jiang)
- Remove dependency of common for the storage crate #2076 (yahoNanJing)
- [MINOR] fix doc in EXTRACT(field FROM source) #2074 (Ted-Jiang)
- [Bug][Datafusion] fix TaskContext session_config bug #2070 (gaojun2048)
- Short-circuit evaluation for CaseWhen #2068 (yjshen)
- split datafusion-object-store module #2065 (yahoNanJing)
- Allow CatalogProvider::register_catalog to return an error #2052 (alamb)
- Add test in register_catalog and change to use named symbolic constants #2050 (alamb)
- Update to arrow/parquet 11.0 #2048 (alamb)
- minor: format comments ("//" to "// ") #2047 (jackwener)
- use cargo-tomlfmt to check Cargo.toml formatting in CI #2033 (WinkerDu)
- feat: #2004 approx percentile with weight #2031 (jychen7)
- Refactor SessionContext, SessionState and SessionConfig to support multi-tenancy configurations - Part 2 #2029 (mingmwang)
- Simplify prerequisites for running examples #2028 (doki23)
- Replace usage of println! with logger macros #2020 (silence-coding)
- Automatically test examples in user guide #2018 (vchag)
- return VecDeque for DFParser::parse_sql #2017 [sql] (doki23)
- Eliminate the scalar value filter #2002 (jackwener)
- Fixing a typo in documentation #1997 (psvri)
- Correct documentation of ExprVisitor #1996 (alamb)
- Make it possible to only scan part of a parquet file in a partition #1990 (yjshen)
- Update Dockerfile to fix integration tests #1982 (andygrove)
- Remove some more unnecessary cloning in sql_expr_to_logical_expr #1981 [sql] (alamb)
- Add ticket reference to clippy allow #1978 [sql] (alamb)
- Implement EXTRACT expression with week, month, day, hour #1974 (Ted-Jiang)
- Address typo in ExprVisitable trait documentation #1970 (jdye64)
- Update sqlparser requirement from 0.14 to 0.15 #1966 (dependabot[bot])
- PruningPredicate should take owned Expr #1960 (thinkharderdev)
- Update to arrow 10.0.0, pyo3 0.16 #1957 (alamb)
- update jit-related dependencies #1953 (xudong963)
- minor code refinement: if_exists name change, wildcard field for logical plan, etc. #1951 [sql] (xudong963)
- Allow different types of query variables (@@var) rather than just string #1943 [sql] (maxburke)
- Pruning serialization #1941 (thinkharderdev)
- Add write_parquet to DataFrame #1940 (matthewmturner)
- Fix select from EmptyExec always return 0 row after optimizer passes #1938 (Ted-Jiang)
- Add debug log when waiting for spilling on other consumers #1933 (viirya)
- Add db benchmark script #1928 (matthewmturner)
- Add write_csv to DataFrame #1922 (matthewmturner)
- [MINOR] Update copyright year in Docs #1918 (alamb)
- add metadata to DFSchema, close #1806. #1914 [sql] (jiacai2050)
- Clippy fix on nightly #1907 (yjshen)
- Updated Rust version to 1.59 in all the files #1903 (NaincyKumariKnoldus)
- support extract second and minute in expr. #1901 (Ted-Jiang)
- Update crate descriptions #1899 (alamb)
- Remove unneeded Mutex in Ballista Client #1898 (alamb)
- [split/17] move the rest of physical expr to datafusion-physical-expr crate #1892 (Jimexist)
- Avoid unnecessary branching in row read/write if schema is null-free #1891 (yjshen)
- Make parquet support optional for datafusion-common crate #1886 (jonmmease)
- Fix clippy lints #1885 (HaoYang670)
- Add support for ~/.datafusionrc and cli option for overriding it to datafusion-cli #1875 (matthewmturner)
- [Minor] Clean up DecimalArray API Usage #1869 [sql] (alamb)
- Changes after went through "Datafusion as a library section" #1868 (nonontb)
- Enhance MemorySchemaProvider to support register_listing_table #1863 (matthewmturner)
- Increase default partition column type from Dict(UInt8) to Dict(UInt16) #1860 (Igosuki)
- Update to arrow 9.1.0 #1851 (alamb)
- move some tests out of context and into sql #1846 (alamb)
- [split/14] create datafusion-physical-expr module #1843 (Jimexist)
- Return Error when parquet reader fails rather than no data with println! #1837 (alamb)
- determine build side in hash join by total_byte_size instead of num_rows #1831 (xudong963)
- Make ballista support an optional feature to datafusion-cli #1816 (alamb)
- Update documentation example for change in API #1812 (alamb)
- rename references of expr in physical plan module after datafusion-expr split #1798 (Jimexist)
- DataFusion + Conbench Integration #1791 (dianaclarke)
- The returned path value of get_by_uri should be self-described with entire path #1779 (yahoNanJing)
- Use eq_dyn, neq_dyn, lt_dyn, lt_eq_dyn, gt_dyn, gt_eq_dyn kernels from arrow #1475 (alamb)
7.1.0 (2022-04-10)
Fixed bugs:
- By default, use only 1000 rows to infer the schema #2159
7.0.0 (2022-02-14)
Breaking changes:
- Consolidate various configurations options, remove unrelated batch_size #1565
- Extract logical plans in LogicalPlan as independent struct #1228
- Update ExecutionPlan to know about sortedness and repartitioning optimizer pass respect the invariants #1776 (alamb)
- Update to arrow 8.0.0 #1673 (alamb)
- Remove non idiomatic DataFusionError::into_arrow_external_error in favor of From conversion #1645 (alamb)
- Remove Accumulator::update and Accumulator::merge #1582 (Jimexist)
- implement Hash for various types and replace PartialOrd #1580 (Jimexist)
- Replace DatafusionError with GenericError in ObjectStore interface #1541 (matthewmturner)
- Make FLOAT SQL type map to Float32 rather than Float64 #1423 [sql] (liukun4515)
- Map REAL SQL type to Float32 rather than Float64 to be consistent with pg #1390 [sql] (hntd187)
Implemented enhancements:
- Create new datafusion_expr crate #1753
- Create new datafusion_common crate #1752
- API to get Expr's type and nullability without a DFSchema #1725
- Cleaner API to create Expr::ScalarFunction programmatically #1718
- Introduce a Vec<u8> based row-wise representation for DataFusion #1708
- Simplify creating new ListingTable #1705
- Implement TableProvider for DataFrameImpl to allow registration of logical plans #1698
- Public Expr simplification API #1694
- Query Optimizer: Add OUTER --> INNER join conversion #1670
- Support reading from CSV, Avro and Json files that have mergeable/compatible, but not identical schemas #1669
- Remove DataFusionError::into_arrow_external_error in favor of From conversion #1644
- Include join type in display implementation for logical plan #1620
- Switch datafusion to using eq_dyn_scalar, etc kernels #1610
- Proposal: Remove Accumulator::update and Accumulator::merge #1549
- Replace DataFusionError/Result with impl Error for ObjectStore and Reader #1540
- Add approx_quantile support #1538
- support sorting decimal data type #1522
- Keep all datafusion's packages up to date with Dependabot #1472
- ExecutionContext support init ExecutionContextState with new(state: Arc<Mutex<ExecutionContextState>>) method #1439
- support the decimal scalar value #1393
- Documentation for using scalar functions with the DataFrame API #1364
- Support boolean == boolean and boolean != boolean operators #1159
- Support DataType::Decimal(15, 2) in TPC-H benchmark #174
- Make MemoryStream public #150
- Add support for Parquet schema merging #132
- Add SQL support for IN expression #118
- Add logging to datafusion-cli #1789 (alamb)
- Add approx_median() aggregate function #1729 (realno)
- Add join type for logical plan display #1674 [sql] (xudong963)
- Fix null comparison for Parquet pruning predicate #1595 (viirya)
- Add corr aggregate function #1561 (realno)
- Add covar, covar_pop and covar_samp aggregate functions #1551 (realno)
- Add approx_quantile() aggregation function #1539 (domodwyer)
- Initial MemoryManager and DiskManager APIs for query execution + External Sort implementation #1526 (yjshen)
- Add stddev and variance #1525 (realno)
- Add rem operation for Expr #1467 (liukun4515)
- support decimal data type in create table #1431 [sql] (liukun4515)
- Ordering by index in select expression #1419 [sql] (hntd187)
- Add support for ORDER BY on unprojected columns #1415 (viirya)
- Support decimal for min and max aggregate #1407 (liukun4515)
- Consolidate ConstantFolding and SimplifyExpression #1375 (alamb)
- Datafusion cli quiet mode command to contain option bool #1345 (Jimexist)
- Implement array_agg aggregate function #1300 (viirya)
- Add a command to switch output format in cli #1284 (capkurmagati)
- Support =, <, <=, >, >=, !=, is distinct from, is not distinct from for BooleanArray #1163 (alamb)
Fixed bugs:
- Unsupported data type in hasher: Timestamp(Second, None) #1768
- SQL column identifiers should be converted to lowercase when unquoted #1746
- Data type Dictionary(Int32, Utf8) not supported for binary operation 'eq' on dyn arrays #1605
- datafusion doesn't process predicate pushdown correctly when there is outer join #1586
- casting Int64 to Float64 unsuccessfully caused tpch8 to fail #1576
- CTE/WITH .. UNION ALL confuses name resolution in WHERE #1509
- ORDER BY min(x) results in error Plan("No field named 'foo.x'. Valid fields are 'MIN(foo.x)'.") #1479
- Sort discards field metadata on the output schema #1476
- Datafusion should not strip out timezone information from existing types #1454
- Error on some queries: "column types must match schema types, expected XXX but found YYY" #1447
- Query failing to return any results when filter is an equality check on strings (bad statistics in parquet) #1433
- Field names containing period such as f.c1 cannot be named in SQL query #1432
- Select * returns an unexpected result #1412
- Turn off unused default features of chrono and ahash #1398
- real data type is float32 in PG database, but in the datafusion it is as float64 #1380
- TPC-H q10 performance regression (expression for filter with added alias is not pushed down) #1367
- ProjectionExec Loses Field Metadata #1361
- Support Filter on unprojected columns #1351
- NULLS ORDER is inconsistent with postgres #1343
- Fix bug while merging RecordBatch, add SortPreservingMerge fuzz tester #1678 (alamb)
- fix a cte block with same name for many times #1639 [sql] (xudong963)
- fix: casting Int64 to Float64 unsuccessfully caused tpch8 to fail #1601 (xudong963)
- Fix single_distinct_to_groupby for arbitrary expressions #1519 (james727)
- Fix SortExec discards field metadata on the output schema #1477 (alamb)
- fix calculate in many_to_many_hash_partition test. #1463 (Ted-Jiang)
- Add Timezone to Scalar::Time* types, and better timezone awareness to Datafusion's time types #1455 (maxburke)
- Support identifiers with . in them #1449 [sql] (alamb)
- Fixes for working with functions in dataframes, additional documentation #1430 (tobyhede)
- [Minor] Fix send_time metric for hash-repartition #1421 (Dandandan)
- fix: Select * returns an unexpected result #1413 [sql] (xudong963)
- Make cli handle multiple whitespaces #1388 (capkurmagati)
- Metadata is kept in projections for non-derived columns #1378 (hntd187)
- Fix Predicate Pushdown: split_members should be able to split aliased predicate #1368 (viirya)
- Change the arg names and make parameters more meaningful #1357 (liukun4515)
- collect table stats by default for listing table #1347 (houqp)
- fix: make nulls-order consistent with postgres #1344 [sql] (xudong963)
- Avoid changing expression names during constant folding #1319 (viirya)
- improve error message for invalid create table statement #1294 [sql] (houqp)
- Forbid creating the table with the same name #1288 (liukun4515)
Documentation updates:
- Clarify docs about Accumulator::update and Accumulator::update_batch #1542 (alamb)
- Fix duplicated cargo run --example parquet_sql #1482 (sergey-melnychuk)
- add documentation to Datafusion cli's new commands #1348 (liukun4515)
- fix some clippy warnings from nightly channel #1277 [sql] (Jimexist)
Performance improvements:
- Parquet pruning predicate for IS NULL #1591
- Fix predicate pushdown for outer joins #1618 (james727)
- fix: sql planner creates cross join instead of inner join from select predicates #1566 [sql] (xudong963)
- Split fetch_metadata into fetch_statistics and fetch_schema #1365 (Dandandan)
- Optimize the performance queries with a single distinct aggregate #1315 (ic4y)
- Left join could use bitmap for left join instead of Vec<bool> #1291 (boazberman)
Closed issues:
- Add release compile to CI #1728
- DiskManager and TempFiles getting created several times per query #1690
- Add a test for the pyarrow feature in CI #1635
- SQL tests for when sorting exceeded available memory and had to spill to disk #1573
- Consolidate the N-way merging code and SortPreservingMergeStream (which has quite good tests of what is often quite tricky code, and it will be performance critical) #1572
- Consolidate the SortExec code (so there is only a single sort operator that does in memory sorting if it has enough memory budget but then spills to disk if needed). #1571
- Track memory usage in Non Limited Operators #1569
- [Question] Why does ballista store tables in the client instead of in the SchedulerServer #1473
- Consolidate Projection for Schema and RecordBatch #1425
- Support Sort on unprojected columns #1372
- Unused code in hash_aggregate #1362
- Why use the expr types before coercion to get the result type? #1358
- A problem about the projection_push_down optimizer gathers valid columns #1312
- apply constant folding to LogicalPlan::Values #1170
- reduce usage of IntoIterator<Item = Expr> in logical plan builder window fn #372
- Why does DataFusion throw a Tokio 0.2 runtime error? #176
- TPC-H Query 14 #165
- Length kernel returns bytes not character length #156
- Split the logical operators out into separate source files #115
Merged pull requests:
- Fixup some doc warnings #1811 (alamb)
- Ensure most of links in docs are correct #1808 [sql] (HaoYang670)
- Update CHANGELOG.md, update release scripts #1807 (alamb)
- Update versions for split crates #1803 (matthewmturner)
- Improve the error message and UX of tpch benchmark program #1800 (alamb)
- rename references of expr in logical plan module after datafusion-expr split #1797 (Jimexist)
- Update to sqlparser 0.14 #1796 [sql] (alamb)
- [split/13] move rest of expr to expr_fn in datafusion-expr module #1794 (Jimexist)
- Update datafusion versions #1793 (matthewmturner)
- Less verbose plans in debug logging #1787 (alamb)
- [split/11] split expr type and null info to be expr-schemable #1784 (Jimexist)
- Introduce Row format backed by raw bytes #1782 (yjshen)
- rewrite predicates before pushing to union inputs #1781 (korowa)
- Update datafusion to use arrow 9.0.0 #1775 (alamb)
- [split/10] split up expr for rewriting, visiting, and simplification traits #1774 [sql] (Jimexist)
- #1768 Support TimeUnit::Second in hasher #1769 (jychen7)
- TPC-H benchmark can optionally write JSON output file with benchmark summary #1766 (andygrove)
- [split/8] move Accumulator and ColumnarValue to datafusion-expr #1765 (Jimexist)
- [split/7] move built-in scalar function to datafusion-expr #1764 (Jimexist)
- [split/6] move signature, type signature, volatility to datafusion-expr #1763 (Jimexist)
- [split/9+12] move udf, udaf, Expr to datafusion-expr module #1762 [sql] (Jimexist)
- [split/5] move window frame and operator to datafusion-expr module #1761 (Jimexist)
- [split/4] move scalar value to datafusion-common #1760 (Jimexist)
- [split/3] split datafusion expr module and move aggregate and window function expr #1759 (Jimexist)
- [split/2] move column and dfschema to datafusion-common module #1758 (Jimexist)
- Use ordered-float 2.10 #1756 (andygrove)
- [split/1] split datafusion-common module #1751 (Jimexist)
- use clap 3 style args parsing for datafusion cli #1749 (Jimexist)
- fix: Case insensitive unquoted identifiers in SQL #1747 [sql] (mkmik)
- Move more tests out of context.rs #1743 (alamb)
- Move optimize test out of context.rs #1742 (alamb)
- Fix typos in crate documentation #1739 (r4ntix)
- add cargo check --release to ci #1737 (xudong963)
- Update parking_lot requirement from 0.11 to 0.12 #1735 (dependabot[bot])
- Create built-in scalar functions programmatically #1734 (HaoYang670)
- Prevent repartitioning of certain operator's direct children (#1731) #1732 (tustvold)
- API to get Expr's type and nullability without a DFSchema #1726 (alamb)
- minor: fix cargo run --release error #1723 (xudong963)
- substitute parking_lot::Mutex for std::sync::Mutex #1720 (xudong963)
- Convert boolean case expressions to boolean logic #1719 (tustvold)
- Add Expression Simplification API #1717 (alamb)
- Create ListingTableConfig which includes file format and schema inference #1715 (matthewmturner)
- make select_to_plan clearer #1714 [sql] (xudong963)
- Add upper bound for public function signature #1713 (HaoYang670)
- Add tests and CI for optional pyarrow module #1711 (wjones127)
- Create SchemaAdapter trait to map table schema to file schemas #1709 (thinkharderdev)
- refine test in repartition.rs & coalesce_batches.rs #1707 (xudong963)
- Fuzz test for spillable sort #1706 (yjshen)
- Support create_physical_expr and ExecutionContextState or DefaultPhysicalPlanner for faster speed #1700 (alamb)
- Implement TableProvider for DataFrameImpl #1699 (cpcloud)
- Move timestamp related tests out of context.rs and into sql integration test #1696 (alamb)
- Lazy TempDir creation in DiskManager #1695 (alamb)
- Add MemTrackingMetrics to ease memory tracking for non-limited memory consumers #1691 (yjshen)
- (minor) Reduce memory manager and disk manager logs from info! to debug! #1689 (alamb)
- Make SortPreservingMergeStream stable on input stream order #1687 (alamb)
- Incorporate dyn scalar kernels #1685 (matthewmturner)
- Move information_schema tests out of execution/context.rs to sql_integration tests #1684 (alamb)
- Add a new metric type: Gauge + CurrentMemoryUsage to metrics #1682 (yjshen)
- refactor array_agg to not to have update and merge #1681 (Jimexist)
- Use NamedTempFile rather than String in DiskManager #1680 (alamb)
- upgrade clap to version 3 #1672 (Jimexist)
- Improve configuration and resource use of MemoryManager and DiskManager #1668 (alamb)
- feat: Support quarter granularity in date_trunc function #1667 (ovr)
- Fix can not load parquet table from spark in datafusion-cli. #1665 (Ted-Jiang)
- Make MemoryManager and MemoryStream public #1664 (yjshen)
- [Cleanup] Move AggregatedMetricsSet to metrics for further reuse #1663 (yjshen)
- fix: substr - correct behaviour with negative start pos #1660 (ovr)
- support bitwise and as an example #1653 [sql] (liukun4515)
- refine match pattern related code #1650 (xudong963)
- update md-5, sha2, blake2 #1647 (xudong963)
- Add DataFusionError -> ArrowError conversion #1643 (alamb)
- Add spill_count and spilled_bytes to BaselineMetrics, test sort with spill #1641 (yjshen)
- support hash decimal array and group by #1640 (liukun4515)
- Consolidate Schema and RecordBatch projection #1638 (alamb)
- Update hashbrown requirement from 0.11 to 0.12 #1631 (dependabot[bot])
- Update pyo3 requirement from 0.14 to 0.15 #1627 (dependabot[bot])
- Optimize SortPreservingMergeStream to avoid SortKeyCursor sharing #1624 (yjshen)
- Handle merging of evolved schemas in ParquetExec #1622 (thinkharderdev)
- feat: Support Substring(str [from int] [for int]) #1621 [sql] (ovr)
- feat: Support complex interval via IntervalMonthDayNano #1615 [sql] (ovr)
- consolidate binary_expr coercion rule code into binary_rule.rs module #1607 (alamb)
- Fix comparison of dictionary arrays #1606 (alamb)
- add test for decimal to decimal #1603 (liukun4515)
- update nightly version #1597 (Jimexist)
- Consolidate sort and external_sort #1596 (yjshen)
- support from_slice for binary, string, and boolean array types #1589 (Jimexist)
- add from_slice trait to ease arrow2 migration #1588 (Jimexist)
- Implement ARRAY_AGG(DISTINCT ...) #1579 (james727)
- Rename sql integration tests from mod to sql_integration #1575 (alamb)
- minor: improve the benchmark readme #1567 (xudong963)
- Consolidate batch_size configuration in ExecutionConfig, RuntimeConfig and PhysicalPlanConfig #1562 (yjshen)
- Update to rust 1.58 #1557 (xudong963)
- support mathematics operation for decimal data type #1554 (liukun4515)
- Address clippy warnings #1553 (sergey-melnychuk)
- enhance arithmetic operation for array with scalar #1552 (liukun4515)
- Remove unused update and merge implementations from Aggregates and supporting ScalarValue arithmetic #1550 (alamb)
- Add batch operations to stddev #1547 (realno)
- Mark ARRAY_AGG(DISTINCT ...) not implemented #1534 (james727)
- Update to arrow-7.0.0 #1523 (alamb)
- Fix ORDER BY on aggregate #1506 (viirya)
- Add example on how to query multiple parquet files #1497 (nitisht)
- Refactor testing modules #1491 (hntd187)
- add rfcs for datafusion #1490 (xudong963)
- support comparison for decimal data type and refactor the binary coercion rule #1483 (liukun4515)
- Minor: Rename predicate_builder --> pruning_predicate for consistency #1481 (alamb)
- Tests for support try_cast/cast decimal to numeric #1465 (liukun4515)
- Avoid sending empty batches for Hash partitioning #1459 (Ted-Jiang)
- Planner code cleanup #1450 [sql] (alamb)
- Fix bug in projection: "column types must match schema types, expected XXX but found YYY" #1448 (alamb)
- Update arrow-rs to 6.4.0 and replace boolean comparison in datafusion with arrow compute kernel #1446 (xudong963)
- support cast/try_cast for decimal: signed numeric to decimal #1442 (liukun4515)
- Consolidate decimal error checking and improve error messages #1438 [sql] (alamb)
- use 0.13 sql parser #1435 (Jimexist)
- Minor Code cleanups #1428 (alamb)
- Clarify communication on bi-weekly sync #1427 (alamb)
- support sum/avg agg for decimal, change sum(float32) --> float64 #1408 [sql] (liukun4515)
- Fix bugs with nullability during rewrites: Combine simplify and Simplifier #1401 (alamb)
- Minimize features #1399 (carols10cents)
- Update rust version to 1.57 #1395 [sql] (xudong963)
- support decimal scalar value #1394 (liukun4515)
- Add coercion rules for AggregateFunctions #1387 (liukun4515)
- upgrade the arrow-rs version #1385 (liukun4515)
- add array agg name #1382 (liukun4515)
- Make tests for simplify and Simplifier consistent #1376 (alamb)
- Refactor: Consolidate expression simplification code in simplify_expression.rs #1374 (alamb)
- remove unused code in hash_aggregate #1370 (ic4y)
- Use BufReader for LocalFileReader to revert performance regression in parquet reading #1366 (Dandandan)
- Add unit test for constant folding on values #1355 (viirya)
- Extract logical plan: rename the plan name (follow up) #1354 [sql] (liukun4515)
- Moved aggr_test_schema to test_utils #1338 (rdettai)
- upgrade arrow-rs to 6.2.0 #1334 (liukun4515)
- Update release instructions #1331 (alamb)
- #1268: allow datafusion-cli to toggle quiet flag within CLI #1330 (jgoday)
- Extract Aggregate, Sort, and Join to struct from AggregatePlan #1326 (matthewmturner)
- Extract EmptyRelation, Limit, Values from LogicalPlan #1325 (liukun4515)
- Extract CrossJoin, Repartition, Union in LogicalPlan #1322 (liukun4515)
- Fifth batch of updating sql tests to use assert_batches_eq #1318 (matthewmturner)
- Extract Explain, Analyze, Extension in LogicalPlan as independent struct #1317 [sql] (xudong963)
- Extract CreateMemoryTable, DropTable, CreateExternalTable in LogicalPlan as independent struct #1311 [sql] (liukun4515)
- Extract Projection, Filter, Window in LogicalPlan as independent struct #1309 (ic4y)
- Add PSQL comparison tests for except, intersect #1292 (mrob95)
- Extract logical plans in LogicalPlan as independent struct: TableScan #1290 (xudong963)
- Add statement helper command to cli #1285 (matthewmturner)
- Python bindings for window functions #819 [sql] (jgoday)
6.0.0 (2021-11-13)
Breaking changes:
- Removed deprecated with_concurrency #1200 (rdettai)
- File partitioning for ListingTable #1141 (rdettai)
- Add function volatility to Signature #1071 [sql] (pjmore)
- fix: allow duplicate field names in table join, fix output with duplicated names #1023 (houqp)
- Make TableProvider.scan() and PhysicalPlanner::create_physical_plan() async #1013 (rdettai)
- Reorganize table providers by table format #1010 (rdettai)
- Make Metrics::labels() public #999 (alamb)
- Rename NthValue::{first_value,last_value,nth_value} to satisfy clippy in Rust 1.55 #986 (alamb)
- Move CBOs and Statistics to physical plan #965 (rdettai)
- Update to sqlparser v 0.10.0 #934 [sql] (alamb)
- FilePartition and PartitionedFile for scanning flexibility #932 [sql] (yjshen)
- Improve SQLMetric APIs, port existing metrics #908 (alamb)
- Add support for EXPLAIN ANALYZE #858 [sql] (alamb)
- Rename concurrency to target_partitions #706 (andygrove)
Implemented enhancements:
- Add booleans support to the CASE statement #1156
- Implement General Purpose Constant Folding with the Expression Evaluator #1070
- Mark volatility categories of functions #1069
- Add "show" support to DataFrame API #937
- Add support for TRIM BOTH/LEADING/TRAILING #935
- Add "baseline" metrics to all built in operators #866
- Add SQL support for referencing fields in structs #119
- add filename completer for create table statement #1278 (Jimexist)
- Add drop table support #1266 [sql] (viirya)
- Dataframe supports except and update readme #1261 (xudong963)
- Implement EXCEPT & EXCEPT DISTINCT #1259 [sql] (xudong963)
- Add DataFrame support for INTERSECT and update readme #1258 (xudong963)
- use arrow 6.1.0 #1255 (Jimexist)
- fix 1250, add editor support for datafusion cli with validation #1251 (Jimexist)
- Add support for create table as via MemTable #1243 [sql] (Dandandan)
- Add cli show columns command to describe tables #1231 (Jimexist)
- datafusion-cli to add list table command #1229 (Jimexist)
- datafusion cli to handle EoF and interrupt signal #1225 (Jimexist)
- add \q as quit command and add ? for help #1224 (Jimexist)
- Add algebraic simplifications to constant_folding #1208 (matthewmturner)
- Improve GetIndexedFieldExpr adding utf8 key based access for struct v… #1204 [sql] (Igosuki)
- Fix between in select query #1202 [sql] (capkurmagati)
- Move code to fold Stable functions like now() from Simplifier to ConstEvaluator #1176 (alamb)
- DataFrame supports window function #1167 [sql] (xudong963)
- add values list expression #1165 [sql] (Jimexist)
- Add booleans support to the CASE statement #1161 (xudong963)
- Improve error messages when operations are not supported #1158 (alamb)
- Generic constant expression evaluation #1153 (alamb)
- python lit function to support bool and byte vec #1152 (Jimexist)
- [nit] simplify datafusion optimizer module codes #1146 (panarch)
- Add ScalarValue support for arbitrary list elements #1142 (jonmmease)
- Multiple files per partitions for CSV Avro Json #1138 (rdettai)
- Implement INTERSECT & INTERSECT DISTINCT #1135 [sql] (xudong963)
- Simplify file struct abstractions #1120 (rdettai)
- Implement is [not] distinct from #1117 [sql] (Dandandan)
- Clean up spawned task on drop for RepartitionExec, SortPreservingMergeExec, WindowAggExec #1112 (crepererum)
- add hyperloglog implementation (add and count) #1095 (Jimexist)
- Add ScalarValue::Struct variant #1091 (jonmmease)
- add digest(utf8, method) function and refactor all current hash digest functions #1090 (Jimexist)
- [crypto] add blake3 algorithm to digest function #1086 (Jimexist)
- [crypto] add blake2b and blake2s functions #1081 (Jimexist)
- [nit] make schema qualifier error message in field lookup more readable #1079 (Jimexist)
- [window function] add percent_rank window function #1077 (Jimexist)
- [window function] add cume_dist implementation #1076 (Jimexist)
- Add a LogicalPlanBuilder::schema() function #1075 (alamb)
- Add support for UNION [DISTINCT] sql #1068 [sql] (xudong963)
- fix: fix joins on Float32/Float64 columns bug #1054 (francis-du)
- Update sqlparser-rs to 0.11 #1052 [sql] (alamb)
- Support querying CSV files without providing the schema #1050 [sql] (xudong963)
- remove hard coded partition count in ballista logicalplan deserialization #1044 (xudong963)
- feat: add lit_timestamp_nanosecond #1030 (NGA-TRAN)
- Ignore metadata on schema merge #1024 (Smurphy000)
- add ExecutionConfig.with_optimizer_rules #1022 (seddonm1)
- Add baseline execution stats to WindowAggExec and UnionExec, and fixup CoalescePartitionsExec #1018 (alamb)
- Derive PartialOrd for Expr #1015 (alamb)
- Indexed field access for List #1006 [sql] (Igosuki)
- Add metrics for Limit and Projection, and CoalesceBatches #1004 (alamb)
- Update DataFusion to arrow 6.0 #984 (alamb)
- Implement Display for Expr, improve operator display #971 [sql] (matthewmturner)
- Add metrics for FilterExec #960 (alamb)
- Change compound column field name rules #952 (waynexia)
- ObjectStore API to read from remote storage systems #950 (yjshen)
- Add baseline metrics to SortPreservingMergeExec #948 (alamb)
- Add support for TRIM LEADING/TRAILING/BOTH syntax #947 [sql] (adsharma)
- fixes #933: replace placeholder fmt_as for ExecutionPlan impls #939 (tiphaineruy)
- Add metrics for SortExec + HashAggregateExec #938 (alamb)
- Add some additional asserts in utils::from_plan #930 (alamb)
- Avro Table Provider #910 [sql] (Igosuki)
- Add BaselineMetrics, Timestamp metrics, add for CoalescePartitionsExec, rename output_time -> elapsed_compute #909 (alamb)
- add cross join support to ballista #891 (houqp)
- Add Ballista support to DataFusion CLI #889 (andygrove)
- support like on DictionaryArray #876 (b41sh)
- Register table based on known schema without file IO #872 (Dandandan)
- Add support for PostgreSQL regex match #870 [sql] (b41sh)
- Include planning time in datafusion-cli printing #860 (Dandandan)
- Implement basic common subexpression eliminate optimization #792 (waynexia)
- Impl ops::Not for expr #763 (Jimexist)
Fixed bugs:
- Cannot use between in the select list #1196
- ORDER BY does not work with literals: Sort operation is not applicable to scalar value 'foo' #1195
- window functions with NULL literals in partition by and order by do not work: Internal("Sort operation is not applicable to scalar value NULL") #1194
- Operation name not included in internal errors -- Internal("Data type Boolean not supported for binary operation on dyn arrays") #1157
- Physical plan explain UNION query says "ExecutionPlan(PlaceHolder)" #933
- Cannot use LIKE on DictionaryArray encoded strings #815
- physical_plan::repartition::tests::repartition_with_dropping_output_stream failing locally #614
- Fix some BuiltinScalarFunction panics with zero arguments #1249 (capkurmagati)
- fix: do not do boolean folding on NULL and/or expr #1245 (NGA-TRAN)
- ignore case of with header row in sql when creating external table #1237 [sql] (lichuan6)
- fix: Min/Max aggregation data type should not be dictionary #1235 (NGA-TRAN)
- Fix build with --no-default-features #1219 (alamb)
- Prevent "future cannot be sent between threads safely" compilation error #1155 (jonmmease)
- Clean up spawned task on drop for AnalyzeExec, CoalescePartitionsExec, HashAggregateExec #1121 (crepererum)
- Clean up spawned task on SortStream drop #1105 (crepererum)
- fix UNION ALL bug: thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', ./src/datatypes/schema.rs:165:10 #1088 (xudong963)
- python: fix generated table name in dataframe creation #1078 (houqp)
- fix subquery alias #1067 [sql] (xudong963)
- fix pattern handling in regexp_match function #1065 (houqp)
- fix: joins on Timestamp columns #1055 (francis-du)
- Fix metric name typo #943 (alamb)
- EXPLAIN ANALYZE should run all Optimizer passes #929 (alamb)
Documentation updates:
- update docs to fix DataFusion User Guide link #1238 (jiangzhx)
- [docs] datafusion cli run via homebrew #1198 (Jimexist)
- add support for unary and binary values in values list, update docs #1172 [sql] (Jimexist)
- Add additional docstring comments to from_plan #1168 (alamb)
- [nit] fix document issue for approx_distinct #1110 (Jimexist)
- implement approx_distinct function using HyperLogLog #1087 (Jimexist)
- Remove unused use statements from examples #1032 (alamb)
- consolidate datafusion docs with sphinx #993 (houqp)
- Updated user-guide library docs with optimized config #976 (matthewmturner)
- Improve User Guide #954 (andygrove)
- [MINOR] Fix typos in doc comments #945 (alamb)
- [DataFusion] - Add show and show_limit function for DataFrame #923 (francis-du)
- Typo fix in DataFusion crate documentation #914 (antoinewdg)
Performance improvements:
- Improve avro reader performance by avoiding some cloning on avro_rs::Value #1206 (Igosuki)
- optimize build profile for datafusion python binding, cli and ballista #1137 (houqp)
- Avoid stack overflow by reducing stack usage of BinaryExpr::evaluate in debug builds #1047 (alamb)
- Add ScalarValue::eq_array optimized comparison function #844 (alamb)
- Rework GroupByHash for faster performance and support grouping by nulls #808 (alamb)
Closed issues:
- InList expr with NULL literals do not work #1190
- update the homepage README to include values, approx_distinct, etc. #1171
- [Python]: Inconsistencies with Python package name #1011
- Wanting to contribute to project where to start? #983
- delete redundant code #973
- How to build DataFusion python wheel #853
- Add support for partition pruning #204
- [Datafusion] Support joins on TimestampMillisecond columns #187
- TPC-H Query 21 #173
- TPC-H Query 13 #164
- TPC-H Query 8 #162
- implement split_part(string, delimiter, position) #157
- Join Statement: Schema contains duplicate unqualified field name #155
- ParquetTable should avoid scanning all files twice #136
- Add support for reading partitioned Parquet files #133
- Add support for Parquet schema merging #132
- Catalog abstraction #126
- Optimizer rules should work with qualified column names #125
- Add optional qualifier to Expr::Column #121
- Implement modulus expression #99
- [Rust] Add constant folding to expressions during logically planning #98
- [Rust] Implement pretty print for physical query plan #93
- Can not group by boolean columns (add boolean to valid keys of groupBy) #91
- improve performance of building literal arrays #90
- [rust][datafusion] optimize count(*) queries on parquet sources #89
- Produce a design for a metrics framework #21
Merged pull requests:
- Add timezone string to stabilize test #1265 (viirya)
- numerical_coercion pattern match optimize #1256 (Jimexist)
- fix and update window function sql tests #1059 (Jimexist)
- reduce ScalarValue from trait boilerplate with macro #989 (houqp)
For older versions, see apache/arrow/CHANGELOG.md
5.0.0 (2021-08-10)
Breaking changes:
- Box ScalarValue::Lists, reducing size by half #788 (alamb)
- JOIN conditions are order dependent #778 (seddonm1)
- Show the result of all optimizer passes in EXPLAIN VERBOSE #759 (alamb)
- #723 Datafusion add option in ExecutionConfig to enable/disable parquet pruning #749 (lvheyang)
- Update API for extension planning to include logical plan #643 (alamb)
- Rename MergeExec to CoalescePartitionsExec #635 (andygrove)
- fix 593, reduce cloning by taking ownership in logical planner's from fn #610 (Jimexist)
- fix join column handling logic for On and Using constraints #605 (houqp)
- Rewrite pruning logic in terms of PruningStatistics using Array trait (option 2) #426 (alamb)
- Support reading from NdJson formatted data sources #404 (heymind)
- Add metrics to RepartitionExec #398 (andygrove)
- Use 4.x arrow-rs from crates.io rather than git sha #395 (alamb)
- Return Vec<bool> from PredicateBuilder rather than an Fn #370 (alamb)
- Refactor: move RowGroupPredicateBuilder into its own module, rename to PruningPredicateBuilder #365 (alamb)
- [Datafusion] NOW() function support #288 (msathis)
- Implement select distinct #262 (Dandandan)
- Refactor datafusion/src/physical_plan/common.rs build_file_list to take less param and reuse code #253 (Jimexist)
- Support qualified columns in queries #55 (houqp)
- Read CSV format text from stdin or memory #54 (heymind)
- Use atomics for SQLMetric implementation, remove unused name field #25 (returnString)
Implemented enhancements:
- Allow extension nodes to correctly plan physical expressions with relations #642
- Filters aren't passed down to table scans in a union #557
- Support pruning for boolean columns #490
- Implement SQLMetrics for RepartitionExec #397
- DataFusion benchmarks should show executed plan with metrics after query completes #396
- Use published versions of arrow rather than github shas #393
- Add Compare to GroupByScalar #364
- Reusable "row group pruning" logic #363
- Add an Order Preserving merge operator #362
- Implement Postgres compatible now() function #251
- COUNT DISTINCT does not support dictionary types #249
- Use standard make_null_array for CASE #222
- Implement date_trunc() function #203
- COUNT DISTINCT does not support Float64 #199
- Update SQLMetric to use atomics rather than a Mutex #30
- Implement PartialOrd for ScalarValue #838 (viirya)
- Support date datatypes in max/min #820 (viirya)
- Implement vectorized hashing for DictionaryArray types #812 (alamb)
- Convert unsupported conditions in left right join to filters #796 [sql] (Dandandan)
- Implement streaming versions of Dataframe.collect methods #789 (andygrove)
- impl from str for column and scalar #762 (Jimexist)
- impl fmt::Display for PlanType #752 (Jimexist)
- Remove unnecessary projection in logical plan optimization phase #747 (waynexia)
- Support table columns alias #735 (Dandandan)
- Derive PartialEq for datasource enums #734 (alamb)
- Allow filetype to be lowercase, Implement FromStr for FileType #728 (Jimexist)
- Update to use arrow 5.0 #721 (alamb)
- #554: Lead/lag window function with offset and default value arguments #687 (jgoday)
- dedup using join column in wildcard expansion #678 (houqp)
- Implement metrics for HashJoinExec #664 (andygrove)
- Show physical plan with metrics in benchmark #662 (andygrove)
- Allow non-equijoin filters in join condition #660 (Dandandan)
- Add End-to-end test for parquet pruning + metrics for ParquetExec #657 (alamb)
- Add support for leading field in interval #647 (Dandandan)
- Remove hard-coded PartitionMode from Ballista serde #637 (andygrove)
- Ballista: Implement scalable distributed joins #634 (andygrove)
- implement rank and dense_rank function and refactor built-in window function evaluation #631 (Jimexist)
- Improve "field not found" error messages #625 (andygrove)
- Support modulus op #577 (gangliao)
- implement std::default::Default for execution config #570 (Jimexist)
- to_timestamp_millis(), to_timestamp_micros(), to_timestamp_seconds() #567 (velvia)
- Filter push down for Union #559 (Dandandan)
- Implement window functions with partition_by clause #558 (Jimexist)
- support table alias in join clause #547 (houqp)
- Not equal predicate in physical_planning pruning #544 (jgoday)
- add error handling and boundary checking for window frames #530 (Jimexist)
- Implement window functions with order_by clause #520 (Jimexist)
- support group by column positions #519 [sql] (jychen7)
- Implement constant folding for CAST #513 (msathis)
- Add window frame constructs - alternative #506 (Jimexist)
- Add partition by constructs in window functions and modify logical planning #501 (Jimexist)
- Add support for boolean columns in pruning logic #500 (alamb)
- #215 resolve aliases for group by exprs #485 (jychen7)
- Support anti join #482 (Dandandan)
- Support semi join #470 (Dandandan)
- add order by construct in window function and logical plans #463 (Jimexist)
- Remove redundant filters (e.g. c>5 AND c>5 --> c>5) #436 (jgoday)
- fix: display the content of debug explain #434 (NGA-TRAN)
- implement lead and lag built-in window function #429 (Jimexist)
- add support for ndjson for datafusion-cli #427 (Jimexist)
- add first_value, last_value, and nth_value built-in window functions #403 (Jimexist)
- export both now and random functions #389 (Jimexist)
- Function to create ArrayRef from an iterator of ScalarValues #381 (alamb)
- Sort preserving merge (#362) #379 (tustvold)
- Add support for multiple partitions with SortExec (#362) #378 (tustvold)
- add window expression stream, delegated window aggregation to aggregate functions, and implement row_number #375 (Jimexist)
- Add PartialOrd and Ord to GroupByScalar (#364) #368 (tustvold)
- Implement readable explain plans for physical plans #337 (alamb)
- Add window expression part 1 - logical and physical planning, structure, to/from proto, and explain, for empty over clause only #334 (Jimexist)
- Use NullArray to Pass row count to ScalarFunctions that take 0 arguments #328 (Jimexist)
- add --quiet/-q flag and allow timing info to be turned on/off #323 (Jimexist)
- Implement hash partitioned aggregation #320 (Dandandan)
- Support COUNT(DISTINCT timestamps) #319 (charlibot)
- add random SQL function #303 (Jimexist)
- allow datafusion cli to take -- comments #296 (Jimexist)
- Add json print format mode to datafusion cli #295 (Jimexist)
- Add print format param with support for tsv print format to datafusion cli #292 (Jimexist)
- Add print format param and support for csv print format to datafusion cli #289 (Jimexist)
- allow datafusion-cli to take a file param #285 (Jimexist)
- add param validation for datafusion-cli #284 (Jimexist)
- [breaking change] fix 265, log should be log10, and add ln #271 (Jimexist)
- Implement count distinct for dictionary arrays #256 (alamb)
- Count distinct floats #252 (pjmore)
- Add rule to eliminate LIMIT 0 and replace it with an EmptyRelation #213 (Dandandan)
- Allow table providers to indicate their type for catalog metadata #205 (returnString)
- Use arrow eq kernels in CaseWhen expression evaluation #52 (Dandandan)
- Re-export Arrow and Parquet crates from DataFusion #39 (returnString)
- [DataFusion] Optimize hash join inner workings, null handling fix #24 (Dandandan)
- [ARROW-12441] [DataFusion] Cross join implementation #11 (Dandandan)
Fixed bugs:
- Projection pushdown removes unqualified column names even when they are used #617
- Panic while running join datatypes/schema.rs:165:10 #601
- Indentation is incorrect for joins in formatted physical plans #345
- Error while running COUNT DISTINCT (timestamp): 'Unexpected DataType for list #314
- When joining two tables, get Error: Plan("Schema contains duplicate unqualified field name 'xxx'") #311
- Incorrect answers with SELECT DISTINCT queries #250
- Intermittent failure in CI join_with_hash_collision #227
- Concat from Dataframe API no longer accepts multiple expressions #226
- Fix right, full join handling when having multiple non-matching rows at the left side #845 (Dandandan)
- Qualified field resolution too strict #810 [sql] (seddonm1)
- Better join order resolution logic #797 [sql] (seddonm1)
- Produce correct answers for Group BY NULL (Option 1) #793 (alamb)
- Use consistent version of string_to_timestamp_nanos in DataFusion #767 (alamb)
- #723 limit pruning rule to simple expression #764 (lvheyang)
- #699 fix return type conflict when calling builtin math functions #716 (lvheyang)
- Fix Date32 and Date64 parquet row group pruning #690 (alamb)
- Remove qualifiers on pushed down predicates / Fix parquet pruning #689 (alamb)
- use Weak ptr to break catalog list <> info schema cyclic reference #681 (crepererum)
- honor table name for csv/parquet scan in ballista plan serde #629 (houqp)
- fix 621, where unnamed window functions shall be differentiated by partition and order by clause #622 (Jimexist)
- RFC: Do not prune out unnecessary columns with unqualified references #619 (alamb)
- [fix] select * on empty table #613 (rdettai)
- fix 592, support alias in window functions #607 (Jimexist)
- RepartitionExec should not error if output has hung up #576 (alamb)
- Fix pruning on not equal predicate #561 (alamb)
- hash float arrays using primitive unsigned integer type #556 (houqp)
- Return errors properly from RepartitionExec #521 (alamb)
- refactor sort exec stream and combine batches #515 (Jimexist)
- Fix display of execution time in datafusion-cli #514 (Dandandan)
- Wrong aggregation arguments error. #505 (jgoday)
- fix window aggregation with alias and add integration test case #454 (Jimexist)
- fix: don't duplicate existing filters #409 (e-dard)
- Fixed incorrect logical type in GroupByScalar. #391 (jorgecarleitao)
- Fix indented display for multi-child nodes #358 (alamb)
- Fix SQL planner to support multibyte column names #357 (agatan)
- Fix wrong projection 'optimization' #268 (Dandandan)
- Fix Left join implementation is incorrect for 0 or multiple batches on the right side #238 (Dandandan)
- Count distinct boolean #230 (pjmore)
- Fix Filter / where clause without column names is removed in optimization pass #225 (Dandandan)
Documentation updates:
- No way to get to the examples from docs.rs #186
- Update docs to use vendored version of arrow #772 (alamb)
- Fix typo in DEVELOPERS.md #692 (lvheyang)
- update stale documentations related to window functions #598 (Jimexist)
- update readme to reflect work on window functions #471 (Jimexist)
- Add examples section to datafusion crate doc #457 (mluts)
- add invariants spec #443 (houqp)
- add output field name rfc #422 (houqp)
- Update more docs and also the developer.md doc #414 (Jimexist)
- use prettier to format md files #367 (Jimexist)
- Add new logo svg with white background #313 (parthsarthy)
- Add projects (Squirtle and Tensorbase) to list in readme #312 (parthsarthy)
- docs - fix the ballista link #274 (haoxins)
- misc(README): Replace Cube.js with Cube Store #248 (ovr)
- Initial docs for SQL syntax #242 (Dandandan)
- Deduplicate README.md #79 (msathis)
Performance improvements:
- Speed up inlist for strings and primitives #813 (Dandandan)
- perf: improve performance of SortPreservingMergeExec operator #722 (e-dard)
- Optimize min/max queries with table statistics #719 (b41sh)
- perf: Improve materialisation performance of SortPreservingMergeExec #691 (e-dard)
- Optimize count(*) with table statistics #620 (Dandandan)
- optimize window function's find_ranges_in_range #595 (Jimexist)
- Collapse sort into window expr and do sort within logical phase #571 (Jimexist)
- Use repartition in window functions to speed up #569 (Jimexist)
- Constant fold / optimize to_timestamp function during planning #387 (msathis)
- Speed up create_batch_from_map #339 (Dandandan)
- Simplify math expression code (use unary kernel) #309 (Dandandan)
Closed issues:
- Confirm git tagging strategy for releases #770
- arrow::util::pretty::pretty_format_batches missing #769
- move the assert_batches_eq! macros to a non part of datafusion #745
- fix an issue where aliases are not respected in generating downstream schemas in window expr #592
- make the planner to print more succinct and useful information in window function explain clause #526
- move window frame module to be in logical_plan #517
- use a more rust idiomatic way of handling nth_value #448
- create a test with more than one partition for window functions #435
- COUNT DISTINCT does not support Boolean #202
- Read CSV format text from stdin or memory #198
- Fix null handling hash join #195
- Allow TableProviders to indicate their type for the information schema #191
- Make DataFrame extensible #190
- TPC-H Query 19 #170
- TPC-H Query 7 #161
- Upgrade hashbrown to 0.10 #151
- Implement vectorized hashing for hash aggregate #149
- More efficient LEFT join implementation #143
- Implement vectorized hashing #142
- RFC Roadmap for 2021 (DataFusion) #140
- Implement hash partitioning #131
- Grouping by column position #110
- [Datafusion] GROUP BY with a high cardinality doesn't seem to finish #107
- [Rust] Add support for JSON data sources #103
- [Rust] Implement metrics framework #95
- Publicly export Arrow crate from datafusion #36
- Implement hash-partitioned hash aggregate #27
- Consider using GitHub pages for DataFusion/Ballista documentation #18
- Update "repository" in Cargo.toml #16
Merged pull requests:
- Use RawTable API in hash join #827 (Dandandan)
- Add test for window functions on dictionary #823 (alamb)
- Update dependencies: prost to 0.8 and tonic to 0.5 #818 (alamb)
- Move hash_array into hash_utils.rs #807 (alamb)
- Remove GroupByScalar and use ScalarValue in preparation for supporting null values in GroupBy #786 (alamb)
- fix 226, make concat, concat_ws, and random work with Python crate #761 (Jimexist)
- Test for parquet pruning disabling #754 (alamb)
- Add explain verbose with limit push down #751 (Jimexist)
- Move assert_batches_eq! macros to test_utils.rs #746 (alamb)
- Show optimized physical and logical plans in EXPLAIN #744 (alamb)
- update python crate to support latest pyo3 syntax and gil semantics #741 (Jimexist)
- update python crate dependencies #740 (Jimexist)
- provide more details on required .parquet file extension error message #729 (Jimexist)
- split up windows functions into a dedicated module with separate files #724 (Jimexist)
- Use pytest in integration test #715 (Jimexist)
- replace once iter chain with array::IntoIter #704 (houqp)
- avoid iterator materialization in column index lookup #703 (houqp)
- Fix build with 1.52.1 #696 (alamb)
- Fix test output due to logical merge conflict #694 (alamb)
- add more integration tests #668 (Jimexist)
- Bump arrow and parquet versions to 4.4 #654 (toddtreece)
- Add query 15 to TPC-H queries #645 (Dandandan)
- Improve error message and comments #641 (alamb)
- add integration tests for rank, dense_rank, fix last_value evaluation with rank #638 (Jimexist)
- round trip TPCH queries in tests #630 (houqp)
- use Into<String> as argument type wherever applicable #615 (houqp)
- reuse alias map in aggregate logical planning and refactor position resolution #606 (Jimexist)
- fix clippy warnings #581 (Jimexist)
- Add benchmarks to window function queries #564 (Jimexist)
- reuse code for now function expr creation #548 (houqp)
- turn on clippy rule for needless borrow #545 (Jimexist)
- Refactor hash aggregate's planner building code #539 (Jimexist)
- Cleanup Repartition Exec code #538 (alamb)
- reuse datafusion physical planner in ballista building from protobuf #532 (Jimexist)
- remove redundant into_iter() calls #527 (Jimexist)
- Fix 517 - move window_frames module to logical_plan #518 (Jimexist)
- Refactor window aggregation, simplify batch processing logic #516 (Jimexist)
- Add datafusion::test_util, resolve test data paths without env vars #498 (mluts)
- Avoid warnings in tests when compiling without default features #489 (alamb)
- update cargo.toml in python crate and fix unit test due to hash joins #483 (Jimexist)
- use prettier check in CI #453 (Jimexist)
- Optimize nth_value, remove first_value, last_value structs and use idiomatic rust style #452 (Jimexist)
- Fixed typo / logical merge conflict #433 (jorgecarleitao)
- include test data and add aggregation tests in integration test #425 (Jimexist)
- Add some padding around the logo #411 (parthsarthy)
- Benchmark subcommand to distinguish between DataFusion and Ballista #402 (jgoday)
- refactor datafusion/scalar_value to use more macro and avoid dup code #392 (Jimexist)
- Update TPC-H benchmark to show physical plan when debug mode is enabled #386 (andygrove)
- Update arrow dependencies again #341 (alamb)
- Update arrow-rs deps #317 (alamb)
- Update PR template by commenting out instructions #315 (alamb)
- fix clippy warning #286 (Jimexist)
- add integration test to compare datafusion-cli against psql #281 (Jimexist)
- Update arrow deps #269 (alamb)
- Use multi-stage build dockerfile in datafusion-cli and reduce image size from 2.16GB to 89.9MB #266 (Jimexist)
- Enable redundant_field_names clippy lint #261 (Dandandan)
- fix clippy lint #259 (alamb)
- Move datafusion-cli to new crate #231 (Dandandan)
- Make test join_with_hash_collision deterministic #229 (Dandandan)
- Update arrow-rs deps (to fix build due to flatbuffers update) #224 (alamb)
- Use standard make_null_array for CASE #223 (alamb)
- update arrow-rs deps to latest master #216 (alamb)
- MINOR: Remove empty rust dir #61 (andygrove)
* This Changelog was automatically generated by github_changelog_generator