Release v1.2.0 · oap-project/gazelle_plugin

Gazelle 1.2.0 is the second release from the oap-project repository (https://github.com/oap-project/gazelle_plugin).
In this release, we implemented 11 new features, fixed 11 performance issues and bugs, and merged a total of 48 PRs.

Here are the major highlights of Gazelle 1.2.0:

1.25X speedup on the 103 TPC-DS queries
Add RDD Cache support
Add Spill & UDF Support
Implement native Column to Row optimization
Further improve performance stability in many fallback cases

Gazelle Plugin

Features

No.	Description
#394	Support ColumnarArrowEvalPython operator
#368	Encountered Hadoop version (3.2.1) conflict issue on AWS EMR-6.3.0
#375	Implement a series of datetime functions
#183	Add Date/Timestamp type support
#362	make arrow-unsafe allocator as the default
#343	configurable codegen opt level
#333	Arrow Data Source: CSV format support fix
#223	Add Parquet write support to Arrow data source
#320	Add build option to enable unsafe Arrow allocator
#337	UDF: Add test case for validating basic row-based udf
#326	Update Scala unit test to spark-3.1.1

Performance

No.	Description
#400	Optimize ColumnarToRow Operator in NSE.
#411	enable ccache on C++ code compiling

Bugs Fixed

No.	Description
#358	Running TPC DS all queries with native-sql-engine for 10 rounds will have performance degradation problems in the last few rounds
#481	JVM heap memory leak on memory leak tracker facilities
#436	Fix for Arrow Data Source test suite
#317	persistent memory cache issue
#382	Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc
#384	ColumnarBatchScanExec reading parquet failed on java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#370	Failed to get time zone: NoSuchElementException: None.get
#360	Cannot compile master branch.
#341	build failed on v2 with -Phadoop-3.2

PRs

No.	Description
#489	[NSE-481] JVM heap memory leak on memory leak tracker facilities (Arrow Allocator)
#486	[NSE-475] restore coalescebatches operator before window
#482	[NSE-481] JVM heap memory leak on memory leak tracker facilities
#470	[NSE-469] Lazy Read: Iterator objects are not correctly released
#464	[NSE-460] fix decimal partial sum in 1.2 branch
#439	[NSE-433]Support pre-built Jemalloc
#453	[NSE-254] remove arrow-data-source-common from jar with dependency
#452	[NSE-254]Fix redundant arrow library issue.
#432	[NSE-429] TPC-DS Q14a/b get slowed down within setting spark.oap.sql.columnar.sortmergejoin.lazyread=true
#426	[NSE-207] Fix aggregate and refresh UT test script
#442	[NSE-254]Issue0410 jar size
#441	[NSE-254]Issue0410 jar size
#440	[NSE-254]Solve the redundant arrow library issue
#437	[NSE-436] Fix for Arrow Data Source test suite
#387	[NSE-383] Release SMJ input data immediately after being used
#423	[NSE-417] fix sort spill on inplace sort
#416	[NSE-207] fix left/right outer join in SMJ
#422	[NSE-421]Disable the wholestagecodegen feature for the ArrowColumnarToRow operator
#369	[NSE-417] Sort spill support framework
#401	[NSE-400] Optimize ColumnarToRow Operator in NSE.
#413	[NSE-411] adding ccache support
#393	[NSE-207] fix scala unit tests
#407	[NSE-403]Add Dataproc integration section to README
#406	[NSE-404]Modify repo name in documents
#402	[NSE-368]Update emr-6.3.0 support
#395	[NSE-394]Support ColumnarArrowEvalPython operator
#346	[NSE-317]fix columnar cache
#392	[NSE-382]Support GCP Dataproc 2.0
#388	[NSE-382]Fix Hadoop version issue
#385	[NSE-384] "Select count(*)" without group by results in error: java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#374	[NSE-207] fix left anti join and support filter wo/ project
#376	[NSE-375] Implement a series of datetime functions
#373	[NSE-183] fix timestamp in native side
#356	[NSE-207] fix issues found in scala unit tests
#371	[NSE-370] Failed to get time zone: NoSuchElementException: None.get
#347	[NSE-183] Add Date/Timestamp type support
#363	[NSE-362] use arrow-unsafe allocator by default
#361	[NSE-273] Spark shim layer infrastructure
#364	[NSE-360] fix ut compile and travis test
#264	[NSE-207] fix issues found from join unit tests
#344	[NSE-343]allow to config codegen opt level
#342	[NSE-341] fix maven build failure
#324	[NSE-223] Add Parquet write support to Arrow data source
#321	[NSE-320] Add build option to enable unsafe Arrow allocator
#299	[NSE-207] fix unsupported types in aggregate
#338	[NSE-337] UDF: Add test case for validating basic row-based udf
#336	[NSE-333] Arrow Data Source: CSV format support fix
#327	[NSE-326] update scala unit tests to spark-3.1.1
