This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

v1.2.0

zhixingheyi-tian released this 03 Sep 06:36
730f5e9

Gazelle 1.2.0 is the second release from the oap-project repository (https://github.com/oap-project/gazelle_plugin).
In this release, we implemented 11 new features, fixed 11 performance issues and bugs, and merged a total of 48 PRs.

Here are the major highlights in Gazelle 1.2.0:

  • 1.25X speedup on the 103 TPC-DS queries.
  • Add RDD Cache support
  • Add Spill & UDF Support
  • Implement native Column to Row optimization
  • Further enhance performance stability for many fallback cases.

Gazelle Plugin

Features

No. Description
#394 Support ColumnarArrowEvalPython operator
#368 Encountered Hadoop version (3.2.1) conflict issue on AWS EMR-6.3.0
#375 Implement a series of datetime functions
#183 Add Date/Timestamp type support
#362 make arrow-unsafe allocator as the default
#343 configurable codegen opt level
#333 Arrow Data Source: CSV format support fix
#223 Add Parquet write support to Arrow data source
#320 Add build option to enable unsafe Arrow allocator
#337 UDF: Add test case for validating basic row-based udf
#326 Update Scala unit test to spark-3.1.1

Performance

No. Description
#400 Optimize ColumnarToRow Operator in NSE.
#411 enable ccache on C++ code compiling

Bugs Fixed

No. Description
#358 Running TPC DS all queries with native-sql-engine for 10 rounds will have performance degradation problems in the last few rounds
#481 JVM heap memory leak on memory leak tracker facilities
#436 Fix for Arrow Data Source test suite
#317 persistent memory cache issue
#382 Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc
#384 ColumnarBatchScanExec reading parquet failed on java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#370 Failed to get time zone: NoSuchElementException: None.get
#360 Cannot compile master branch.
#341 build failed on v2 with -Phadoop-3.2

PRs

No. Description
#489 [NSE-481] JVM heap memory leak on memory leak tracker facilities (Arrow Allocator)
#486 [NSE-475] restore coalescebatches operator before window
#482 [NSE-481] JVM heap memory leak on memory leak tracker facilities
#470 [NSE-469] Lazy Read: Iterator objects are not correctly released
#464 [NSE-460] fix decimal partial sum in 1.2 branch
#439 [NSE-433]Support pre-built Jemalloc
#453 [NSE-254] remove arrow-data-source-common from jar with dependency
#452 [NSE-254]Fix redundant arrow library issue.
#432 [NSE-429] TPC-DS Q14a/b get slowed down within setting spark.oap.sql.columnar.sortmergejoin.lazyread=true
#426 [NSE-207] Fix aggregate and refresh UT test script
#442 [NSE-254]Issue0410 jar size
#441 [NSE-254]Issue0410 jar size
#440 [NSE-254]Solve the redundant arrow library issue
#437 [NSE-436] Fix for Arrow Data Source test suite
#387 [NSE-383] Release SMJ input data immediately after being used
#423 [NSE-417] fix sort spill on inplace sort
#416 [NSE-207] fix left/right outer join in SMJ
#422 [NSE-421]Disable the wholestagecodegen feature for the ArrowColumnarToRow operator
#369 [NSE-417] Sort spill support framework
#401 [NSE-400] Optimize ColumnarToRow Operator in NSE.
#413 [NSE-411] adding ccache support
#393 [NSE-207] fix scala unit tests
#407 [NSE-403]Add Dataproc integration section to README
#406 [NSE-404]Modify repo name in documents
#402 [NSE-368]Update emr-6.3.0 support
#395 [NSE-394]Support ColumnarArrowEvalPython operator
#346 [NSE-317]fix columnar cache
#392 [NSE-382]Support GCP Dataproc 2.0
#388 [NSE-382]Fix Hadoop version issue
#385 [NSE-384] "Select count(*)" without group by results in error: java.lang.IllegalArgumentException: not all nodes and buffers were consumed
#374 [NSE-207] fix left anti join and support filter wo/ project
#376 [NSE-375] Implement a series of datetime functions
#373 [NSE-183] fix timestamp in native side
#356 [NSE-207] fix issues found in scala unit tests
#371 [NSE-370] Failed to get time zone: NoSuchElementException: None.get
#347 [NSE-183] Add Date/Timestamp type support
#363 [NSE-362] use arrow-unsafe allocator by default
#361 [NSE-273] Spark shim layer infrastructure
#364 [NSE-360] fix ut compile and travis test
#264 [NSE-207] fix issues found from join unit tests
#344 [NSE-343]allow to config codegen opt level
#342 [NSE-341] fix maven build failure
#324 [NSE-223] Add Parquet write support to Arrow data source
#321 [NSE-320] Add build option to enable unsafe Arrow allocator
#299 [NSE-207] fix unsupported types in aggregate
#338 [NSE-337] UDF: Add test case for validating basic row-based udf
#336 [NSE-333] Arrow Data Source: CSV format support fix
#327 [NSE-326] update scala unit tests to spark-3.1.1
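
Each PR title in the list above carries an `[NSE-<number>]` prefix that appears to point at the issue it addresses (for example, PRs #489 and #482 both reference NSE-481, which is listed under Bugs Fixed). The following is a minimal sketch, not part of the Gazelle tooling, of how the PR rows could be grouped by that issue number. The names `PR_ROWS`, `ROW_PATTERN`, and `group_prs_by_issue` are hypothetical, and the sample rows are copied from the list above.

```python
import re
from collections import defaultdict

# A few rows copied from the PR list above; the remaining rows
# can be pasted in the same "#<pr> [NSE-<issue>] <title>" form.
PR_ROWS = """\
#489 [NSE-481] JVM heap memory leak on memory leak tracker facilities (Arrow Allocator)
#482 [NSE-481] JVM heap memory leak on memory leak tracker facilities
#401 [NSE-400] Optimize ColumnarToRow Operator in NSE.
#395 [NSE-394]Support ColumnarArrowEvalPython operator
#392 [NSE-382]Support GCP Dataproc 2.0
#388 [NSE-382]Fix Hadoop version issue
"""

# The space after the closing bracket is optional because some
# titles in the list omit it (e.g. "[NSE-394]Support ...").
ROW_PATTERN = re.compile(r"^#(\d+)\s+\[NSE-(\d+)\]\s*(.*)$")


def group_prs_by_issue(text):
    """Return {issue_number: [pr_number, ...]} for rows matching ROW_PATTERN."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            pr, issue, _title = match.groups()
            grouped[int(issue)].append(int(pr))
    return dict(grouped)


if __name__ == "__main__":
    for issue, prs in sorted(group_prs_by_issue(PR_ROWS).items()):
        print(f"NSE-{issue}: " + ", ".join(f"#{pr}" for pr in prs))
```

Rows that do not match the pattern are skipped, so the sketch tolerates blank lines and the "No. Description" header if the whole list is pasted in.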