[pull] main from StarRocks:main #5

Merged
merged 116 commits into from
May 17, 2024

pull[bot]

@pull pull bot commented May 9, 2024

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

trueeyu and others added 13 commits May 9, 2024 17:20
…ble binary file starrocks_be (#45042)" (#45355)

Signed-off-by: trueeyu <lxhhust350@qq.com>
Signed-off-by: Albert T. Wong <atwong@alumni.uci.edu>
Signed-off-by: evelyn.zhaojie <everlyn.zhaojie@gmail.com>
Signed-off-by: evelynzhaojie <everlyn.zhaojie@gmail.com>
Co-authored-by: evelyn.zhaojie <everlyn.zhaojie@gmail.com>
Co-authored-by: evelyn.zhaojie <98087056+evelynzhaojie@users.noreply.github.com>
Signed-off-by: simo <48942089+wangsimo0@users.noreply.github.com>
Signed-off-by: evelynzhaojie <everlyn.zhaojie@gmail.com>
Co-authored-by: evelyn.zhaojie <98087056+evelynzhaojie@users.noreply.github.com>
Signed-off-by: starrocks-xupeng <xupeng@starrocks.com>
Signed-off-by: evelynzhaojie <everlyn.zhaojie@gmail.com>
Co-authored-by: evelyn.zhaojie <everlyn.zhaojie@gmail.com>
Co-authored-by: evelyn.zhaojie <98087056+evelynzhaojie@users.noreply.github.com>
Signed-off-by: Kevin Xiaohua Cai <caixiaohua@starrocks.com>
Signed-off-by: packy92 <wangchao@starrocks.com>
…45344)

Signed-off-by: silverbullet233 <3675229+silverbullet233@users.noreply.github.com>
Signed-off-by: zihe.liu <ziheliu1024@gmail.com>
…fix the CVE-2023-25194 vulnerability (#45234)

Signed-off-by: Rohit Satardekar <rohitrs1983@gmail.com>
…hema.tables (#45351)

Signed-off-by: HangyuanLiu <460660596@qq.com>
@github-actions github-actions bot added documentation Improvements or additions to documentation title needs [type] labels May 9, 2024
@pull pull bot added ⤵️ pull and removed documentation Improvements or additions to documentation title needs [type] labels May 9, 2024
…um` (#43616)

Why I'm doing:
Right now, the HDFS scanner optimization for count(1) outputs a const column with the expected number of rows.

In extreme cases (large datasets), the number of chunks flowing through the pipeline becomes extremely large, and operator time and overhead time are no longer negligible.

Here is a profile of select count(*) from hive.hive_ssb100g_parquet.lineorder. To reproduce this extreme case, I modified the code to scale morsels by 20x and repeat row groups by 10x.

In the concurrency=1 case, the total time is 51s:

         - OverheadTime: 25s37ms
           - __MAX_OF_OverheadTime: 25s111ms
           - __MIN_OF_OverheadTime: 24s962ms

             - PullTotalTime: 12s376ms
               - __MAX_OF_PullTotalTime: 13s147ms
               - __MIN_OF_PullTotalTime: 11s885ms

What I'm doing:
Rewrite the count(1) query into a sum-like aggregation, so that each row group reader emits only one chunk (of size 1).

With this change, the total time drops to 9s.
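A rough way to see why the chunk count matters: the Python sketch below models both scan strategies over a few hypothetical row groups. It assumes a chunk size of 4096 rows and treats every chunk as a const column (one value repeated num_rows times). ConstChunk, scan_const_column, scan_count_column and the row-group sizes are illustrative names and numbers, not StarRocks code.

```python
from dataclasses import dataclass
from typing import List

# Assumption: 4096 rows per chunk, used here only for illustration.
CHUNK_SIZE = 4096


@dataclass
class ConstChunk:
    """A chunk holding one const column: `value` repeated `num_rows` times."""
    value: int
    num_rows: int


def scan_const_column(row_group_rows: List[int], chunk_size: int = CHUNK_SIZE) -> List[ConstChunk]:
    """Old path (as described above): emit const-column chunks until every row is covered."""
    chunks = []
    for rows in row_group_rows:
        remaining = rows
        while remaining > 0:
            n = min(chunk_size, remaining)
            chunks.append(ConstChunk(value=1, num_rows=n))
            remaining -= n
    return chunks


def scan_count_column(row_group_rows: List[int]) -> List[ConstChunk]:
    """New path: one single-row chunk per row group, carrying that row group's row count."""
    return [ConstChunk(value=rows, num_rows=1) for rows in row_group_rows]


def count_star(chunks: List[ConstChunk]) -> int:
    """count(*) over the scanned rows: the total number of rows."""
    return sum(c.num_rows for c in chunks)


def sum_column(chunks: List[ConstChunk]) -> int:
    """sum(col) over a const column: value * num_rows for each chunk."""
    return sum(c.value * c.num_rows for c in chunks)


if __name__ == "__main__":
    row_groups = [1_000_000, 1_000_000, 500_000]  # hypothetical row-group sizes
    old = scan_const_column(row_groups)
    new = scan_count_column(row_groups)
    print(len(old), count_star(old))  # 613 2500000
    print(len(new), sum_column(new))  # 3 2500000
```

Both paths report 2,500,000 rows, but the old path pushes 613 chunks through the pipeline versus 3 for the new one. At the scale of the profile above, that per-chunk operator and overhead time is what the rewrite targets.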

The original plan is:

+----------------------------------+
| Explain String                   |
+----------------------------------+
| PLAN FRAGMENT 0                  |
|  OUTPUT EXPRS:18: count          |
|   PARTITION: UNPARTITIONED       |
|                                  |
|   RESULT SINK                    |
|                                  |
|   4:AGGREGATE (merge finalize)   |
|   |  output: count(18: count)    |
|   |  group by:                   |
|   |                              |
|   3:EXCHANGE                     |
|                                  |
| PLAN FRAGMENT 1                  |
|  OUTPUT EXPRS:                   |
|   PARTITION: RANDOM              |
|                                  |
|   STREAM DATA SINK               |
|     EXCHANGE ID: 03              |
|     UNPARTITIONED                |
|                                  |
|   2:AGGREGATE (update serialize) |
|   |  output: count(*)            |
|   |  group by:                   |
|   |                              |
|   1:Project                      |
|   |  <slot 20> : 1               |
|   |                              |
|   0:HdfsScanNode                 |
|      TABLE: lineorder            |
|      partitions=1/1              |
|      cardinality=600037902       |
|      avgRowSize=5.0              |
+----------------------------------+

The rewritten plan is:

+-----------------------------------+
| Explain String                    |
+-----------------------------------+
| PLAN FRAGMENT 0                   |
|  OUTPUT EXPRS:18: count           |
|   PARTITION: UNPARTITIONED        |
|                                   |
|   RESULT SINK                     |
|                                   |
|   3:AGGREGATE (merge finalize)    |
|   |  output: sum(18: count)       |
|   |  group by:                    |
|   |                               |
|   2:EXCHANGE                      |
|                                   |
| PLAN FRAGMENT 1                   |
|  OUTPUT EXPRS:                    |
|   PARTITION: RANDOM               |
|                                   |
|   STREAM DATA SINK                |
|     EXCHANGE ID: 02               |
|     UNPARTITIONED                 |
|                                   |
|   1:AGGREGATE (update serialize)  |
|   |  output: sum(19: ___count___) |
|   |  group by:                    |
|   |                               |
|   0:HdfsScanNode                  |
|      TABLE: lineorder             |
|      partitions=1/1               |
|      cardinality=1                |
|      avgRowSize=1.0               |
+-----------------------------------+
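The rewritten plan is a two-phase aggregation: each instance of PLAN FRAGMENT 1 computes a partial sum(19: ___count___), and PLAN FRAGMENT 0 sums those partials after the EXCHANGE. The Python sketch below models this with hypothetical per-row-group count values; the function names are illustrative only. It also makes one SQL detail explicit: SUM over zero rows is NULL while COUNT is 0, so the final NULL is mapped to 0 here. The PR does not show how StarRocks handles that case, so treat that step as an assumption.

```python
from typing import List, Optional

# Each inner list models one scan instance of PLAN FRAGMENT 1 and holds the
# hypothetical ___count___ values (one per row group) emitted by its HdfsScanNode.


def sql_sum(values: List[int]) -> Optional[int]:
    """SQL SUM semantics: NULL (None) when there are no input rows."""
    return sum(values) if values else None


def update_serialize(count_values: List[int]) -> Optional[int]:
    """1:AGGREGATE (update serialize): partial sum(___count___) for one instance."""
    return sql_sum(count_values)


def merge_finalize(partials: List[Optional[int]]) -> int:
    """3:AGGREGATE (merge finalize): sum the non-NULL partials from the EXCHANGE.

    Assumption: a NULL final result is mapped to 0 so that it matches count(*).
    """
    total = sql_sum([p for p in partials if p is not None])
    return 0 if total is None else total


def rewritten_count(instances: List[List[int]]) -> int:
    """Run the partial phase per instance, then the final merge."""
    return merge_finalize([update_serialize(vals) for vals in instances])


if __name__ == "__main__":
    instances = [[1_000_000, 1_000_000], [500_000], []]  # hypothetical
    print(rewritten_count(instances))  # 2500000
    print(rewritten_count([[], []]))   # 0
```

The merge step for count was already a sum of partial counts, which is why the rewrite can use sum in both phases without changing the result for non-empty input.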
Fixes #45242

Signed-off-by: yanz <dirtysalt1987@gmail.com>
@github-actions github-actions bot added the documentation Improvements or additions to documentation label May 10, 2024
xiangguangyxg and others added 10 commits May 10, 2024 10:31
…45266)

Signed-off-by: xiangguangyxg <xiangguangyxg@gmail.com>
Signed-off-by: 絵空事スピリット <wanglichen@starrocks.com>
Co-authored-by: 絵空事スピリット <wanglichen@starrocks.com>
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
Signed-off-by: trueeyu <lxhhust350@qq.com>
Signed-off-by: evelynzhaojie <everlyn.zhaojie@gmail.com>
Co-authored-by: evelyn.zhaojie <98087056+evelynzhaojie@users.noreply.github.com>
Signed-off-by: yandongxiao <yandongxiao@starrocks.com>
Signed-off-by: packy92 <wangchao@starrocks.com>
…ler (#45241)

Signed-off-by: shuming.li <ming.moriarty@gmail.com>
Signed-off-by: HangyuanLiu <460660596@qq.com>
Signed-off-by: HangyuanLiu <460660596@qq.com>
Signed-off-by: hellolilyliuyi <96421222+hellolilyliuyi@users.noreply.github.com>
Signed-off-by: 絵空事スピリット <wanglichen@starrocks.com>
Signed-off-by: evelyn.zhaojie <everlyn.zhaojie@gmail.com>
Co-authored-by: 絵空事スピリット <wanglichen@starrocks.com>
Co-authored-by: evelyn.zhaojie <everlyn.zhaojie@gmail.com>
Signed-off-by: zihe.liu <ziheliu1024@gmail.com>
MatthewH00 and others added 29 commits May 15, 2024 18:57
Signed-off-by: MatthewH00 <1639097204@qq.com>
Signed-off-by: hmx <1639097204@qq.com>
Signed-off-by: shuming.li <ming.moriarty@gmail.com>
Signed-off-by: luohaha <18810541851@163.com>
…ribution key is NULL (#45537)

Signed-off-by: HangyuanLiu <460660596@qq.com>
…e profile report (#45675)

Signed-off-by: PengFei Li <lpengfei2016@gmail.com>
Signed-off-by: Smith Cruise <chendingchao1@126.com>
Signed-off-by: stdpain <drfeng08@gmail.com>
Signed-off-by: gengjun-git <gengjun@starrocks.com>
Why I'm doing:
To address the CVE, we need to upgrade the Hadoop SDK from 3.3.6 to 3.4.0.
This brings in AWS Java SDK v2, so we can remove SDK v1.

Signed-off-by: Smith Cruise <chendingchao1@126.com>
…20 (#45678)

Signed-off-by: evelynzhaojie <everlyn.zhaojie@gmail.com>
Signed-off-by: meegoo <meegoo.sr@gmail.com>
…metadata locks in FE (#45526)

Signed-off-by: HangyuanLiu <460660596@qq.com>
Signed-off-by: zombee0 <ewang2027@gmail.com>
…set is empty (#45715)

Signed-off-by: luohaha <18810541851@163.com>
Signed-off-by: Smith Cruise <chendingchao1@126.com>
Signed-off-by: Murphy <mofei@starrocks.com>
Add titles for the intro pages. These will be required if we switch to auto-generated navigation in the future.

Signed-off-by: DanRoscigno <dan@roscigno.com>
Signed-off-by: Alex Zhu <zhuming9011@gmail.com>
…ata mode (#45665)

Signed-off-by: Alex Zhu <zhuming9011@gmail.com>
Signed-off-by: srlch <linzichao@starrocks.com>
Signed-off-by: HangyuanLiu <460660596@qq.com>
Signed-off-by: Dejun Xia <xiadejun@starrocks.com>
@node node merged commit 3e1b7e2 into vivo:main May 17, 2024
6 of 7 checks passed
Labels
⤵️ pull documentation Improvements or additions to documentation title needs [type]