What causes the "Hive table is corrupt" error when reading a Hive bucketed table in Trino? #10112

ckddn9496 · 2021-11-30T02:51:38Z

I cannot disclose the data itself, but the bucketed table was created in Hive with the following DDL and settings.

Create DDL

CREATE EXTERNAL TABLE `schema1.ex1`(
  `col1` string,
  `col2` string,
  `col3` string,
  `col4` string)
PARTITIONED BY (`date` string)
CLUSTERED BY (col1)
SORTED BY (col2)
INTO 32 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
...
TBLPROPERTIES (
  'bucketing_version'='2',
  'orc.compress'='ZLIB')

Set Property

set hive.enforce.bucketing=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=1000;
set hive.execution.engine=mr;
set hive.mapreduce.job.queuename=root.queue1;

Insert Data (1 month)

INSERT INTO TABLE `schema1.ex1` PARTITION (date)
SELECT col1, col2, col3, col4
FROM `schema1.existing_table`
WHERE date BETWEEN '2021-09-01' AND '2021-10-01'

Read the data (in trino)

When I then read the Hive bucketed table in Trino, I get the following error:

io.trino.spi.TrinoException: Hive table is corrupt. File 'hdfs://cluster1/hive/warehouse/schema1/ex1/date=2021-09-02/000026_0' is for bucket 26, but contains a row for bucket 9.

I don't know why the above error occurred. I would appreciate it if you could tell me the cause of the error and the solution.
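
For readers trying to interpret the message, here is a minimal Python sketch of the kind of consistency check it describes. It assumes the bucket index is taken from the leading digits of the file name (`000026_0` gives 26) and that a row's bucket is `(hash & Integer.MAX_VALUE) % bucket_count`, which is how Hive-style bucketing is generally described. The helper names are hypothetical, and the hash function itself (which depends on `bucketing_version`) is deliberately left out: the hash code is passed in as a plain integer.

```python
import re

INT_MAX = 0x7FFFFFFF  # Java's Integer.MAX_VALUE


def bucket_from_file_name(path):
    """Return the bucket index encoded in a file name such as '000026_0'.

    Assumption: the leading run of digits before the underscore is the bucket index.
    """
    name = path.rsplit("/", 1)[-1]
    match = re.match(r"(\d+)_\d+", name)
    if match is None:
        raise ValueError("not a bucket file name: " + name)
    return int(match.group(1))


def bucket_for_hash(hash_code, bucket_count):
    """Map a 32-bit hash code to a bucket: clear the sign bit, then take the modulo."""
    return (hash_code & INT_MAX) % bucket_count


def row_matches_file(path, hash_code, bucket_count):
    """Return (ok, file_bucket, row_bucket) for one row's hash code."""
    file_bucket = bucket_from_file_name(path)
    row_bucket = bucket_for_hash(hash_code, bucket_count)
    return file_bucket == row_bucket, file_bucket, row_bucket


if __name__ == "__main__":
    path = "hdfs://cluster1/hive/warehouse/schema1/ex1/date=2021-09-02/000026_0"
    # 9 is a made-up hash code chosen so that the row lands in bucket 9 of 32.
    print(row_matches_file(path, 9, 32))
```

Under these assumptions, the error would mean that at least one row in `000026_0` has a `col1` value whose hash, as computed by the reader, maps to bucket 9 rather than 26. In other words, the writer and the reader disagree on either the hash function or the row-to-file assignment.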
