
[Bug] ArrayIndexOutOfBoundsException after switching to parquet format from orc #4796

Open
xccui opened this issue Dec 28, 2024 · 1 comment
Labels
bug Something isn't working

Comments

xccui (Member) commented Dec 28, 2024

Search before asking

  • I searched in the issues and found nothing similar.

Paimon version

b508c19

Compute Engine

Flink 1.18.1

Minimal reproduce step

Use the parquet file format to write data with an array-of-row nested schema. The same data was written successfully with the orc file format before.
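
For context, a minimal Flink SQL sketch of the kind of setup that exercises this path (the table name, column names, and options are illustrative, not taken from the failing job; 'changelog-producer' = 'lookup' is only an assumption to trigger the ChangelogMergeTreeRewriter seen in the stack trace, and catalog setup is omitted):

    CREATE TABLE t (
        id BIGINT,
        metrics ARRAY<ROW<metric_name STRING, metric_value DOUBLE, ts BIGINT>>,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
        'file.format' = 'parquet',        -- previously 'orc', which worked
        'changelog-producer' = 'lookup'   -- assumption: a changelog producer that rewrites files during compaction
    );

    -- write a few array-of-row records, then let compaction run
    INSERT INTO t VALUES
        (1, ARRAY[ROW('cpu', 0.5, 1735344000000)]),
        (2, ARRAY[ROW('mem', 0.8, 1735344001000)]);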

What doesn't meet your expectations?

java.io.IOException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
	at org.apache.paimon.flink.sink.StoreSinkWriteImpl.prepareCommit(StoreSinkWriteImpl.java:234)
	at org.apache.paimon.flink.sink.TableWriteOperator.prepareCommit(TableWriteOperator.java:127)
	at org.apache.paimon.flink.sink.RowDataStoreWriteOperator.prepareCommit(RowDataStoreWriteOperator.java:198)
	at org.apache.paimon.flink.sink.PrepareCommitOperator.emitCommittables(PrepareCommitOperator.java:104)
	at org.apache.paimon.flink.sink.PrepareCommitOperator.endInput(PrepareCommitOperator.java:92)
	at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.endOperatorInput(StreamOperatorWrapper.java:96)
	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.endInput(RegularOperatorChain.java:97)
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:68)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:562)
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:858)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:807)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:932)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
	at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.get(Unknown Source)
	at org.apache.paimon.compact.CompactFutureManager.obtainCompactResult(CompactFutureManager.java:67)
	at org.apache.paimon.compact.CompactFutureManager.innerGetCompactionResult(CompactFutureManager.java:53)
	at org.apache.paimon.mergetree.compact.MergeTreeCompactManager.getCompactionResult(MergeTreeCompactManager.java:223)
	at org.apache.paimon.mergetree.MergeTreeWriter.trySyncLatestCompaction(MergeTreeWriter.java:328)
	at org.apache.paimon.mergetree.MergeTreeWriter.prepareCommit(MergeTreeWriter.java:276)
	at org.apache.paimon.operation.AbstractFileStoreWrite.prepareCommit(AbstractFileStoreWrite.java:218)
	at org.apache.paimon.operation.MemoryFileStoreWrite.prepareCommit(MemoryFileStoreWrite.java:155)
	at org.apache.paimon.table.sink.TableWriteImpl.prepareCommit(TableWriteImpl.java:253)
	at org.apache.paimon.flink.sink.StoreSinkWriteImpl.prepareCommit(StoreSinkWriteImpl.java:229)
	... 16 more
Caused by: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException
	at org.apache.paimon.reader.RecordReaderIterator.<init>(RecordReaderIterator.java:40)
	at org.apache.paimon.reader.RecordReader.toCloseableIterator(RecordReader.java:210)
	at org.apache.paimon.mergetree.compact.ChangelogMergeTreeRewriter.rewriteOrProduceChangelog(ChangelogMergeTreeRewriter.java:133)
	at org.apache.paimon.mergetree.compact.ChangelogMergeTreeRewriter.upgrade(ChangelogMergeTreeRewriter.java:200)
	at org.apache.paimon.mergetree.compact.MergeTreeCompactTask.upgrade(MergeTreeCompactTask.java:124)
	at org.apache.paimon.mergetree.compact.MergeTreeCompactTask.rewrite(MergeTreeCompactTask.java:146)
	at org.apache.paimon.mergetree.compact.MergeTreeCompactTask.doCompact(MergeTreeCompactTask.java:105)
	at org.apache.paimon.compact.CompactTask.call(CompactTask.java:49)
	at org.apache.paimon.compact.CompactTask.call(CompactTask.java:34)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	... 1 more
Caused by: java.lang.ArrayIndexOutOfBoundsException

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@xccui xccui added the bug Something isn't working label Dec 28, 2024

Stephen0421 (Contributor) commented Dec 29, 2024

Hi, could you provide the schema and some example data? I tried to reproduce the exception but couldn't; my array-row schema is like below:
Array<Row<metricName String,metricValue DOUBLE,timestamp BIGINT,host STRING,yarn_cluster_name STRING,quantile STRING,ssg STRING,tm_id STRING,task_id STRING,task_name STRING,task_attempt_num INTEGER,subtask_index INTEGER,operator_id STRING,operator_name STRING,operator_subtask_index INTEGER>>
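
For reference, that schema could be declared for a repro attempt roughly as below (a sketch: the table name, primary key, and table options are assumptions, not from the comment; `timestamp` has to be quoted because it is a reserved word in Flink SQL):

    CREATE TABLE repro_metrics (
        id BIGINT,
        metrics ARRAY<ROW<
            metricName STRING, metricValue DOUBLE, `timestamp` BIGINT, host STRING,
            yarn_cluster_name STRING, quantile STRING, ssg STRING, tm_id STRING,
            task_id STRING, task_name STRING, task_attempt_num INT, subtask_index INT,
            operator_id STRING, operator_name STRING, operator_subtask_index INT>>,
        PRIMARY KEY (id) NOT ENFORCED
    ) WITH (
        'file.format' = 'parquet',
        'changelog-producer' = 'lookup'
    );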
