Improved handling of parquet writes for read-modify-write #4212
5554ed0 to 4d9ac43
extensions/parquet/table/src/main/java/io/deephaven/parquet/table/ParquetTableWriter.java
for (Map.Entry<String, GroupingColumnWritingInfo> entry : groupingColumnsWritingInfoMap.entrySet()) {
    final String groupingColumnName = entry.getKey();
    final Table auxiliaryTable = groupingAsTable(t, groupingColumnName);
    final String parquetColumnName = entry.getValue().parquetColumnName;
General thing to think about: `ParquetWriteInstructions` are used at multiple levels. This code could do with a cleanup around control flow and the responsibilities of each layer.
extensions/parquet/table/src/main/java/io/deephaven/parquet/table/ParquetTools.java
}
throw e;
I don't understand why we didn't have to declare this in `throws`. Your existing version is better.
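One possible explanation, offered as an assumption because the surrounding code is not shown here: since Java 7, the compiler analyzes a rethrown catch parameter precisely. If the parameter is never reassigned and the `try` body can only throw unchecked exceptions, `throw e` needs no `throws` clause even when the parameter is declared as `Exception`. The class and method names below are hypothetical and only illustrate the language rule.

```java
final class PreciseRethrowSketch {
    // Hypothetical helper, not taken from the PR.
    // There is no throws clause: body.run() declares no checked exceptions, so the
    // compiler knows that "throw e" can only rethrow an unchecked exception.
    static void cleanupAndRethrow(Runnable body, Runnable cleanup) {
        try {
            body.run();
        } catch (Exception e) {
            // e is effectively final (never reassigned), which enables precise rethrow.
            cleanup.run();
            throw e;
        }
    }
}
```

If the `try` body called something that declares `IOException`, the same `throw e` would force the method to declare `throws IOException`.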
Bug fix, closes #2831
Currently, when writing a parquet file, we overwrite any existing file at that path, and if the write fails (for whatever reason), we delete the partially written file. This is particularly bad for the read-modify-write scenario, where we open a file handle to an existing parquet file, do some table operations, and then try to write the result to the same path. With the current behavior, we can overwrite the file while we are still reading from it, which leads to a bad state.
In this PR, we change the write path to first write to a temporary file in the same directory. Once the write completes, we move the temporary file to its destination path, swapping out any existing file at that path.
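The write-to-temporary-then-move idea described above can be sketched with plain `java.nio.file` calls. This is a minimal illustration, not the PR's implementation: `SafeOverwriteSketch`, `FileWriterAction`, and `writeViaTempFile` are hypothetical names, and the swap is simplified to a single `Files.move` with `REPLACE_EXISTING`, which may differ from how the PR handles an existing destination file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class SafeOverwriteSketch {
    /** Hypothetical callback standing in for the real parquet write. */
    @FunctionalInterface
    interface FileWriterAction {
        void writeTo(Path target) throws IOException;
    }

    /**
     * Writes to a temporary file next to the destination, then moves it into place.
     * The destination is untouched until the write has fully succeeded.
     */
    static void writeViaTempFile(Path destination, FileWriterAction action) throws IOException {
        final Path dir = destination.toAbsolutePath().getParent();
        // Same directory as the destination, so the final move stays on one file system.
        final Path temp = Files.createTempFile(dir, ".tmp-", "-" + destination.getFileName());
        try {
            action.writeTo(temp);
            // Simplified swap: replace whatever currently exists at the destination.
            Files.move(temp, destination, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException | RuntimeException e) {
            // A failed write removes only the temporary file, never the original.
            Files.deleteIfExists(temp);
            throw e;
        }
    }
}
```

In a read-modify-write flow, a reader holding the old file keeps seeing consistent data while the new content is being produced, and a failed write leaves the original file in place.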