Table merge crashes without error #3094

Closed
victor-ab opened this issue Jan 1, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@victor-ab

Delta-rs version:
0.22.3

Binding:

I am using polars, which uses table.merge(data, **delta_merge_options)

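from deltalake import BloomFilterProperties, ColumnProperties, WriterProperties

# Parquet writer settings shared by the initial write and the merge.
writer_props = WriterProperties(
    compression="UNCOMPRESSED",
    column_properties={
        # Bloom filter on the merge key.
        "file_hash": ColumnProperties(
            bloom_filter_properties=BloomFilterProperties(
                set_bloom_filter_enabled=True,
                fpp=0.01,
            ),
            dictionary_enabled=True,
        ),
        "file_content": ColumnProperties(max_statistics_size=0),
    },
    statistics_truncate_length=200,
)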

# Initial write that creates the table.
df.write_delta(
    's3://mytable',
    mode="overwrite",
    storage_options=storage_options,
    delta_write_options={
        "schema_mode": "overwrite",
        "writer_properties": writer_props,
        "configuration": {"delta.logRetentionDuration": "interval 1 second"},
    },
)

# In merge mode, write_delta returns a deltalake TableMerger, so the merge
# clauses are chained on the result and run with .execute().
(
    df.write_delta(
        's3://mytable',
        mode="merge",
        storage_options=storage_options,
        delta_merge_options={
            "predicate": "target.file_hash = source.file_hash",
            "source_alias": "source",
            "target_alias": "target",
            "writer_properties": writer_props,
        },
    )
    .when_matched_update(updates={"updated_at": "source.updated_at"})
    .when_not_matched_insert_all()
    .execute()
)

Environment:

  • Cloud provider: S3
  • OS: Windows WSL
  • Other: Polars 1.18.0

Bug

[2025-01-01T20:09:18Z DEBUG deltalake_aws::credentials] Located cached credentials
[2025-01-01T20:09:18Z DEBUG deltalake_aws::credentials] Cached credentials are still valid, returning
[2025-01-01T20:09:18Z DEBUG hyper_util::client::legacy::pool] reuse idle connection for ("https", s3.dualstack.us-east-2.amazonaws.com)
[2025-01-01T20:09:18Z DEBUG hyper_util::client::legacy::pool] pooling idle connection for ("https", s3.dualstack.us-east-2.amazonaws.com)
[2025-01-01T20:09:39Z DEBUG hyper_util::client::legacy::client] client connection error: connection closed before message completed
[2025-01-01T20:09:39Z DEBUG hyper_util::client::legacy::client] client connection error: connection closed before message completed
[2025-01-01T20:09:40Z DEBUG hyper_util::client::legacy::client] client connection error: connection closed before message completed
[2025-01-01T20:10:14Z DEBUG hyper_util::client::legacy::pool] pooling idle connection for ("https", s3.dualstack.us-east-2.amazonaws.com)
[2025-01-01T20:10:16Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:16Z DEBUG deltalake_core::operations::writer] Writing file with estimated size 135682254 to disk.
[2025-01-01T20:10:16Z DEBUG deltalake_aws::credentials] AWSForObjectStore is unlocking..
[2025-01-01T20:10:16Z DEBUG deltalake_aws::credentials] Located cached credentials
[2025-01-01T20:10:16Z DEBUG deltalake_aws::credentials] Cached credentials are still valid, returning
[2025-01-01T20:10:16Z DEBUG hyper_util::client::legacy::pool] reuse idle connection for ("https", s3.dualstack.us-east-2.amazonaws.com)
[2025-01-01T20:10:27Z DEBUG hyper_util::client::legacy::pool] pooling idle connection for ("https", s3.dualstack.us-east-2.amazonaws.com)
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:27Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:28Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:29Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:30Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:31Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:32Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:32Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:33Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:34Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:35Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:35Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:36Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:37Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:37Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:37Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:38Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:38Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:38Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:39Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:39Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:41Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
[2025-01-01T20:10:41Z DEBUG deltalake_core::operations::write] write_execution_plan_with_predicate did not send any batches, no sender.
DEBUG Command exited with signal: Some(9)

What happened:
The process is killed (signal 9) during the merge without any error message.

What you expected to happen:
I expected it to execute the merge

How to reproduce it:
I am trying to reproduce it with some dummy data. Will share if I manage to do it.

@victor-ab victor-ab added the bug Something isn't working label Jan 1, 2025
@ion-elgreco
Collaborator

@victor-ab that's a SIGKILL (signal 9), which generally happens when the process runs out of memory (OOM).
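
To tell a signal kill apart from an ordinary failure, the merge can be run in a child process and its return code inspected. This is only a minimal sketch using the Python standard library on a POSIX system; `describe_exit` is a hypothetical helper name, and the child command is a stand-in for the real merge script. A SIGKILL is consistent with the kernel OOM killer but does not prove it.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Return a human-readable description of a subprocess return code."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        sig = signal.Signals(-returncode)
        if sig == signal.SIGKILL:
            return "killed by SIGKILL (possibly the kernel OOM killer)"
        return f"killed by {sig.name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Hypothetical stand-in for the merge script: a child that SIGKILLs itself.
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGKILL)",
    ]
    result = subprocess.run(child)
    print(describe_exit(result.returncode))
```

Running the real job this way keeps the parent alive, so it can log the outcome even when the child is killed without a traceback.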

@rtyler
Member

rtyler commented Jan 2, 2025

{"delta.logRetentionDuration": "interval 1 second"},

🤯 As @ion-elgreco said, this is the out-of-memory (OOM) killer terminating the process, which typically means too much data is being used at once for the merge window, so reducing that should help.

This line stuck out to me, are you trying to disable transaction history on the table? What's the purpose of such an aggressively small duration?

@victor-ab
Author

Hi @ion-elgreco and @rtyler, thanks for the responses.

The {"delta.logRetentionDuration": "interval 1 second"} setting was for testing purposes. You can disregard it.

I just confirmed that it was indeed an OOM. I didn't consider it before because I was merging just 50 rows into a table of 2 Parquet files of ~100 MB, and I had 20 GB+ of free memory!

I see other issues such as #2573 and #2968 which are probably related. Feel free to close this issue, but I think this memory problem should be more deeply investigated.
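
For anyone investigating further, one way to quantify the memory use is to record the process's peak resident set size around the merge call. The following is a rough sketch using only the standard library `resource` module (POSIX only); `peak_rss_mib` and `run_and_report` are hypothetical helper names, and the workload shown is a stand-in for the merge. It reports only if the process survives, so it is most useful on an input small enough to complete.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_and_report(label, func):
    """Call func() and print how much the peak RSS grew while it ran."""
    before = peak_rss_mib()
    result = func()
    after = peak_rss_mib()
    print(f"{label}: peak RSS {after:.0f} MiB (+{after - before:.0f} MiB)")
    return result


if __name__ == "__main__":
    # Stand-in workload; in this issue's setting it would wrap the merge call.
    run_and_report("allocate", lambda: bytearray(200 * 1024 * 1024))
```

Because the figure is process-wide, it also covers memory allocated by native extensions, not just Python objects.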

@ion-elgreco closed this as not planned Jan 3, 2025