[BUG] Pipe load creation fails if it tries to load too many files #49852

Open
tgho-brrrr opened this issue Aug 15, 2024 · 0 comments
Labels
type/bug Something isn't working

tgho-brrrr commented Aug 15, 2024

The S3 connection seems to time out when creating the pipe. Is there a config that I can change? Maybe the 3 wildcards (*) in the path make the scan very slow? When I created a pipe to ingest from another prefix with only 2 wildcards (no filename column), it worked, even with slightly more files.

Steps to reproduce the behavior (Required)

        CREATE PIPE ingest PROPERTIES ( "AUTO_INGEST" = "FALSE", "BATCH_SIZE" = "64GB", "BATCH_FILES"="1024" )
        AS INSERT INTO mydb.mytable
        SELECT date, filename, c1, c2, c3
        FROM FILES (
            "path" = "s3://BUCKET/PREFIX/date=*/filename=*/*",
            "format" = "parquet",
            "aws.s3.region" = "eu-west-1",
            "aws.s3.use_aws_sdk_default_behavior" = "true",
            "columns_from_path" = "date, filename"
        );
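
To check whether the number of wildcards is what makes the scan time out, the same statement could be narrowed to a single date partition so that only two wildcards remain. This is only a sketch: the pipe name `ingest_one_date` and the date value `2024-08-01` are placeholders, not values from this report, and it has not been verified that this avoids the error.

```sql
-- Hypothetical narrowed variant of the failing statement.
-- "ingest_one_date" and "2024-08-01" are illustrative placeholders.
CREATE PIPE ingest_one_date PROPERTIES ( "AUTO_INGEST" = "FALSE", "BATCH_SIZE" = "64GB", "BATCH_FILES"="1024" )
AS INSERT INTO mydb.mytable
SELECT date, filename, c1, c2, c3
FROM FILES (
    -- Only two wildcards are left: the date level is fixed.
    "path" = "s3://BUCKET/PREFIX/date=2024-08-01/filename=*/*",
    "format" = "parquet",
    "aws.s3.region" = "eu-west-1",
    "aws.s3.use_aws_sdk_default_behavior" = "true",
    "columns_from_path" = "date, filename"
);
```

If this variant succeeds while the three-wildcard path fails, that would point to the listing step rather than the file count alone.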

Expected behavior (Required)

The pipe is created successfully.

Real behavior (Required)

Access storage error. Error message: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@3e5549aa[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@be02396[Wrapped task = software.amazon.awssdk.core.internal.http.timers.SyncTimeoutTask@2c529256]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@1a12c692[Shutting down, pool size = 5, active threads = 0, queued tasks = 2, completed tasks = 36172]

Sometimes when I retry I also get this error:
Access storage error. Error message: `s3://BUCKET': FileSystem is closed!

Note that I am using StarRocks in shared-data mode, and the same bucket stores both the Parquet files to load and the cluster's data.

StarRocks version (Required)

  • You can get the StarRocks version by executing SQL select current_version()

3.3.0-19a3f66

tgho-brrrr added the type/bug label Aug 15, 2024