Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Event Hub binding stops working if storage act returns 503 #3573

Open
MattCosturos opened this issue Oct 18, 2024 · 1 comment
Open
Labels
kind/bug Something isn't working

Comments

@MattCosturos
Copy link

Expected Behavior

Azure Event Hub binding will continue working after receiving a 503 error from the storage act used for checkpointing.

Actual Behavior

The event hub binding dies after receiving a 503 error from table storage. The component does not appear to retry the calls to the storage act, and just gives up. Logs become filled with Error from event processor: the Processor has been stopped. Create a new instance to start processing again

We are using a standard storage act and not a premium act. I could switch to a premium act for better throughput, but I would still have concerns that the component could still fail to process messages if it experiences any sort of transient error.

Can we somehow configure a retry policy to restart this component instead of filling the logs with an error?

time="2024-10-17T19:48:21.031356514Z" level=warning msg="The default value for 'spec.metric.http.increasedCardinality' will change to 'false' in Dapr 1.15 or later" app_id=component-telemetry instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.runtime.diagnostics type=log ver=1.14.4
time="2024-10-17T19:48:24.243756578Z" level=warning msg="The default value for 'spec.metric.http.increasedCardinality' will change to 'false' in Dapr 1.15 or later" app_id=component-telemetry instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.runtime.http type=log ver=1.14.4
time="2024-10-18T01:25:52.104204839Z" level=error msg="Error processing events from partition client: error receiving events: client has been closed by user" app_id=component-telemetry component="component-telemetry-1 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:25:56.922083584Z" level=error msg="Error processing events from partition client: error receiving events: client has been closed by user" app_id=component-telemetry component="component-telemetry-0 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:26:02.932347006Z" level=error msg="Error from event processor: GET https://my-storage-act.blob.core.windows.net/dapr-component-telemetry\n--------------------------------------------------------------------------------\nRESPONSE 503: 503 The server is busy.\nERROR CODE: ServerBusy\n--------------------------------------------------------------------------------\n\ufeff<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>ServerBusy</Code><Message>The server is busy.\nRequestId:9e9a9f71-c01e-0015-15fc-202deb000000\nTime:2024-10-18T01:25:52.0107247Z</Message></Error>\n--------------------------------------------------------------------------------\n" app_id=component-telemetry component="component-telemetry-1 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:26:07.643955727Z" level=error msg="Error from event processor: GET https://my-storage-act.blob.core.windows.net/dapr-component-telemetry\n--------------------------------------------------------------------------------\nRESPONSE 503: 503 The server is busy.\nERROR CODE: ServerBusy\n--------------------------------------------------------------------------------\n\ufeff<?xml version=\"1.0\" encoding=\"utf-8\"?><Error><Code>ServerBusy</Code><Message>The server is busy.\nRequestId:9e9aad9c-c01e-0015-75fc-202deb000000\nTime:2024-10-18T01:25:56.8329114Z</Message></Error>\n--------------------------------------------------------------------------------\n" app_id=component-telemetry component="component-telemetry-0 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:26:07.933024175Z" level=error msg="Error from event processor: the Processor has been stopped. Create a new instance to start processing again" app_id=component-telemetry component="component-telemetry-1 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4
time="2024-10-18T01:26:12.644981572Z" level=error msg="Error from event processor: the Processor has been stopped. Create a new instance to start processing again" app_id=component-telemetry component="component-telemetry-0 (bindings.azure.eventhubs/v1)" instance=component-telemetry-microservice-54454fbdd7-mxr8s scope=dapr.contrib type=log ver=1.14.4

Steps to Reproduce the Problem

Unable to reproduce on demand. Leave the component running long enough and this can happen.

@MattCosturos MattCosturos added the kind/bug Something isn't working label Oct 18, 2024
@oising
Copy link

oising commented Nov 14, 2024

This just happened to us -- on DAPR 1.14.4, except it was failure to connect to the storage account that triggered it.

time="2024-11-12T01:47:38.209787491Z" level=error msg="Error from event processor: Get \"https://REDACTED.blob.core.windows.net/checkpoints?comp=list&include=metadata&prefix=REDACTED.servicebus.windows.net%2Fdevice-out%2Fdispatcher%2Fownership%2F&restype=container\": dial tcp:REDACTED:443: connect: connection refused" app_id=dispatcher component="binding-device-out-input (bindings.azure.eventhubs/v1)" instance=dispatcher-848b9dd7fd-2cg2w scope=dapr.contrib type=log ver=1.14.4

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants