[BUG] _msearch API doesn't properly handle task cancellation #17004

Closed
msfroh opened this issue Jan 10, 2025 · 0 comments · Fixed by #17005
Labels
bug, Search:Resiliency, v2.19.0, v3.0.0

Comments

msfroh (Collaborator) commented Jan 10, 2025

Describe the bug

The _msearch API executes up to max_concurrent_searches requests concurrently. Any requests beyond that limit are queued and executed from a callback as the earlier requests complete.
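The dispatch pattern described above can be sketched roughly as follows. This is a minimal, single-threaded illustration and not the actual TransportMultiSearchAction code: `MsearchDispatchSketch`, `SearchClient`, `RecordingClient`, `dispatch`, and `executeNext` are hypothetical names, and requests are plain strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class MsearchDispatchSketch {
    /** Hypothetical stand-in for an async search client; onDone fires when the search completes. */
    interface SearchClient {
        void search(String request, Runnable onDone);
    }

    /** Test double: records started requests and lets the caller complete them one at a time. */
    static class RecordingClient implements SearchClient {
        final List<String> started = new ArrayList<>();
        final Deque<Runnable> callbacks = new ArrayDeque<>();

        @Override
        public void search(String request, Runnable onDone) {
            started.add(request);
            callbacks.add(onDone);
        }

        /** Completes the oldest in-flight request; returns false if nothing is in flight. */
        boolean completeOne() {
            Runnable callback = callbacks.poll();
            if (callback == null) {
                return false;
            }
            callback.run();
            return true;
        }
    }

    /** Starts up to maxConcurrent requests; the rest stay queued. */
    static void dispatch(List<String> requests, int maxConcurrent, SearchClient client) {
        Queue<String> pending = new ConcurrentLinkedQueue<>(requests);
        int initial = Math.min(maxConcurrent, requests.size());
        for (int i = 0; i < initial; i++) {
            executeNext(pending, client);
        }
    }

    /** Runs the next queued request; its completion callback pulls the one after it. */
    static void executeNext(Queue<String> pending, SearchClient client) {
        String request = pending.poll();
        if (request == null) {
            return; // queue drained
        }
        client.search(request, () -> executeNext(pending, client));
    }

    public static void main(String[] args) {
        RecordingClient client = new RecordingClient();
        dispatch(List.of("q1", "q2", "q3", "q4", "q5"), 2, client);
        System.out.println(client.started); // [q1, q2]
        client.completeOne();               // the callback starts the next queued request
        System.out.println(client.started); // [q1, q2, q3]
    }
}
```

Note that nothing in `executeNext` looks at the state of the parent task, which is the gap this issue is about.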

If the task times out (or is otherwise canceled), we still try to execute the queued requests. This throws an uncaught exception with the following stack trace:

TaskCancelledException[The parent task was cancelled, shouldn't start any child tasks, channel closed]
at org.opensearch.tasks.TaskManager$CancellableTaskHolder.registerChildNode(TaskManager.java:680)
at org.opensearch.tasks.TaskManager.registerChildNode(TaskManager.java:350)
at org.opensearch.action.support.TransportAction.registerChildNode(TransportAction.java:78)
at org.opensearch.action.support.TransportAction.execute(TransportAction.java:97)
at org.opensearch.client.node.NodeClient.executeLocally(NodeClient.java:112)
at org.opensearch.client.node.NodeClient.doExecute(NodeClient.java:99)
at org.opensearch.client.support.AbstractClient.execute(AbstractClient.java:480)
at org.opensearch.client.support.AbstractClient.search(AbstractClient.java:611)
at org.opensearch.action.search.TransportMultiSearchAction.executeSearch(TransportMultiSearchAction.java:180)
at org.opensearch.action.search.TransportMultiSearchAction$1.lambda$handleResponse$0(TransportMultiSearchAction.java:200)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:964)

This results in a "zombie" indices:data/read/msearch task.

Related component

Search:Resiliency

To Reproduce

N/A

Expected behavior

If the parent msearch task is canceled, we should not try to run any more search requests.
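One way to picture the expected behavior is a guard in the callback that checks for cancellation before pulling the next queued request. The sketch below is only an illustration under stated assumptions and is not the change made in the linked fix: `MsearchCancellationSketch` and all of its members are hypothetical, the `cancelled` flag stands in for the parent task's canceled state, and the client is a fake that records calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

class MsearchCancellationSketch {
    /** Hypothetical stand-in for an async search client; onDone fires when the search completes. */
    interface SearchClient {
        void search(String request, Runnable onDone);
    }

    // Assumption: stands in for the canceled state of the parent msearch task.
    final AtomicBoolean cancelled = new AtomicBoolean(false);
    final Queue<String> pending = new ConcurrentLinkedQueue<>();
    final List<String> started = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();
    final List<Runnable> inFlight = new ArrayList<>();

    // Fake client: records the request and holds its completion callback for later.
    final SearchClient client = (request, onDone) -> {
        started.add(request);
        inFlight.add(onDone);
    };

    /** Queues all requests and starts up to maxConcurrent of them. */
    void dispatch(List<String> requests, int maxConcurrent) {
        pending.addAll(requests);
        int initial = Math.min(maxConcurrent, requests.size());
        for (int i = 0; i < initial; i++) {
            executeNext();
        }
    }

    /** Starts the next queued request unless the parent has been canceled. */
    void executeNext() {
        if (cancelled.get()) {
            // Do not start child searches; drain the queue and record what was skipped.
            String queued;
            while ((queued = pending.poll()) != null) {
                skipped.add(queued);
            }
            return;
        }
        String request = pending.poll();
        if (request != null) {
            client.search(request, this::executeNext);
        }
    }

    /** Simulates completion of the oldest in-flight search. */
    void completeOldest() {
        inFlight.remove(0).run();
    }

    public static void main(String[] args) {
        MsearchCancellationSketch sketch = new MsearchCancellationSketch();
        sketch.dispatch(List.of("q1", "q2", "q3", "q4", "q5"), 2);
        sketch.cancelled.set(true);  // parent task canceled while q1 and q2 are in flight
        sketch.completeOldest();     // callback runs but starts nothing new
        System.out.println(sketch.started); // [q1, q2]
        System.out.println(sketch.skipped); // [q3, q4, q5]
    }
}
```

Recording the skipped requests, rather than silently dropping them, is a design choice in this sketch: it leaves room for the caller to report each one as failed so the overall response can still complete.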

Additional Details

N/A

msfroh added the bug and untriaged labels on Jan 10, 2025
reta added the v3.0.0 and v2.19.0 labels and removed the untriaged label on Jan 11, 2025
github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board on Jan 24, 2025
Projects
Status: Done
2 participants