Delete is currently in progress. - Watchlist removal sometimes gets stuck #10044

Closed
Kaloszer opened this issue Feb 26, 2024 · 23 comments
Labels: WatchList watchLists

Comments

@Kaloszer

Kaloszer commented Feb 26, 2024

Describe the bug
Sometimes removing a watchlist through the REST API, CLI, or Portal fails, and the watchlist stays stuck for an indefinite period of time.
To Reproduce
Steps to reproduce the behavior:

  1. Remove a large watchlist
  2. It will sometimes get stuck on 'Delete is currently in progress.'
  3. Wait for an undefined amount of time
  4. Watchlist is removed

Expected behavior
The watchlist removal process does not get stuck, or it stops or retries if it has not completed after some predefined time.

Screenshots
image

Additional context
Note that I tried to remove this watchlist with the AZ CLI, the REST API, and the Portal. In each case it tells me 'Removed watchlist 'name' successfully!'

But it never actually goes away from the environment and causes the deployment process to fail.
This is what MSFT recommends as there is no 'replace' watchlist function. To update a watchlist (e.g. remove entries) you need to remove it and then push it.

https://learn.microsoft.com/en-us/azure/sentinel/watchlists-manage#:~:text=If%20you%27ve%20deleted%20an%20item%20from%20your%20watchlist%20file%20and%20upload%20it%2C%20bulk%20update%20won%27t%20delete%20the%20item%20in%20the%20existing%20watchlist.%20Delete%20the%20watchlist%20item%20individually.%20Or%2C%20when%20you%20have%20a%20lot%20of%20deletions%2C%20delete%20and%20recreate%20the%20watchlist.

Again, this has been occurring for over a year now but it's never been resolved or addressed. It's an intermittent issue that causes some pain for us. The 'undefined' amount of time can even be up to a week! If it were 30 minutes it wouldn't be that bad, but it can block us from updating a critical watchlist.

@v-muuppugund
Contributor

Hi @Kaloszer, thanks for flagging this issue. We will investigate and get back to you with some updates by 01Mar24. Thanks!

@v-sudkharat
Contributor

Hi @Kaloszer, have you opened a support case for this issue? As you mentioned, this issue occurs only when the operation is performed on a large watchlist, not on any specific one.

Thanks!

@Kaloszer
Author

Kaloszer commented Mar 5, 2024

@v-sudkharat - what is this repository for if not for raising issues :)?

It's been happening for years:
https://learn.microsoft.com/en-us/answers/questions/506415/cannot-delete-watchlist-upload-is-currently-in-pro

More than two weeks with this problem in the production environment. Please, can someone help us? We opened a case weeks ago, but they don't solve ANYTHING.

https://techcommunity.microsoft.com/t5/microsoft-sentinel/sentinel-watchlist-stuck-in-queued-state-for-days/m-p/3747521

I spoke with Microsoft support who cleared the queued watchlists in a few different workspaces. They confirmed that there was no option for me to do this from the Azure Portal and if it happens again to just contact them to fix.
Bit of an odd one.

It seems to be a known issue according to that last entry - and having to rely on MSFT to clear some queue is not really a solution.

@v-sudkharat
Contributor

@Kaloszer, we appreciate you highlighting this issue in this repo. It would be great if you could let us know whether you have already opened a support case for it, so we can check on it and follow up to get you unblocked.

Thanks!

@Kaloszer
Author

Kaloszer commented Mar 5, 2024

I don't have the issue currently but that does not mean that the issue does not exist and should be addressed.

@v-sudkharat
Contributor

Hi @Kaloszer, we will check this issue with our concerned team, and will update you. Thanks!

@v-muuppugund
Contributor

Hi @Kaloszer, we will do a detailed analysis of this issue and update you.

@v-muuppugund
Contributor

Hi @Kaloszer, after analysis we have identified the scenarios for the issue and are working on replicating it in my environment. Will update you.

@v-muuppugund
Contributor

Hi @Kaloszer, we still need some more time to replicate the issue. Will update you.

@v-muuppugund
Contributor

Hi @Kaloszer, the issue seems to be related to large watchlist size. I am working on getting a similar watchlist, will update you once I have replicated it on my end, and will also try deleting with another approach.

@v-muuppugund
Contributor

Hi @Kaloszer, we still need some time, as we have been unable to find a watchlist large enough to replicate the issue. Will update you.

@Kaloszer
Author

@v-muuppugund In my opinion, the easiest way to replicate this issue would be to run a loop over a large watchlist with the following logic:

deploy > remove > deploy > remove, and so on for a few hours. Eventually you will hit the aforementioned issue where the deployment fails.

With luck you'd also see the behaviour I mentioned earlier: even if you then remove the watchlist from the portal or az cli, and it reports 'removed', the pipeline (which would still be running, as it should retry on failure) keeps giving you the 'failed to remove' error.
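
The deploy > remove loop described above could be scripted roughly as follows. This is only a minimal sketch: `deploy` and `remove` are hypothetical callables standing in for whatever your pipeline actually runs (az CLI, REST call, template deployment), and the assumption is that they raise `RuntimeError` when a step fails.

```python
from typing import Callable, Optional


def find_first_failure(
    deploy: Callable[[], None],
    remove: Callable[[], None],
    max_cycles: int,
) -> Optional[int]:
    """Run deploy > remove cycles and report where the first failure occurs.

    Returns the 1-based cycle number in which either step raised
    RuntimeError, or None if all cycles completed.
    """
    for cycle in range(1, max_cycles + 1):
        try:
            deploy()  # hypothetical: push the large watchlist
            remove()  # hypothetical: delete it again
        except RuntimeError:
            # Either step failing ends the run. Per the report above, a stuck
            # delete typically shows up as the following deploy failing.
            return cycle
    return None
```

Running this for a few hours with a large watchlist and logging the returned cycle number would give a concrete data point to attach to a support case.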

@v-muuppugund
Contributor

Apologies for the delayed response. Sure @Kaloszer, I will check and let you know.

@v-muuppugund
Contributor

Hi @Kaloszer, I was unable to replicate the issue. We will discuss it in today's call.

@v-muuppugund
Contributor

Hi @Kaloszer, as discussed on Monday, 8 Apr 2024, we have reached out to our backend team by email and will update you by email once we hear back. We are closing your issue (#10107) as per our standard operating procedures. If you still need support for this issue, feel free to re-open it at any time. Thank you for your co-operation!

@Kaloszer
Author

@v-muuppugund Before closing the issue, provide a tracking Id to reference.

v-muuppugund reopened this Apr 11, 2024

@v-muuppugund
Contributor

@Kaloszer, as we are unable to raise an ICM independently, we reached out to the CSS team. They suggested opening a support case from your Azure subscription so that they can assist you.

@Kaloszer
Author

Hey @v-muuppugund, what would be the correct way to report this so we can actually get a resolution to long standing issues and bugs?

It seems that this is a never-ending circle, raise issue here then here, get no actual solution and it gets bounced around for months until we actually reach the product team.

In a previous UEBA issue it was addressed by the product team eventually after raising said support case:
#8883

For this set of issues:
#10044
#10106
#10107 (we need the root cause or assurances this won't happen again)

I could of course raise another set of support cases, but it seems as if I'm getting the run-around for this one too, and it will take some time to get the issue to the product team.

As we're an MSSP, most of the time we can't raise support cases within a customer's tenant (either they don't have a paid support plan, or we don't have access and will never get it).

Just for sanity's sake, what should and should not be reported in this particular repository? Would product issues go here, or should they go straight through the internal Azure support system? Just to skip the run-around.

@v-muuppugund
Contributor

Hi @Kaloszer, I understand your point. As I don't have access to the backend system, I reached out to our team internally about this, and their suggestion was to raise an Azure support case for this issue.

@v-sudkharat
Contributor

Shared a response in this comment: #10107 (comment)

@Kaloszer
Author

Still happening from time to time, no real resolution for this issue other than waiting for a few days...

@lachlancraigie

lachlancraigie commented Dec 19, 2024

@v-sudkharat can we reopen this?
@Kaloszer Encountering this bug and found this issue.

I have a logic app that creates a watchlist based on the output of various Graph API and Azure REST API requests (approximately 10-20k rows). Every night when the logic app triggers, it's set to delete the previous day's watchlist and recreate it. I've added a 6 minute delay between the deletion and creation phases, but it continually breaks with the error "Cannot upload watchlist 'Resources'. Delete is currently in progress"

I need the delete/recreate step, because otherwise the watchlist will hit the million row capacity after a few weeks of the logic app doing a nightly update on the state of resources.

Image

The watchlist itself is in a 'deleting' state showing 87% completed. This number is frozen and has not progressed in several hours.

Image

This is a pretty bad bug. Resource deletion shouldn't take days.

@joydeepdutt

joydeepdutt commented Dec 24, 2024

We recommend using the 2024-04-01-preview or later API versions to avoid this issue when calling the API.

This API version will return a 202 (Accepted) status code for the "Delete a Watchlist" action, along with an "Azure-AsyncOperation" header, which can be used to poll for the status of the delete operation. The upload operation also returns a similar status header.
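
As a rough illustration of the request side, the sketch below builds the watchlist resource URL with the API version named above and pulls the polling URL out of a 202 response. The URL layout follows the usual Azure Resource Manager pattern for Sentinel watchlists, but treat it as an assumption and check it against the REST reference. The function names (`watchlist_url`, `async_operation_url`) are made up for this example.

```python
from typing import Mapping, Optional
from urllib.parse import quote

API_VERSION = "2024-04-01-preview"  # version recommended in the comment above


def watchlist_url(subscription_id: str, resource_group: str, workspace: str,
                  alias: str, api_version: str = API_VERSION) -> str:
    """Build the ARM URL for one watchlist (assumed standard ARM layout)."""
    parts = [
        "subscriptions", subscription_id,
        "resourceGroups", resource_group,
        "providers", "Microsoft.OperationalInsights",
        "workspaces", workspace,
        "providers", "Microsoft.SecurityInsights",
        "watchlists", alias,
    ]
    # Percent-encode each segment so aliases with spaces stay valid.
    path = "/".join(quote(p, safe="") for p in parts)
    return f"https://management.azure.com/{path}?api-version={api_version}"


def async_operation_url(status_code: int,
                        headers: Mapping[str, str]) -> Optional[str]:
    """Return the Azure-AsyncOperation URL from a 202 response, else None."""
    if status_code != 202:
        return None
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "azure-asyncoperation":
            return value
    return None
```

A `None` result means there is nothing to poll, either because the service answered with something other than 202 or because the header was missing.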

For logic apps, our internal teams are working on updating the actions to use the new API version. In the meantime, we are in the process of releasing V2 actions specifically for delete operations, with deployment expected in the next few weeks.

Once deployed, you should use this V2 action for delete operations and use the Azure-AsyncOperation header to poll for the status of the delete and only start the upload operation once the delete status is 100%.

If you cannot wait for the deployment and need a fix sooner, we recommend using the 2024-04-01-preview REST API with the HTTP logic app connector (instead of Sentinel) and using the headers to poll statuses.

Please note that adding a delay, even of hours, might not always work, because we cannot know in advance how long the operation will take given the complexity of each customer's environment. As an immediate solution, the best approach is to follow the workaround of using the HTTP connector to make the API call.

Another (temporary) workaround: use a new alias name for the watchlist.

++++++++++++++
Recommended solution 1:
Set up a workflow that deletes the watchlist in Sentinel and uses the Azure-AsyncOperation header to poll for the status of the delete operation.

  1. Use the 2024-04-01-preview API Version: Ensure you are using the 2024-04-01-preview or later API versions to avoid issues with deleting and immediately uploading watchlists with the same alias. You can find more information about the API version lifecycle here: https://learn.microsoft.com/en-us/azure/ai-services/openai/api-version-deprecation
  2. Delete a Watchlist: When calling the API to delete a watchlist, the API will return a 202 (Accepted) status code along with an Azure-AsyncOperation header. This header contains a URL that you can use to poll for the status of the delete operation. You can refer to the Watchlists - Delete - REST API (Azure Sentinel) | Microsoft Learn for details.
  3. Poll for Status: Use the Azure-AsyncOperation header URL to poll for the status of the delete operation. Continue polling until the delete status is 100%. Detailed information on how to track the status of asynchronous operations can be found at Status of asynchronous operations - Azure Resource Manager | Microsoft Learn
  4. Upload Watchlist: Once the delete operation is complete, you can proceed with uploading the new watchlist. You can refer to [Watchlists - Create Or Update - REST API (Azure Sentinel)](https://learn.microsoft.com/en-us/rest/api/securityinsights/watchlists/create-or-update?view=rest-securityinsights-2024-09-01) for more details.
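
The four steps above can be sketched in code as follows. This is not an official client, only an outline under stated assumptions: `send` is a hypothetical transport callable returning `(status_code, headers, json_body)`, `upload` is a hypothetical callable that performs the create call, and the polled body is assumed to carry a `status` field with the usual Azure Resource Manager terminal values (`Succeeded`, `Failed`, `Canceled`). The steps above describe completion as "100%", so check the actual response shape before relying on this.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], dict]  # (status, headers, JSON body)
Send = Callable[[str, str], Response]           # hypothetical: send(method, url)


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup."""
    return next((v for k, v in headers.items() if k.lower() == name), None)


def wait_for_delete(send: Send, poll_url: str, timeout_s: float,
                    sleep: Callable[[float], None],
                    clock: Callable[[], float]) -> None:
    """Poll the async operation until it succeeds, fails, or times out."""
    deadline = clock() + timeout_s
    delay = 5.0  # starting back-off in seconds (arbitrary choice)
    while True:
        _, headers, body = send("GET", poll_url)
        state = body.get("status")  # assumed ARM-style status field
        if state == "Succeeded":
            return
        if state in ("Failed", "Canceled"):
            raise RuntimeError(f"delete ended as {state}")
        if clock() >= deadline:
            raise TimeoutError("delete still in progress at deadline")
        # Honour Retry-After when it is given in seconds, else back off.
        retry_after = _header(headers, "retry-after")
        numeric = retry_after is not None and retry_after.isdigit()
        sleep(float(retry_after) if numeric else delay)
        delay = min(delay * 2, 300.0)


def replace_watchlist(send: Send, watchlist_url: str,
                      upload: Callable[[], None],
                      timeout_s: float = 3600.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> None:
    """Delete the watchlist, wait for the delete to finish, then upload."""
    status, headers, _ = send("DELETE", watchlist_url)
    if status == 202:
        poll_url = _header(headers, "azure-asyncoperation")
        if poll_url is None:
            raise RuntimeError("202 without an Azure-AsyncOperation header")
        wait_for_delete(send, poll_url, timeout_s, sleep, clock)
    elif status not in (200, 204):
        raise RuntimeError(f"delete failed with HTTP {status}")
    upload()  # only reached once the delete has completed
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting. The overall timeout matters here because, as reported earlier in this thread, a stuck delete can otherwise keep a pipeline polling for days.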

Here is a sample workflow using the HTTP logic app connector (Please feel free to edit as per your business needs)

  1. Create a Logic App: Start by creating a new logic app in Azure.
  2. Add HTTP Action for Delete Operation:
    Add an HTTP action to call the delete watchlist API.
    Set the method to DELETE.
    Set the URI to the delete watchlist API endpoint.
    Capture the Azure-AsyncOperation header from the response.
  3. Add HTTP Action to Poll for Status:
    Add another HTTP action to poll the status of the delete operation.
    Set the method to GET.
    Set the URI to the value of the Azure-AsyncOperation header.
    Use a loop to continue polling until the status indicates that the delete operation is complete
  4. Add HTTP Action for Upload Operation:
    Once the delete operation is complete, add an HTTP action to call the upload watchlist API.
    Set the method to PUT (the Watchlists - Create Or Update endpoint referenced above is a PUT operation).
    Set the URI to the upload watchlist API endpoint.

For more detailed information and examples, you can refer to Status of asynchronous operations - Azure Resource Manager | Microsoft Learn.
