Unable to delete more than a few hundred items #1090

Closed
VVH opened this issue Aug 2, 2019 · 9 comments

Comments

@VVH
Contributor

VVH commented Aug 2, 2019

Tried to delete Whitman data at a relatively quiet time--end of the day on a Friday. Selecting 100 items at a time, I successfully deleted around 300 items (consisting of several thousand pages). This worked okay a few times, but at the 400 or 500 mark a Cloudflare 502 Bad Gateway notification appeared suggesting I try again in a few minutes. I retried twice with the same result, waiting around 10 minutes between the 2nd and 3rd try. I sometimes get a 504 error instead.

[Screenshot: Cloudflare 502 Bad Gateway error page]

How can we reproduce the bug?
Steps to reproduce the behavior:

  1. In the admin, go to Items and filter to Whitman, not published.
  2. Select 100 items to delete.
  3. This will take a little while, and then a notification pops up asking if you're sure. Click yes (at the bottom of the page).
  4. Sometimes the error message pops up, sometimes the items delete okay. Keep going until you hit the error.

What is the expected behavior?
Maybe this is the expected behavior?

I'd love to be able to delete more items at once. That might not be possible, but it would be helpful to know roughly how much a CM can expect to do in a given window.

@acdha
Member

acdha commented Aug 2, 2019

This is due to the number of related objects involved, so there's no simple way to know without counting all of the related objects first since some items may have far more assets, transcriptions, tags, etc. than others.

If this is something you need to do regularly, we can buy some performance by optimizing the query pattern (the confirmation display is slow because it involves looking up information for lots of related objects one by one unless those are prefetched, and the deletion process uses that same path to log the object deletion history), but we would need to write a custom admin delete method which could do that more efficiently with some of the data integrity safeguards.
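
(A minimal sketch of what such a custom admin delete method might look like, assuming a Django admin; the model names and related-object lookups below are illustrative assumptions, not the project's actual schema or safeguards.)

```python
# Illustrative sketch only: a custom bulk-delete action that removes the
# deepest related rows first so the item delete does not have to cascade
# through them one object at a time. Model names are assumptions.
from django.contrib import admin

from .models import Asset, Item, Transcription


@admin.register(Item)
class ItemAdmin(admin.ModelAdmin):
    actions = ["bulk_delete_selected"]

    def bulk_delete_selected(self, request, queryset):
        # Delete related records in bulk before deleting the items.
        Transcription.objects.filter(asset__item__in=queryset).delete()
        Asset.objects.filter(item__in=queryset).delete()
        deleted_count, _ = queryset.delete()
        self.message_user(request, "Deleted %d objects." % deleted_count)

    bulk_delete_selected.short_description = "Delete selected items (bulk)"
```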

@elainekamlley
Collaborator

Thanks for the info @acdha! Deleting may not be as common as unpublishing, but in the case of Whitman it was an honest mistake that could easily happen again. Ultimately, keeping the application organized and holding only items we intend to use makes it easier to keep track (especially as we continue to add more) and reduces the risk of publishing items we are not supposed to.

What is the level of effort of creating a custom delete, @acdha @rstorey? That would help us prioritize this against other work.

@rstorey
Member

rstorey commented Aug 6, 2019

We need to write a custom delete anyway, because otherwise the S3 objects belonging to the items/assets being deleted become orphaned.
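
(For illustration, one way the S3 side of that custom delete could be handled with boto3; the bucket name and storage field below are assumptions, not the project's actual configuration.)

```python
# Illustrative sketch only: remove an asset's stored file from S3 before
# deleting the database row, so the object is not orphaned.
# The bucket name and the `storage_image` field are assumptions.
import boto3

s3 = boto3.client("s3")
ASSET_BUCKET = "example-asset-bucket"  # assumed bucket name


def delete_asset_and_s3_object(asset):
    if asset.storage_image:
        s3.delete_object(Bucket=ASSET_BUCKET, Key=asset.storage_image.name)
    asset.delete()
```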

@VVH
Contributor Author

VVH commented Sep 11, 2019

Just an update: I'm currently able to delete around 500-1000 items each time I try before I get the above error message. I seem to be able to do this only about once a day, twice a day if I'm lucky, so at the current rate it will take me roughly 88 days to finish manually.

@acdha
Member

acdha commented Sep 12, 2019

Is there any harm in leaving them there until someone has time to write a custom bulk delete handler which will also prune the corresponding S3 objects? If these have to go now, the fastest way would be to delete the transcriptions and/or assets first so the item deletion doesn't have to process those related records.
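
(For illustration, that ordering sketched from a Django shell, where no gateway timeout applies; the import path, model names, and campaign slug are assumptions, not the project's actual schema.)

```python
# Illustrative sketch only: delete the related records first, then the
# items, so the item deletion has far less to cascade through. Running
# this from a shell avoids the 502/504 gateway timeouts entirely.
from concordia.models import Asset, Item, Transcription  # assumed import path

items = Item.objects.filter(project__campaign__slug="whitman", published=False)

Transcription.objects.filter(asset__item__in=items).delete()
Asset.objects.filter(item__in=items).delete()
items.delete()
```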

@VVH
Contributor Author

VVH commented Sep 12, 2019

These can stay put for now. It just means we need to subtract them from our total items count when we talk about our data. To clarify, these were imported but never published so they won't have associated transcriptions.

@lalgee
Collaborator

lalgee commented Sep 14, 2020

This issue of being unable to delete efficiently also affects Catt, NAWSA, and Blackwell. We've discovered a duplication issue which was replicated into BTP from loc.gov. A significant number of assets will have to be deleted from each of these campaigns. This issue is not pressing, but the deletion will need to occur before export for ingest to loc.gov.

@jkueloc
Collaborator

jkueloc commented Jun 9, 2021

After discussion with @rabiloc, working on #1450 first.

@jkueloc
Collaborator

jkueloc commented Nov 30, 2021

Work this period has focused on understanding the information presented above, reviewing the work done (code and conversation) on #1257, and researching how to extend the various pieces of the code base to add deletion of the related S3 objects. This is also in line with 'Sunsetting of Campaigns'.
