Add an endpoint to recheck an entire folder for large images #1139

Open
manthey opened this issue May 2, 2023 · 3 comments

@manthey
Member

manthey commented May 2, 2023

This would check whether any files that aren't currently marked as large images could be used as such. It would also check whether existing large images are still readable and, if not, unmark them and try to mark them again. It should have a recurse option.

Ideally, this would be exposed as a local job so that it can be cancelled.

@manthey
Member Author

manthey commented Jul 12, 2023

This is related to #873.

@manthey
Member Author

manthey commented Jun 26, 2024

I think the basic process would be to iterate through all the items in a folder or folder tree. Pseudocode:

from girder.models.file import File
from girder.models.folder import Folder
from girder.models.item import Item
from girder_large_image.models.image_item import ImageItem

# `folder` is the Girder folder document being rechecked
for item in Folder().childItems(folder):
    if item.get('largeImage'):
        if item['largeImage'].get('expected'):
            # We asked to create a large image and it hasn't finished. Either the process is in the works or it failed;
            # maybe a flag on the endpoint would optionally `ImageItem().delete(item)` in this case and then fall through
            # to the branch where largeImage isn't present in the item.
            pass
        else:
            try:
                ImageItem().getMetadata(item)
                # all is well, continue
                continue
            except Exception:
                # we failed to open the item and get its image metadata, therefore:
                previousFileId = item['largeImage'].get('originalId', item['largeImage']['fileId'])
                ImageItem().delete(item)
                # also allow creating jobs based on an option
                ImageItem().createImageItem(item, File().load(previousFileId, force=True))
    else:
        # The item was never a large image; can we check that it really can't be one?
        # Get the file the same way as the createTiles method in the girder large_image rest endpoint
        # (assumed here: use the item's file when it has exactly one).
        files = list(Item().childFiles(item, limit=2))
        if len(files) == 1:
            # also allow creating jobs based on an option
            ImageItem().createImageItem(item, files[0])
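
To cover the "folder or folder tree" part together with the wish for a cancellable job, the iteration could be separated from the per-item check. The following is a minimal, Girder-free sketch: `walk_items`, `child_items`, `child_folders`, and `cancelled` are hypothetical names introduced for illustration, and the lookups are passed in as callables so the same loop could presumably be backed by the Folder model's item and subfolder listings.

```python
def walk_items(folder, child_items, child_folders, recurse=False,
               cancelled=lambda: False):
    """Yield the items of `folder`, optionally descending into subfolders.

    `child_items(folder)` and `child_folders(folder)` are caller-supplied
    lookups (hypothetical stand-ins for the real folder queries).
    `cancelled()` is polled before each item so that a surrounding job can
    stop the walk between items.
    """
    stack = [folder]
    while stack:
        current = stack.pop()
        for item in child_items(current):
            if cancelled():
                return
            yield item
        if recurse:
            # Subfolders are visited after the current folder's items.
            stack.extend(child_folders(current))


# In-memory example tree; the file names are made up.
tree = {
    'items': ['a.svs', 'b.tif'],
    'folders': [{'items': ['c.mrxs'], 'folders': []}],
}
found = list(walk_items(
    tree, lambda f: f['items'], lambda f: f['folders'], recurse=True))
```

Polling `cancelled()` once per item keeps the check cheap while still letting a long recheck stop promptly; whether that maps cleanly onto a local job's status is an assumption that would need checking.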

Options we could want for this endpoint (this can be multiple PRs):

  • recurse the specified folder
  • delete creation jobs (default False)
  • use creation jobs if we can't read files as images directly (default False)
  • optional criteria to force undoing the largeImage and redoing it (the two examples I have are a regex for the item name or a source name)
  • always use a job when creating the largeImage (the same action as when we ctrl-click the large image button in the UI)
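
One way to keep these options manageable across several PRs would be to separate the decision from the action. The sketch below is only an illustration under stated assumptions: `RecheckOptions`, its field names, and `plan_action` are hypothetical, "delete creation jobs" is read here as "discard an unfinished largeImage entry and start over", and the `sourceName` key inside `largeImage` is assumed to hold the source that was used. The job-related flags are carried along but would only matter when the action is executed.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecheckOptions:
    """Hypothetical option names mirroring the list above."""
    recurse: bool = False
    delete_pending: bool = False   # discard unfinished creations and retry
    create_jobs: bool = False      # use a job if a direct read fails
    always_job: bool = False       # always use a job when creating
    force_name_regex: Optional[str] = None  # force redo by item name
    force_source: Optional[str] = None      # force redo by source name


def plan_action(item, readable, opts):
    """Return 'create', 'recreate', or 'skip' for one item.

    `readable` says whether the existing large image could still be opened.
    """
    large_image = item.get('largeImage')
    if not large_image:
        return 'create'
    if large_image.get('expected'):
        # Creation was requested but has not finished (or it failed).
        return 'recreate' if opts.delete_pending else 'skip'
    if opts.force_name_regex and re.search(
            opts.force_name_regex, item.get('name', '')):
        return 'recreate'
    if opts.force_source and large_image.get('sourceName') == opts.force_source:
        return 'recreate'
    return 'skip' if readable else 'recreate'
```

With this split, each PR could add one option to the dataclass and one branch to `plan_action` without touching the code that actually deletes or creates large images.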

@manthey
Member Author

manthey commented Jun 26, 2024

There are a few motivations for this endpoint:

  • Format support may have been added or improved since the items were uploaded.
  • Previous versions might have picked a poor source because of a partial upload. For example, mrxs files have a folder of data and an item that can be interpreted as a JPEG. If the item is selected as a large image before the folder is fully uploaded, we read it with the pil source as a JPEG; redoing it will read it with the openslide source as an mrxs file (with much more detail).
  • Some files may have gone offline; unmarking them reduces attempts to use them (this probably indicates bad asset management, but that should be addressed differently).
  • If we optionally use a job to reencode an image, it will use more space but also be faster to access. We have cases where a folder is full of unoptimized images, and reencoding all of them in one API call would be handy.
