FilesetWrapper lazy loads Images and UsedFiles #405

will-moore · 2024-03-27T14:38:28Z

Fixes #404.

When we do conn.getObject('Fileset', fileset_id) we don't try to load all Images and Files up front.
Instead, these are lazily loaded as required using the same pattern as for other Wrapper objects.
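
As a rough illustration of the lazy-loading pattern described above (not the actual omero-py code), the sketch below uses a hypothetical LazyFileset class. Its load_images and load_files callables are assumptions that stand in for server queries. Nothing is fetched at construction; each collection is fetched on first access and then cached.

```python
from typing import Callable, List, Optional


class LazyFileset:
    """Hypothetical stand-in for a wrapper; not the omero-py FilesetWrapper."""

    def __init__(self, fileset_id: int,
                 load_images: Callable[[int], List[str]],
                 load_files: Callable[[int], List[str]]) -> None:
        self.id = fileset_id
        self._load_images = load_images
        self._load_files = load_files
        # None means "not loaded yet"; nothing is fetched up front.
        self._images: Optional[List[str]] = None
        self._files: Optional[List[str]] = None

    def copyImages(self) -> List[str]:
        # Fetch on first access only, then reuse the cached result.
        if self._images is None:
            self._images = self._load_images(self.id)
        return list(self._images)

    def listFiles(self) -> List[str]:
        if self._files is None:
            self._files = self._load_files(self.id)
        return list(self._files)


calls = []  # records which stand-in "queries" actually ran


def fake_load_images(fileset_id):
    calls.append("images")
    return ["image-1", "image-2"]


def fake_load_files(fileset_id):
    calls.append("files")
    return ["file-1", "file-2", "file-3"]


fileset = LazyFileset(123, fake_load_images, fake_load_files)
calls_after_construction = list(calls)  # no queries have run yet
images = fileset.copyImages()           # triggers the images query
images_again = fileset.copyImages()     # served from the cache
```

A caller that only needs images never pays for the files query, which is the point of the change.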

To test, check that the behaviour of fileset.copyImages() and fileset.listFiles() doesn't change.

E.g. this test script should produce the same output before and after the change:

python test.py 123

import argparse

from omero.cli import cli_login
from omero.gateway import BlitzGateway

def main(args):
    parser = argparse.ArgumentParser()
    parser.add_argument('fileset', type=int)
    args = parser.parse_args(args)
    fileset_id = args.fileset

    with cli_login() as cli:
        conn = BlitzGateway(client_obj=cli._client)

        fileset = conn.getObject('Fileset', fileset_id)

        for i in fileset.copyImages():
            print("Image", i.id, i.name)

        for f in fileset.listFiles():
            print("File", f.id, f.name)


if __name__ == '__main__':
    import sys
    main(sys.argv[1:])

sbesson · 2024-03-28T08:38:53Z

Maybe I am missing something obvious but shouldn't _FilesetWrapper._getQueryString() also be updated to remove the fetch commands in order to achieve proper lazy loading?

will-moore · 2024-03-28T08:43:27Z

@sbesson - Apologies, that's a copy/paste error on my part. Thanks for catching!
I have had issues in the past with pip install -e for omero-py so I was editing in a different location.
I certainly used the updated query string in testing.
Fixed in b86cd89

sbesson · 2024-03-28T09:08:16Z

Thanks. Two additional questions:

1. Arguably, what is proposed is a breaking change, as a consumer relying on conn.getObject('Fileset', fileset_id) to return a loaded object would need to modify their code to call copyImages() and/or listFiles(). Should this consider some form of backwards compatibility, e.g. using the opts parameter to toggle lazy loading?
2. When using lazy loading, should _FilesetWrapper._getQueryString() simply use BlitzObjectWrapper._getQueryString(), which fetches the owner and creation event in addition to the object?

will-moore · 2024-03-28T10:21:03Z

I'm not sure if this is considered a breaking change, at least not to the BlitzGateway API.
For a consumer using the FilesetWrapper as in the example above, nothing should change: it shouldn't matter whether the loading happens under the hood during the initial getObject() call or during subsequent calls. The one difference is that if you're only loading images or only files (and not both), the new behaviour should be faster.

If you consider the API to extend to the underlying fileset._obj object behaviour, and you want to do this:

fileset = conn.getObject('Fileset', fileset_id)
fileset._obj.copyUsedFiles()

then this will break. This is kinda edge-case behaviour, so I'm not sure that it justifies a major release, but if so then I think it's worth doing because it's important that the default behaviour of conn.getObject('Fileset', fileset_id) is changed to avoid OOM errors.
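
To make the edge case concrete, here is a minimal sketch with stand-in objects. The FakeFilesetModel and FakeFilesetWrapper classes and the RuntimeError are assumptions for illustration, not the real omero.model or omero.gateway behaviour. Code that reaches into _obj fails once the collection is no longer pre-loaded, while code that goes through the wrapper's listFiles() keeps working.

```python
class FakeFilesetModel:
    """Stand-in for the underlying model object; files start unloaded."""

    def __init__(self):
        self._usedFilesLoaded = False
        self._usedFiles = []

    def copyUsedFiles(self):
        if not self._usedFilesLoaded:
            # Placeholder exception; the real exception type is not assumed here.
            raise RuntimeError("usedFiles collection is not loaded")
        return list(self._usedFiles)


class FakeFilesetWrapper:
    """Stand-in for the wrapper; loads files the first time they are requested."""

    def __init__(self, obj):
        self._obj = obj

    def listFiles(self):
        if not self._obj._usedFilesLoaded:
            # Stands in for a server query that fills the collection.
            self._obj._usedFiles = ["a.tiff", "b.tiff"]
            self._obj._usedFilesLoaded = True
        return self._obj.copyUsedFiles()


fileset = FakeFilesetWrapper(FakeFilesetModel())

# Reaching into the private object before anything has been loaded fails.
try:
    fileset._obj.copyUsedFiles()
    direct_access_worked = True
except RuntimeError:
    direct_access_worked = False

# The public method loads on demand and succeeds.
files = fileset.listFiles()
```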

Using opts to specify whether we want to load more of the graph up front has been useful for other objects, primarily for use in the JSON API, where we want to pass the underlying omero.model objects to the encoder. In the case where we are using the BlitzGateway API to traverse the graph, lazy loading is the preferred behaviour, particularly in this case.

I'm happy to add opts for load_images and load_files, as long as the default opts are False, but I also feel that this feature isn't really needed yet so I'm OK with leaving it till it is required.
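
If these opts were added later, the query builder might look roughly like the sketch below. The option names load_images and load_files come from the suggestion above. The function name build_fileset_query and the exact HQL fragments are assumptions for illustration and have not been checked against omero-py.

```python
def build_fileset_query(opts=None):
    """Hypothetical helper: build a Fileset query, fetching extras only on request."""
    opts = opts or {}
    query = "select obj from Fileset obj"
    # Both options default to False, so lazy loading stays the default.
    if opts.get("load_images", False):
        query += " left outer join fetch obj.images as image"
    if opts.get("load_files", False):
        query += (" left outer join fetch obj.usedFiles as usedFile"
                  " join fetch usedFile.originalFile")
    return query


default_query = build_fileset_query()
eager_query = build_fileset_query({"load_images": True, "load_files": True})
```

Keeping the defaults at False means existing callers get the lazy behaviour without passing anything, and only callers that need the loaded graph opt in.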

In the meantime I'll remove _FilesetWrapper._getQueryString() as suggested, since it's not needed.

sbesson

Tested against a representative IDR plate (1K images, 20K files) - see https://idr.openmicroscopy.org/webclient/?show=plate-10263

>>> from omero.gateway import BlitzGateway
>>> conn=BlitzGateway("public","public",host="idr.openmicroscopy.org",secure=True)
>>> conn.connect()
True
>>> fileset = conn.getObject("Fileset",6311594)
>>> fileset._obj._usedFilesLoaded
False
>>> fileset._obj._imagesLoaded
False
>>> print(len(fileset.copyImages()))
1152
>>> fileset._obj._imagesLoaded
True
>>> fileset._obj._usedFilesLoaded
False
>>> print(len(fileset.listFiles()))
23045
>>> fileset._obj._usedFilesLoaded
True

In terms of response time, getObject("Fileset", 6311594) returned instantly while fileset.listFiles() took several seconds.
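
A rough way to put numbers on this is to time each call with time.perf_counter. The timed helper below is hypothetical, and slow_call is a stand-in for something like fileset.listFiles() so that the sketch runs without a server connection.

```python
import time


def timed(func, *args, **kwargs):
    """Hypothetical helper: call func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def slow_call():
    # Stand-in for a call that fetches many objects from the server.
    time.sleep(0.05)
    return ["file"] * 3


result, elapsed = timed(slow_call)
print(f"{len(result)} items in {elapsed:.3f}s")
```

Against a live connection the equivalent call would be timed(fileset.listFiles).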

Following up on the discussion in #405, internal fields prefixed with _ should not be considered part of the public API. In that sense, I agree there is no change in behavior, as accessing the images and files associated with a fileset should happen through the public copyImages() and listFiles() APIs as described in https://omero.readthedocs.io/en/stable/developers/Python.html#filesets-added-in-omero-5-0. The only difference introduced by this PR is that these API calls will no longer return instantly, because they now perform the fetching, and their response time will depend on the number of underlying objects. I'll leave others to comment, but it might be worth mentioning this aspect in the docstring and/or the reference documentation.

Overall, I agree that the current behavior is dangerous especially when loading filesets with large numbers of images and/or plates and lazy loading addresses this issue.

FilesetWrapper lazy loads Images and UsedFiles

cf3f968

will-moore requested a review from knabar March 27, 2024 17:14

Remove Images and Files from FilesetWrapper._getQueryString()

b86cd89

remove _FilesetWrapper._getQueryString()

5ad53eb

sbesson approved these changes Mar 29, 2024

jburel mentioned this pull request Apr 24, 2024

Session ID and constructor #400

Merged

jburel merged commit eea0b87 into ome:master Apr 24, 2024
9 checks passed
