Funlib persistence update #322

pattonw · 2024-11-05T19:49:33Z

Upgrade to funlib.persistence 0.5.

This update makes one big improvement:
The custom Array class is no longer needed. We used it mostly to apply preprocessing lazily to large arrays. The new funlib Array class uses dask internally, which has much better support for lazy array operations than what we built ourselves. The ZarrArray and NumpyArray classes, which were used extensively throughout DaCapo, have now been replaced with plain funlib.persistence.Array objects.
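To illustrate the kind of lazy preprocessing dask provides, here is a minimal sketch using plain `dask.array` (not the funlib.persistence.Array API itself). The small in-memory volume stands in for a large zarr dataset, and the normalize-then-clip steps are illustrative assumptions, not DaCapo's actual preprocessing pipeline.

```python
import numpy as np
import dask.array as da

# Stand-in for a large on-disk volume (in practice this would be backed by zarr).
raw = np.arange(4 * 8 * 8, dtype=np.uint8).reshape(4, 8, 8)
lazy = da.from_array(raw, chunks=(2, 4, 4))

# Each preprocessing step only extends a task graph; no data is processed yet.
normalized = lazy.astype(np.float32) / 255.0
clipped = da.clip(normalized, 0.1, 0.9)

# Reading a small region computes only the chunks that overlap it.
patch = clipped[0:2, 0:4, 0:4].compute()
```

Because the operations are deferred, the same chain can be applied to a volume far larger than memory, and only the requested region is ever materialized.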

A minor incompatibility:
funlib.persistence.Array has a convention (for now) that all axes have names, and non-spatial axes have a "^" in their name. This will be fixed in the near future. In the meantime, DaCapo's convention had to change slightly to accommodate this: we now use "c^" and "b^" for the channel and batch dimensions instead of just "c" and "b".
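A small sketch of what the renaming means in practice, assuming (as in "c^" and "b^") that the "^" is a trailing suffix. `split_axes` is a hypothetical helper written for illustration, not part of funlib.persistence or DaCapo.

```python
def split_axes(axis_names):
    """Split axis names into non-spatial axes (trailing '^') and spatial axes."""
    non_spatial = [name for name in axis_names if name.endswith("^")]
    spatial = [name for name in axis_names if not name.endswith("^")]
    return non_spatial, spatial


# Old DaCapo-style names versus the new convention for the same layout.
old_names = ["b", "c", "z", "y", "x"]
new_names = [name + "^" if name in ("b", "c") else name for name in old_names]

non_spatial, spatial = split_axes(new_names)
```

Under this convention a single suffix check is enough to tell channel/batch axes apart from spatial ones, without a hard-coded list of reserved names.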

TODOs:
This pull request is not quite ready to merge. The tests pass with pytest, and the minimal_tutorial notebook executes, but a lot of code is not tested. Specifically, many of the ArrayConfig subclasses are not yet tested and some are missing implementations.

Here are the Preprocessing array configs, whether or not their implementation is complete, and their code coverage:

Best practice would be to add tests before merging, but I want to put this here so others can test it.


pattonw · 2024-11-05T21:08:49Z

Added support for most of the remaining preprocessing operations.
Exceptions are:

DVID arrays: there is no NumPy-like interface that I know of for DVID arrays. You have to access data via special getter methods, which means we probably need a custom handler to turn them into something NumPy-array-like.
Resampled arrays: I'm pretty confident this can be done purely with dask, I just don't know how yet.
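One possible shape for such a handler, sketched under assumptions: `fetch_block(offset, size)` is a hypothetical stand-in for whatever getter the DVID client actually provides, and only unit-step slicing is supported. The adapter exposes `shape`, `dtype`, `ndim` and `__getitem__`, which is the interface `dask.array.from_array` expects from an array-like object, so the result could then be wrapped lazily.

```python
import numpy as np


class GetterArrayAdapter:
    """Expose a block-getter API through a minimal NumPy-like interface (hypothetical)."""

    def __init__(self, fetch_block, shape, dtype):
        # fetch_block(offset, size) -> np.ndarray of shape `size` starting at `offset`
        self._fetch_block = fetch_block
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        self.ndim = len(self.shape)

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # Pad missing trailing dimensions with full slices, as NumPy does.
        key = key + (slice(None),) * (self.ndim - len(key))
        offset, size = [], []
        for k, dim in zip(key, self.shape):
            if not isinstance(k, slice):
                raise TypeError("this sketch only supports slice indexing")
            start, stop, step = k.indices(dim)
            if step != 1:
                raise ValueError("this sketch only supports unit-step slices")
            offset.append(start)
            size.append(max(stop - start, 0))
        # Translate the slice request into a single getter call.
        block = self._fetch_block(tuple(offset), tuple(size))
        return np.asarray(block, dtype=self.dtype)
```

Keeping the translation in one place means the rest of the code only ever sees an ordinary array-like object; a real handler would also need to deal with integer indices and DVID-specific details such as authentication and block alignment.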

mzouink · 2024-11-06T14:51:04Z

I tried to run the minimal example and got an error on:

dacapo/examples/starter_tutorial/minimal_tutorial.py

Lines 111 to 118 in 108db88

    
    cell_array = prepare_ds(
        "cells3d.zarr",
        "raw",
        Roi((0, 0, 0), cell_data.shape[1:]) * voxel_size,
        voxel_size=voxel_size,
        dtype=np.uint8,
        num_channels=None,
    )

because it doesn't match the new prepare_ds signature:
https://github.com/funkelab/funlib.persistence/blob/3c0760e48edf1b287c4f75d7d11dc6b775332b2b/funlib/persistence/arrays/datasets.py#L121-L132
Is there any way to have a wrapper function (possibly with a deprecation flag) that supports the old signature? That would be easier than finding every use of prepare_ds and changing it.
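For reference, a deprecated shim of the kind suggested here might look roughly like the sketch below. Everything in it is hypothetical: `new_prepare_ds` is a stub that just echoes its arguments (it is not funlib's actual `prepare_ds` signature), `legacy_prepare_ds` takes the ROI as separate offset/shape tuples in world units instead of a funlib Roi, and 3D spatial data is assumed.

```python
import warnings


def new_prepare_ds(store, shape, offset, voxel_size, axis_names, dtype):
    """Hypothetical stand-in for a new-style call; it only echoes its arguments."""
    return {
        "store": store,
        "shape": shape,
        "offset": offset,
        "voxel_size": voxel_size,
        "axis_names": axis_names,
        "dtype": dtype,
    }


def legacy_prepare_ds(filename, ds_name, roi_offset, roi_shape, voxel_size,
                      dtype, num_channels=None):
    """Hypothetical deprecated shim that translates old-style arguments."""
    warnings.warn(
        "legacy_prepare_ds is deprecated; call the new prepare_ds directly",
        DeprecationWarning,
        stacklevel=2,
    )
    # The old call described the region in world units; convert to voxels.
    shape = tuple(s // v for s, v in zip(roi_shape, voxel_size))
    axis_names = ["z", "y", "x"]  # assumes 3D spatial data
    if num_channels is not None:
        # Non-spatial axes carry a '^' under the new naming convention.
        shape = (num_channels,) + shape
        axis_names = ["c^"] + axis_names
    return new_prepare_ds(
        f"{filename}/{ds_name}", shape, roi_offset, voxel_size, axis_names, dtype
    )
```

Using `DeprecationWarning` with `stacklevel=2` attributes the warning to the caller's line, which is what would make remaining old-style calls easy to track down.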

pattonw · 2024-11-06T15:11:48Z

Oh sorry, I only fixed the version in docs/source/notebooks/...
I can fix the version in examples as well.

I would recommend against a wrapper. I think it would be confusing to have two different versions of the same function that behave differently.

pattonw added 16 commits October 28, 2024 15:52

update dependencies

fd04379

update to funlib.persistence

94c9894

update predict local

89add10

fix bug in constant array

db84f33

remove unnecessary print statements

285e869

remove unnecessary print statement

d177bda

minor improvements in type hints and fix small bugs

e013652

fix watershed post processor and affinities predictor

7b72857

fix predict local

148251d

fix binary segmentation postprocessors

4eeabc9

update dependencies

d8d056b

import zarr before using it

978950e

fix local predict

1cae5cf

fixing batch dim bugs (batch norm requires batch dimension even in predict mode) Seems to also fix the strange loss spike. I think it was due to setting model into eval mode and then not resetting to training at the end

update minimal tutorial

0cc6db1

Add support for most of the remaining arrays.

d67f098

Exceptions DVID and Resampled arrays

black formatting

41cf3a9

fix mypy errors

215d8b4

pattonw and others added 4 commits November 6, 2024 07:13

remove extra notebooks, these should be built by sphinx

8fdcdd1

update starter_tutorial to match doc example

ec8f7a6

update github docs workflow to execute tutorial from examples

161e753

Merge branch 'main' into funlib-persistence

1d5c501

mzouink merged commit aeb77a6 into janelia-cellmap:main Nov 6, 2024
3 of 4 checks passed
