High level dataset API overview

R Schwanhold edited this page Mar 3, 2020 · 2 revisions

This wiki page gives a basic overview of the high-level dataset API.
For more detailed examples, see webknossos-cuber/tests/test_dataset.py.

There are three dataset types (WKDataset, TiffDataset, and TiledTiffDataset), which support very similar interfaces.
The main difference between the TiffDataset and the TiledTiffDataset is that the TiffDataset stores each z-layer in a single image, whereas the TiledTiffDataset divides each z-layer into multiple images.
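
To make the difference concrete, the following sketch counts how many image files one z-layer would need under each scheme. The helper name images_per_z_layer is hypothetical, and the assumption that partially filled tiles at the border still get their own file is illustrative rather than taken from the library.

```python
import math
from typing import Optional, Tuple


def images_per_z_layer(
    layer_size: Tuple[int, int], tile_size: Optional[Tuple[int, int]] = None
) -> int:
    """Count the images needed to store one z-layer.

    Hypothetical helper, not part of the dataset API. Without a tile_size
    the whole layer is one image (TiffDataset-style). With a tile_size the
    layer is cut into a grid (TiledTiffDataset-style), assuming that
    partially filled border tiles still get their own file.
    """
    if tile_size is None:
        return 1
    width, height = layer_size
    tile_width, tile_height = tile_size
    return math.ceil(width / tile_width) * math.ceil(height / tile_height)


print(images_per_z_layer((250, 200)))            # 1
print(images_per_z_layer((250, 200), (32, 64)))  # 8 * 4 = 32
```

Under these assumptions, a 250 x 200 layer with tile_size=(32, 64) is spread over 32 files per z-layer instead of one.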

The essential operations for datasets are creating, opening, reading data, and writing data. The datasource-properties.json file is updated automatically.
Here are some examples for working with the high-level dataset API:

Creating a WKDataset:

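from os import path

# Import path assumed from the wkcuber package layout; adjust it to your installed version
from wkcuber.api.Dataset import WKDataset

ds = WKDataset.create("path_to_dataset/wk_dataset", scale=(1, 1, 1))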
ds.add_layer("color", "color")

ds.get_layer("color").add_mag("1")
ds.get_layer("color").add_mag("2-2-1")

# The directories are created automatically
assert path.exists("path_to_dataset/wk_dataset/color/1")
assert path.exists("path_to_dataset/wk_dataset/color/2-2-1")

assert len(ds.properties.data_layers) == 1
assert len(ds.properties.data_layers["color"].wkw_magnifications) == 2
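
The magnification names used above ("1" and "2-2-1") read as per-axis downsampling factors. The sketch below shows one way to interpret such a name. The helpers parse_mag and downsampled_size are hypothetical and not part of the dataset API, and the rounding-up rule for sizes is an assumption.

```python
from typing import Tuple


def parse_mag(mag: str) -> Tuple[int, int, int]:
    """Turn a magnification name such as "1" or "2-2-1" into (x, y, z) factors."""
    factors = [int(part) for part in mag.split("-")]
    if len(factors) == 1:
        # A single number is read as the same factor on all three axes.
        factors = factors * 3
    if len(factors) != 3:
        raise ValueError(f"expected 1 or 3 factors, got {mag!r}")
    return (factors[0], factors[1], factors[2])


def downsampled_size(size: Tuple[int, int, int], mag: str) -> Tuple[int, ...]:
    """Size of a volume at the given magnification, rounding up per axis (assumption)."""
    return tuple(-(-extent // factor) for extent, factor in zip(size, parse_mag(mag)))


print(parse_mag("1"))                             # (1, 1, 1)
print(parse_mag("2-2-1"))                         # (2, 2, 1)
print(downsampled_size((250, 250, 10), "2-2-1"))  # (125, 125, 10)
```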

Similar to the WKDataset, this also works for TiffDatasets:

ds = TiffDataset.create("path_to_dataset/tiff_dataset", scale=(1, 1, 1))
ds.add_layer("color", Layer.COLOR_TYPE)

ds.get_layer("color").add_mag("1")
ds.get_layer("color").add_mag("2-2-1")

...

To create a TiledTiffDataset, you also have to specify the tile_size:

ds = TiledTiffDataset.create(
    "path_to_dataset/tiled_tiff_dataset",
    scale=(1, 1, 1),
    tile_size=(32, 64),
    pattern="{xxx}/{yyy}/{zzz}.tif",
)
ds.add_layer("color", Layer.COLOR_TYPE)

ds.get_layer("color").add_mag("1")
ds.get_layer("color").add_mag("2-2-1")

...
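
The pattern argument describes where each tile is stored. As a rough illustration, the sketch below expands such a pattern into a relative file path. It assumes that the number of letters inside the braces gives the minimum number of zero-padded digits. The function expand_pattern is hypothetical and is not the library's implementation.

```python
import re


def expand_pattern(pattern: str, x: int, y: int, z: int) -> str:
    """Fill placeholders like {xxx} with zero-padded tile coordinates.

    Assumption: the number of letters inside the braces is the minimum
    number of digits, so {xxx} with x=1 becomes "001".
    """
    values = {"x": x, "y": y, "z": z}

    def fill(match):
        letters = match.group(1)
        # The first letter selects the axis; the letter count sets the padding.
        return str(values[letters[0]]).zfill(len(letters))

    return re.sub(r"\{(x+|y+|z+)\}", fill, pattern)


print(expand_pattern("{xxx}/{yyy}/{zzz}.tif", 1, 2, 6))  # 001/002/006.tif
```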

Opening datasets:

wk_ds = WKDataset("path_to_dataset/wk_dataset")
...

tiff_ds = TiffDataset("path_to_dataset/tiff_dataset")
...

tiled_tiff_ds = TiledTiffDataset("path_to_dataset/tiled_tiff_dataset")
...

Reading and writing data (this also works the same way for the TiffDataset and TiledTiffDataset):

import numpy as np

wk_ds = WKDataset("path_to_dataset/wk_dataset")
mag = wk_ds.add_layer("another_layer", Layer.COLOR_TYPE, num_channels=3).add_mag("1")

data = (np.random.rand(3, 250, 250, 250) * 255).astype(np.uint8)
mag.write(data)

# Read back the full extent that was written
assert np.array_equal(data, mag.read(size=(250, 250, 250)))
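
The array in this example is laid out channels first: a three-channel layer is written as an array of shape (3, 250, 250, 250). The sketch below works out the array shape and memory footprint to expect for a read of a given size. The helpers are hypothetical and not part of the dataset API, and one byte per voxel is assumed, as with np.uint8.

```python
import math
from typing import Tuple


def expected_read_shape(
    num_channels: int, size: Tuple[int, int, int]
) -> Tuple[int, int, int, int]:
    """Shape of a channels-first array for a read of the given (x, y, z) size."""
    return (num_channels, *size)


def footprint_in_bytes(shape: Tuple[int, ...], bytes_per_voxel: int = 1) -> int:
    """Memory needed for an array of this shape (1 byte per voxel for uint8)."""
    return math.prod(shape) * bytes_per_voxel


full = expected_read_shape(3, (250, 250, 250))
slab = expected_read_shape(3, (250, 250, 10))
print(full, footprint_in_bytes(full))  # (3, 250, 250, 250) 46875000
print(slab, footprint_in_bytes(slab))  # (3, 250, 250, 10) 1875000
```

Two arrays can only compare equal if their shapes match, so the size passed to a read has to cover the same extent as the data it is compared against.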

The high-level dataset API also introduces the concept of a View. A View is a handle to a specific bounding box in the dataset and can be used to read and write data within that box. The advantage is that a View can be passed around, so other code can work on one region without needing the whole dataset.

wk_view = WKDataset("path_to_dataset/wk_dataset").get_view(
    "another_layer",
    "1",
    size=(32, 32, 32),
    offset=(10, 10, 10),
)

data = (np.random.rand(3, 20, 20, 20) * 255).astype(np.uint8)
wk_view.write(data)
...
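
To illustrate the idea of a handle to a bounding box, here is a minimal stand-in written in plain Python. The class BoxView and its methods are hypothetical and do not mirror the real View class. In particular, whether the real View rejects writes that exceed its bounds is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[int, int, int]


@dataclass(frozen=True)
class BoxView:
    """Hypothetical stand-in for a View: a bounding box inside a dataset."""

    offset: Vec3  # position of the box in dataset coordinates
    size: Vec3    # extent of the box along x, y, and z

    def fits(self, data_size: Vec3, relative_offset: Vec3 = (0, 0, 0)) -> bool:
        """Check that data placed at relative_offset stays inside the box."""
        return all(
            0 <= rel and rel + extent <= limit
            for rel, extent, limit in zip(relative_offset, data_size, self.size)
        )

    def to_dataset_coords(self, relative_offset: Vec3) -> Tuple[int, ...]:
        """Translate a position inside the box into dataset coordinates."""
        return tuple(o + r for o, r in zip(self.offset, relative_offset))


view = BoxView(offset=(10, 10, 10), size=(32, 32, 32))
print(view.fits((20, 20, 20)))                              # True
print(view.fits((20, 20, 20), relative_offset=(16, 0, 0)))  # False: 16 + 20 > 32
print(view.to_dataset_coords((5, 5, 5)))                    # (15, 15, 15)
```

Code that receives such an object only needs positions relative to the box, which is what makes a View convenient to hand to other functions.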

The TiledTiffDataset also supports a method to return the data of a specific tile:

tiled_tiff_ds = TiledTiffDataset.create(
    "path_to_dataset/tiled_tiff_dataset",
    scale=(1, 1, 1),
    tile_size=(32, 64),
    pattern="{xxxx}_{yyyy}_{zzzz}.tif",
)

mag = tiled_tiff_ds.add_layer("color", "color").add_mag("1")

data = (np.random.rand(250, 200, 10) * 255).astype(np.uint8)
mag.write(data, offset=(5, 5, 5))

assert mag.get_tile(1, 1, 6).shape == (1, 32, 64, 1)

# get_tile returns the content of the image at the specified x, y, and z values
assert np.array_equal(
    mag.get_tile(1, 2, 6)[0, :, :, 0],
    # Path and file name follow the dataset location and the pattern given at creation
    TiffReader("path_to_dataset/tiled_tiff_dataset/color/1/0001_0002_0006.tif").read(),
)
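
To see why tile (1, 2, 6) contains data in the example above, the following sketch computes which tiles a write touches. It assumes that tile indices start at 0, that tile i covers the voxels from i * tile_extent up to (but not including) (i + 1) * tile_extent, and that the z value of a tile is the z coordinate itself. The helpers tile_range and touched_tiles are hypothetical.

```python
from typing import Tuple


def tile_range(start: int, length: int, tile_extent: int) -> range:
    """Indices of the tiles overlapped by [start, start + length) along one axis."""
    first = start // tile_extent
    last = (start + length - 1) // tile_extent
    return range(first, last + 1)


def touched_tiles(
    offset: Tuple[int, int, int],
    size: Tuple[int, int, int],
    tile_size: Tuple[int, int],
) -> Tuple[range, range, range]:
    """Tile index ranges (x, y) and z values covered by a write."""
    xs = tile_range(offset[0], size[0], tile_size[0])
    ys = tile_range(offset[1], size[1], tile_size[1])
    zs = range(offset[2], offset[2] + size[2])  # one image per z value
    return xs, ys, zs


xs, ys, zs = touched_tiles((5, 5, 5), (250, 200, 10), (32, 64))
print(list(xs))                       # [0, 1, 2, 3, 4, 5, 6, 7]
print(list(ys))                       # [0, 1, 2, 3]
print(1 in xs and 2 in ys and 6 in zs)  # True
```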