Heads up that there's now an MVP (minimal viable pipeline) starting from STAC API queries. The 'STAC DataPipeLine' graph, read left to right, is:
`IterableWrapper` (`list[dict]`) → `PySTACAPISearcher` (`list[pystac_client.ItemSearch]`) → `Mapper` (`list[pystac.ItemCollection]`) → `StackstacStacker` (`list[xarray.DataArray]`)
where the steps are:
Hoping to finish this by the end of the week 🤞, and will cut a new v0.5.0 release soon after 😁
-
Note that zen3geo v0.6.0 comes with an `XpySTACAssetReader` DataPipe for reading STAC assets backed by COG/NetCDF/Zarr files, done in #87. This is essentially a wrapper around
-
To enable cloud-native, streaming machine learning data pipelines based on SpatioTemporal Asset Catalogs (STAC)!
A torch DataPipe is a way of doing composition over inheritance. The philosophy is to have each `DataPipe` do one thing and do it well, similar to the UNIX philosophy of piping one piece of text to another command. The pipe syntax also has parallels with the method-chaining style of `pandas` (see `pandas.DataFrame.pipe`).
📖 STAC Readers
There are 4 parts as per https://stacspec.org/en/about/stac-spec, and one idea is to have individual DataPipes for each of the STAC Item/Catalog/Collection/API, as hinted in microsoft/torchgeo#412 (comment):
- `PySTACItemReader` wrapping `pystac.Item.from_file` (✨ PySTACItemReaderIterDataPipe for reading STAC Items #46)
- `PySTACCatalogReader` wrapping `pystac.Catalog.from_file` for static catalogs
- `PySTACCollectionReader` wrapping `pystac.ItemCollection.from_file`
- `PySTACAPISearcher` wrapping e.g. `pystac_client.Client.search` for dynamic catalogs (✨ PySTACAPISearchIterDataPipe to query dynamic STAC Catalogs #59)
See also https://stacspec.org/en/about/stac-spec/
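To make the "one small DataPipe per STAC object type" idea a bit more concrete, here is a minimal sketch in plain Python, with no torchdata or pystac dependency. `ItemReaderPipe` and `read_item_json` are hypothetical names used only for illustration. `read_item_json` is a stand-in for `pystac.Item.from_file` that just parses JSON into a dict, so this shows the wrapping pattern only and is not how zen3geo implements its readers.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Iterator


def read_item_json(href: str) -> dict:
    """Stand-in for pystac.Item.from_file: parse a STAC-like Item JSON file."""
    return json.loads(Path(href).read_text())


class ItemReaderPipe:
    """Hypothetical reader pipe: takes an iterable of hrefs, yields parsed items."""

    def __init__(
        self,
        source: Iterable[str],
        reader: Callable[[str], dict] = read_item_json,
    ) -> None:
        self.source = source
        self.reader = reader

    def __iter__(self) -> Iterator[dict]:
        # Do one thing only: read each href with the wrapped reader function.
        for href in self.source:
            yield self.reader(href)


# Write a tiny, made-up STAC-like Item to disk so the example runs offline.
tmpdir = tempfile.mkdtemp()
item_path = Path(tmpdir) / "item.json"
item_path.write_text(json.dumps({"type": "Feature", "id": "demo-item", "assets": {}}))

items = list(ItemReaderPipe([str(item_path)]))
print(items[0]["id"])
```

Passing the reader function in as an argument is what would let the same small class shape cover Items, Catalogs, or ItemCollections by swapping in a different `from_file`-style callable.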
💾 STAC I/O
Coming from the STAC Readers, the STAC objects (Item, ItemCollection, etc) would then need to be read into memory using some I/O library. These I/O libraries would handle the stacking of Assets as mentioned in microsoft/torchgeo#412 (comment). E.g.
- `StackstacStacker` wrapping `stackstac.stack`, which returns an `xarray.DataArray` (✨ StackSTACStackerIterDataPipe for stacking STAC items #61)
- `ODCstacLoader` wrapping `odc.stac.load`, which returns an `xarray.Dataset`
Note: See also opendatacube/odc-stac#54 (comment) for differences between `stackstac` and `odc-stac`.
🐕🦺 STAC services (requiring authentication)
- `planetary_computer` has their STAC catalog at https://planetarycomputer.microsoft.com/api/stac/v1/, and there are some (but not all) Collections which require signing/authentication
- `radiant-mlhub` has their own STAC catalog/API library, as mentioned in Add STACAPI dataset microsoft/torchgeo#412 (comment)
Note: The authentication/signing can be handled via the `parameters` and/or `modifier` parameters in `pystac_client.Client.open` (I think).
🥤 Example 'DataPipeLine'
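As a rough, offline illustration of how such a pipeline chains together (query dicts, then a search, then stacking), here is a sketch using plain Python generators. Everything in it is an assumption made for illustration: `CATALOG` is made-up in-memory data, and `search`, `sign`, and `stack` are hypothetical stand-ins rather than zen3geo, pystac_client, or stackstac functions. The `modifier` argument only mimics the idea of a signing callback; this one returns a modified copy of each item and uses a placeholder token.

```python
from typing import Callable, Iterable, Iterator, Optional

# Made-up in-memory "catalog"; a real pipeline would query a STAC API instead.
CATALOG = [
    {"id": "scene-a", "collection": "demo", "href": "https://example.com/a.tif"},
    {"id": "scene-b", "collection": "demo", "href": "https://example.com/b.tif"},
    {"id": "scene-c", "collection": "other", "href": "https://example.com/c.tif"},
]


def sign(item: dict) -> dict:
    """Hypothetical modifier: return a copy with a placeholder token on the href."""
    return {**item, "href": item["href"] + "?token=PLACEHOLDER"}


def search(
    queries: Iterable[dict],
    modifier: Optional[Callable[[dict], dict]] = None,
) -> Iterator[list]:
    """Search stand-in: turn each query dict into a list of matching items."""
    for query in queries:
        matches = [
            item for item in CATALOG if item["collection"] in query["collections"]
        ]
        yield [modifier(item) for item in matches] if modifier else matches


def stack(item_lists: Iterable[list]) -> Iterator[dict]:
    """Stacking stand-in: collapse each list of items into one combined record."""
    for items in item_lists:
        yield {
            "ids": [item["id"] for item in items],
            "hrefs": [item["href"] for item in items],
        }


queries = [{"collections": ["demo"]}]  # first stage: an iterable of query dicts
stacked = list(stack(search(queries, modifier=sign)))  # later stages, chained lazily
print(stacked[0]["ids"])
```

Each stage consumes the previous stage's iterator and yields its own, so nothing is computed until the final `list(...)` call pulls items through the chain.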
🧑🤝🧑 Open for contributions
Anyone is welcome to comment on the details (e.g. naming the DataPipes, what else is needed, etc), or open a Pull Request directly to implement a DataPipe (see https://zen3geo.readthedocs.io/en/latest/CONTRIBUTING.html#running-things-locally on getting started)!
One thing to note is that I've designed `zen3geo` explicitly so that dependencies are optional by default, so if someone doesn't use `odc-stac` for example, they shouldn't have to install it. Just bear this in mind when you're writing up the code.
Cc @jamesvrt, @rbavery, @KennSmithDS
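For anyone writing a new DataPipe, one common way to keep a dependency optional is to guard the import and only raise an error when the class that needs it is actually used. The sketch below shows that general pattern under stated assumptions: `optional_import` and `BackendLoader` are hypothetical names, and `not_a_real_backend_xyz` is a deliberately nonexistent package standing in for something like `odc-stac`. It is not necessarily the exact approach used inside zen3geo.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None (no import-time error)."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Deliberately nonexistent, hypothetical package name, so this resolves to None.
backend = optional_import("not_a_real_backend_xyz")


class BackendLoader:
    """Hypothetical DataPipe-like class that needs the backend only when constructed."""

    def __init__(self) -> None:
        if backend is None:
            raise ModuleNotFoundError(
                "Package `not_a_real_backend_xyz` is required for BackendLoader. "
                "Install it to use this class."
            )


try:
    BackendLoader()
    message = "backend available"
except ModuleNotFoundError as err:
    message = str(err)
print(message)
```

With this shape, importing the module that defines `BackendLoader` never fails for users who lack the backend; only constructing the class does, and the error says what to install.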
Originally discussed in microsoft/torchgeo#412, xref microsoft/torchgeo#576