[docs] AssetSpec IO managers via metadata (dagster-io#23279)
## Summary & Motivation

With `SourceAsset`s being deprecated, posting this PR to kick off a
discussion around what we tell users who have been relying on its
`io_manager_key` attribute.

**Another related option** would be to expose an `IOManagerMetadataSet`
class instead of relying on a constant metadata key:

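```python
# IOManagerMetadataSet is the proposed API; it is not yet exported by dagster.
from dagster import AssetKey, AssetSpec, IOManagerMetadataSet

my_source_asset = AssetSpec(
    key=AssetKey("my_source_asset"),
    # ** unpacks the metadata set into a plain metadata dict
    metadata={**IOManagerMetadataSet(io_manager_key="s3_io_manager")},
)
```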
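Because `IOManagerMetadataSet` does not exist yet, here is a hypothetical pure-Python stand-in that shows how such a class could support the `**` unpacking above. It relies only on Python's mapping protocol (`keys()` plus `__getitem__`) and assumes the set would expand to the same `"dagster/io_manager_key"` entry used by the plain-metadata approach; it is not Dagster's implementation.

```python
from typing import Any, Dict


class IOManagerMetadataSet:
    """Hypothetical stand-in for the proposed IOManagerMetadataSet.

    Defining keys() and __getitem__ is enough for ** unpacking and dict().
    """

    NAMESPACE = "dagster"

    def __init__(self, io_manager_key: str) -> None:
        self.io_manager_key = io_manager_key

    def _as_dict(self) -> Dict[str, Any]:
        # Prefix the field name with the namespace: "dagster/io_manager_key".
        return {f"{self.NAMESPACE}/io_manager_key": self.io_manager_key}

    def keys(self):
        return self._as_dict().keys()

    def __getitem__(self, key: str) -> Any:
        return self._as_dict()[key]


metadata = {**IOManagerMetadataSet(io_manager_key="s3_io_manager")}
print(metadata)  # {'dagster/io_manager_key': 's3_io_manager'}
```

The appeal of a metadata set over a bare string key is that a typo in `io_manager_key=` fails loudly as an unexpected keyword argument, while a typo in a string key is silently accepted as ordinary metadata.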

**A third option** would be to expose an `AssetSpecWithIOManager`
subclass that internally uses the metadata scheme above.
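A minimal sketch of the subclass option, assuming a hypothetical stand-in `AssetSpec` that has only `key` and `metadata` fields (the real `dagster.AssetSpec` has many more). `AssetSpecWithIOManager` and the `IO_MANAGER_KEY_METADATA` constant are illustrative names, not Dagster APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional

# Hypothetical constant for the metadata key used by the docs change.
IO_MANAGER_KEY_METADATA = "dagster/io_manager_key"


@dataclass(frozen=True)
class AssetSpec:
    """Hypothetical stand-in for dagster.AssetSpec (key and metadata only)."""

    key: str
    metadata: Mapping[str, Any] = field(default_factory=dict)


class AssetSpecWithIOManager(AssetSpec):
    """Hypothetical subclass that stores io_manager_key in metadata."""

    def __init__(
        self,
        key: str,
        io_manager_key: str,
        metadata: Optional[Mapping[str, Any]] = None,
    ) -> None:
        # Merge caller metadata with the I/O manager entry, then defer to
        # the frozen dataclass __init__.
        super().__init__(
            key=key,
            metadata={**(metadata or {}), IO_MANAGER_KEY_METADATA: io_manager_key},
        )

    @property
    def io_manager_key(self) -> str:
        # Read the value back out of the metadata scheme.
        return self.metadata[IO_MANAGER_KEY_METADATA]


spec = AssetSpecWithIOManager(key="my_source_asset", io_manager_key="s3_io_manager")
print(spec.io_manager_key)  # s3_io_manager
```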

**A fourth option** would be a `with_io_manager` static method on
`AssetSpec` that constructs an `AssetSpec` using the metadata scheme
above.
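A minimal sketch of the static-method option, again assuming a hypothetical stand-in `AssetSpec` with only `key` and `metadata` fields. The `with_io_manager` signature shown here is an illustrative guess, not an existing Dagster API.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional

# Hypothetical constant for the metadata key used by the docs change.
IO_MANAGER_KEY_METADATA = "dagster/io_manager_key"


@dataclass(frozen=True)
class AssetSpec:
    """Hypothetical stand-in for dagster.AssetSpec (key and metadata only)."""

    key: str
    metadata: Mapping[str, Any] = field(default_factory=dict)

    @staticmethod
    def with_io_manager(
        key: str,
        io_manager_key: str,
        metadata: Optional[Mapping[str, Any]] = None,
    ) -> "AssetSpec":
        # Build an ordinary AssetSpec whose metadata carries the I/O manager key.
        return AssetSpec(
            key=key,
            metadata={**(metadata or {}), IO_MANAGER_KEY_METADATA: io_manager_key},
        )


spec = AssetSpec.with_io_manager("my_source_asset", "s3_io_manager")
print(spec.metadata)  # {'dagster/io_manager_key': 's3_io_manager'}
```

Compared with the subclass option, a static constructor keeps a single `AssetSpec` type, so code that inspects specs never needs to special-case a subclass.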

**A fifth option** would be to deprecate the use of IO managers on
non-executable assets. I am not a fan of this option, because I think a
big part of the beauty of the IO manager system is being able to depend
on an asset and treat the process that generates that asset as a black
box. Having one set of capabilities for loading executable assets and
another for non-executable assets seems arbitrary.

## How I Tested These Changes
sryza authored Aug 1, 2024
1 parent 114f3ac commit 8817056
Showing 2 changed files with 15 additions and 7 deletions.
docs/content/concepts/io-management/io-managers.mdx: 12 changes (8 additions, 4 deletions)
````diff
@@ -240,21 +240,25 @@ defs = Definitions(
 
 ### Using I/O managers to load source data
 
-Asset definitions often depend on data assets that are generated outside of Dagster, or in a different code location within Dagster, and it's often useful to use I/O managers to load the data from these assets. To represent one of these assets, you can create an asset definition that includes an I/O manager, but no materialization function. Your other assets can then depend on it and load data from it, just as they would with a materializable asset.
+Asset definitions often depend on data assets that are generated outside of Dagster, or in a different code location within Dagster, and it's often useful to use I/O managers to load the data from these assets. You can use an <PyObject object="AssetSpec" /> to define an asset with no materialization function, and you can assign an I/O manager to it by including an entry in its metadata dictionary with the `"dagster/io_manager_key"` key. Your other assets can then depend on it and load data from it, just as they would with a materializable asset.
 
 For example:
 
 ```python file=/concepts/io_management/source_asset.py
-from dagster import AssetKey, SourceAsset, asset
+from dagster import AssetKey, AssetSpec, Definitions, asset
 
-my_source_asset = SourceAsset(
-    key=AssetKey("my_source_asset"), io_manager_key="s3_io_manager"
+my_source_asset = AssetSpec(
+    key=AssetKey("my_source_asset"),
+    metadata={"dagster/io_manager_key": "s3_io_manager"},
 )
 
 
 @asset
 def my_derived_asset(my_source_asset):
     return my_source_asset + [4]
+
+
+defs = Definitions(assets=[my_source_asset, my_derived_asset])
 ```
 
 ### Asset input I/O managers
````
```diff
@@ -1,10 +1,14 @@
-from dagster import AssetKey, SourceAsset, asset
+from dagster import AssetKey, AssetSpec, Definitions, asset
 
-my_source_asset = SourceAsset(
-    key=AssetKey("my_source_asset"), io_manager_key="s3_io_manager"
+my_source_asset = AssetSpec(
+    key=AssetKey("my_source_asset"),
+    metadata={"dagster/io_manager_key": "s3_io_manager"},
 )
 
 
 @asset
 def my_derived_asset(my_source_asset):
     return my_source_asset + [4]
+
+
+defs = Definitions(assets=[my_source_asset, my_derived_asset])
```
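The docs change routes loading of a non-executable `AssetSpec` through the I/O manager named in its `"dagster/io_manager_key"` metadata entry. Below is a toy pure-Python model of that routing, not Dagster's implementation: the `resolve_io_manager_key` and `load_input` helpers and the in-memory `io_managers` dict are hypothetical, and the fallback to `"io_manager"` assumes Dagster's default I/O manager resource key.

```python
from typing import Any, Callable, Dict, Mapping

IO_MANAGER_KEY_METADATA = "dagster/io_manager_key"
DEFAULT_IO_MANAGER_KEY = "io_manager"  # assumed default resource key


def resolve_io_manager_key(metadata: Mapping[str, Any]) -> str:
    """Pick the I/O manager key for an upstream asset from its metadata."""
    return metadata.get(IO_MANAGER_KEY_METADATA, DEFAULT_IO_MANAGER_KEY)


def load_input(
    asset_key: str,
    metadata: Mapping[str, Any],
    io_managers: Dict[str, Callable[[str], Any]],
) -> Any:
    # Look up the I/O manager named in metadata and ask it to load the asset.
    return io_managers[resolve_io_manager_key(metadata)](asset_key)


# Hypothetical in-memory "I/O managers", keyed like resources in Definitions.
io_managers = {
    "io_manager": lambda key: f"loaded {key} from local storage",
    "s3_io_manager": lambda key: [1, 2, 3],
}

# Mirrors my_derived_asset: load the upstream value, then append 4.
upstream = load_input(
    "my_source_asset", {IO_MANAGER_KEY_METADATA: "s3_io_manager"}, io_managers
)
print(upstream + [4])  # [1, 2, 3, 4]
```

The downstream function only ever sees the loaded value, which is the black-box property the fifth option would give up for non-executable assets.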
