[dagster-airlift] Move cheatsheet to equivalents page #26115

Open
wants to merge 1 commit into
base: dpeng817/airflow-dagster-equivalents
259 changes: 259 additions & 0 deletions docs/content/integrations/airlift/airflow-dagster-equivalents.mdx
@@ -370,3 +370,262 @@ Airflow Pools allow users to limit the number of concurrent tasks that can be ru
### Airflow Task Groups

Airflow task groups allow you to organize tasks into hierarchical groups within the Airflow UI for a particular DAG. Dagster has _global_ asset groups which can be applied to any asset. Learn more about [asset groups](/concepts/assets/software-defined-assets#assigning-assets-to-groups).

## Cheatsheet

Here's a cheatsheet for Airflow users migrating to Dagster:

<table
className="table"
style={{
width: "100%",
}}
>
<thead>
<tr>
<th
style={{
width: "25%",
}}
>
Airflow concept
</th>
<th
style={{
width: "30%",
}}
>
Dagster concept
</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Directed Acyclic Graphs (DAG)</td>
<td>
<a href="/concepts/ops-jobs-graphs/jobs">Jobs</a>
</td>
<td></td>
</tr>
<tr>
<td>Task</td>
<td>
<a href="/concepts/ops-jobs-graphs/ops">Ops</a>
</td>
<td></td>
</tr>
<tr>
<td>Datasets</td>
<td>
<a href="/concepts/assets/software-defined-assets">Assets</a>
</td>
<td>
Dagster assets are more powerful and mature than datasets and include
support for things like{" "}
<a href="/concepts/partitions-schedules-sensors/partitioning-assets">
partitioning
</a>
.
</td>
</tr>
<tr>
<td>Connections/Variables</td>
<td>
<ul style={{ marginTop: "0em" }}>
<li style={{ marginTop: "0em" }}>
<a href="/concepts/configuration/config-schema">
Run configuration
</a>
</li>
<li>
<a href="/concepts/configuration/configured">Configured API</a>{" "}
(Legacy)
</li>
<li>
<a href="/guides/dagster/using-environment-variables-and-secrets">
Environment variables
</a>{" "}
(Dagster+ only)
</li>
</ul>
</td>
<td></td>
</tr>
<tr>
<td>DagBags</td>
<td>
<a href="/concepts/code-locations">Code locations</a>
</td>
<td>
Multiple isolated code locations with different system and Python
dependencies can exist within the same Dagster instance.
</td>
</tr>
<tr>
<td>DAG runs</td>
<td>Job runs</td>
<td></td>
</tr>
<tr>
<td>
<code>depends_on_past</code>
</td>
<td>
<ul style={{ marginTop: "0em" }}>
<li style={{ marginTop: "0em" }}>
<a href="/concepts/partitions-schedules-sensors/partitions">
Partitions
</a>
</li>
<li>
<a href="/concepts/partitions-schedules-sensors/backfills">
Backfills
</a>
</li>
<li>
<a href="/concepts/automation/declarative-automation">
Declarative Automation
</a>
</li>
</ul>
</td>
<td>
An asset can{" "}
<a
href="https://github.com/dagster-io/dagster/discussions/11829"
target="new"
>
depend on earlier partitions of itself
</a>
. When this is the case, <a href="/concepts/partitions-schedules-sensors/backfills">
backfills
</a> and <a href="/concepts/automation/declarative-automation">
Declarative Automation
</a> will only materialize later partitions after earlier partitions have
completed.
</td>
</tr>
<tr>
<td>Executors</td>
<td>
<a href="/deployment/executors">Executors</a>
</td>
<td></td>
</tr>
<tr>
<td>Hooks</td>
<td>
<a href="/concepts/resources">Resources</a>
</td>
<td>
Dagster <a href="/concepts/resources">resource</a> contain a superset of
the functionality of hooks and have much stronger composition
guarantees.
Comment on lines +522 to +524

Minor grammatical fix needed: "Dagster resource" should be "Dagster resources" since it's referring to the plural concept.

Spotted by Graphite Reviewer

</td>
</tr>
<tr>
<td>Instances</td>
<td>
<a href="/deployment/dagster-instance">Instances</a>
</td>
<td></td>
</tr>
<tr>
<td>Operators</td>
<td>None</td>
<td>
Dagster uses normal Python functions instead of framework-specific
operator classes. For off-the-shelf functionality with third-party
tools, Dagster provides{" "}
<a href="/integrations">integration libraries</a>.
</td>
</tr>
<tr>
<td>Pools</td>
<td>
<a href="/deployment/run-coordinator">Run coordinators</a>
</td>
<td></td>
</tr>
<tr>
<td>Plugins/Providers</td>
<td>
<a href="/integrations">Integrations</a>
</td>
<td></td>
</tr>
<tr>
<td>Schedulers</td>
<td>
<a href="/concepts/automation/schedules">Schedules</a>
</td>
<td></td>
</tr>
<tr>
<td>Sensors</td>
<td>
<a href="/concepts/partitions-schedules-sensors/sensors">Sensors</a>
</td>
<td></td>
</tr>
<tr>
<td>SubDAGs/TaskGroups</td>
<td>
<ul style={{ marginTop: "0em" }}>
<li style={{ marginTop: "0em" }}>
<a href="/concepts/ops-jobs-graphs/graphs">Graphs</a>
</li>
<li>
<a href="/concepts/metadata-tags/tags">Tags</a>
</li>
<li>
<a href="/concepts/assets/software-defined-assets#grouping-assets">
Asset groups
</a>
</li>
</ul>
</td>
<td>
Dagster provides rich, searchable{" "}
<a href="/concepts/metadata-tags">
metadata and <a href="/concepts/metadata-tags/tags">tagging</a>
</a>{" "}
support beyond what’s offered by Airflow.
</td>
Comment on lines +590 to +595

The nested <a> tags will cause rendering issues in the browser. Consider replacing the text with:

Dagster provides rich, searchable metadata and <a href="/concepts/metadata-tags/tags">tagging</a> support beyond what's offered by Airflow.

Spotted by Graphite Reviewer

</tr>
<tr>
<td>
<code>task_concurrency</code>
</td>
<td>
<a href="/guides/limiting-concurrency-in-data-pipelines#limiting-opasset-concurrency-across-runs">
Asset/op-level concurrency limits
</a>
</td>
<td></td>
</tr>
<tr>
<td>Trigger</td>
<td>
<a href="/concepts/webserver/ui">Dagster UI Launchpad</a>
</td>
<td>
Triggering and configuring ad-hoc runs is easier in Dagster which allows
them to be initiated through the{" "}
<a href="/concepts/webserver/ui">Dagster UI</a>, the{" "}
<a href="/concepts/webserver/graphql">GraphQL API</a>, or the CLI.
</td>
</tr>
<tr>
<td>XComs</td>
<td>
<a href="/concepts/io-management/io-managers">I/O managers</a>
</td>
<td>
I/O managers are more powerful than XComs and allow the passing large
datasets between jobs.
Comment on lines +626 to +627

Minor grammatical fix needed: "allow the passing large datasets" should be "allow the passing of large datasets"

Spotted by Graphite Reviewer

</td>
</tr>
</tbody>
</table>