[dagster-airlift] Move cheatsheet to equivalents page
dpeng817 committed Nov 23, 2024
1 parent 0d3dc29 commit 3b64a79
Showing 1 changed file with 259 additions and 0 deletions.
259 changes: 259 additions & 0 deletions docs/content/integrations/airlift/airflow-dagster-equivalents.mdx
@@ -370,3 +370,262 @@ Airflow Pools allow users to limit the number of concurrent tasks that can be ru
### Airflow Task Groups

Airflow task groups allow you to organize tasks into hierarchical groups within the Airflow UI for a particular DAG. Dagster has _global_ asset groups which can be applied to any asset. Learn more about [asset groups](/concepts/assets/software-defined-assets#assigning-assets-to-groups).

## Cheatsheet

Here's a cheatsheet for Airflow users migrating to Dagster:

<table
className="table"
style={{
width: "100%",
}}
>
<thead>
<tr>
<th
style={{
width: "25%",
}}
>
Airflow concept
</th>
<th
style={{
width: "30%",
}}
>
Dagster concept
</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>Directed Acyclic Graphs (DAG)</td>
<td>
<a href="/concepts/ops-jobs-graphs/jobs">Jobs</a>
</td>
<td></td>
</tr>
<tr>
<td>Task</td>
<td>
<a href="/concepts/ops-jobs-graphs/ops">Ops</a>
</td>
<td></td>
</tr>
<tr>
<td>Datasets</td>
<td>
<a href="/concepts/assets/software-defined-assets">Assets</a>
</td>
<td>
Dagster assets are more powerful and mature than datasets and include
support for things like{" "}
<a href="/concepts/partitions-schedules-sensors/partitioning-assets">
partitioning
</a>
.
</td>
</tr>
<tr>
<td>Connections/Variables</td>
<td>
<ul style={{ marginTop: "0em" }}>
<li style={{ marginTop: "0em" }}>
<a href="/concepts/configuration/config-schema">
Run configuration
</a>
</li>
<li>
<a href="/concepts/configuration/configured">Configured API</a>{" "}
(Legacy)
</li>
<li>
<a href="/guides/dagster/using-environment-variables-and-secrets">
Environment variables
</a>{" "}
(Dagster+ only)
</li>
</ul>
</td>
<td></td>
</tr>
<tr>
<td>DagBags</td>
<td>
<a href="/concepts/code-locations">Code locations</a>
</td>
<td>
Multiple isolated code locations with different system and Python
dependencies can exist within the same Dagster instance.
</td>
</tr>
<tr>
<td>DAG runs</td>
<td>Job runs</td>
<td></td>
</tr>
<tr>
<td>
<code>depends_on_past</code>
</td>
<td>
<ul style={{ marginTop: "0em" }}>
<li style={{ marginTop: "0em" }}>
<a href="/concepts/partitions-schedules-sensors/partitions">
Partitions
</a>
</li>
<li>
<a href="/concepts/partitions-schedules-sensors/backfills">
Backfills
</a>
</li>
<li>
<a href="/concepts/automation/declarative-automation">
Declarative Automation
</a>
</li>
</ul>
</td>
<td>
An asset can{" "}
<a
href="https://github.com/dagster-io/dagster/discussions/11829"
target="new"
>
depend on earlier partitions of itself
</a>
. When this is the case, <a href="/concepts/partitions-schedules-sensors/backfills">
backfills
</a> and <a href="/concepts/automation/declarative-automation">
Declarative Automation
</a> will only materialize later partitions after earlier partitions have
completed.
</td>
</tr>
<tr>
<td>Executors</td>
<td>
<a href="/deployment/executors">Executors</a>
</td>
<td></td>
</tr>
<tr>
<td>Hooks</td>
<td>
<a href="/concepts/resources">Resources</a>
</td>
<td>
Dagster <a href="/concepts/resources">resources</a> contain a superset of
the functionality of hooks and have much stronger composition
guarantees.
</td>
</tr>
<tr>
<td>Instances</td>
<td>
<a href="/deployment/dagster-instance">Instances</a>
</td>
<td></td>
</tr>
<tr>
<td>Operators</td>
<td>None</td>
<td>
Dagster uses normal Python functions instead of framework-specific
operator classes. For off-the-shelf functionality with third-party
tools, Dagster provides{" "}
<a href="/integrations">integration libraries</a>.
</td>
</tr>
<tr>
<td>Pools</td>
<td>
<a href="/deployment/run-coordinator">Run coordinators</a>
</td>
<td></td>
</tr>
<tr>
<td>Plugins/Providers</td>
<td>
<a href="/integrations">Integrations</a>
</td>
<td></td>
</tr>
<tr>
<td>Schedulers</td>
<td>
<a href="/concepts/automation/schedules">Schedules</a>
</td>
<td></td>
</tr>
<tr>
<td>Sensors</td>
<td>
<a href="/concepts/partitions-schedules-sensors/sensors">Sensors</a>
</td>
<td></td>
</tr>
<tr>
<td>SubDAGs/TaskGroups</td>
<td>
<ul style={{ marginTop: "0em" }}>
<li style={{ marginTop: "0em" }}>
<a href="/concepts/ops-jobs-graphs/graphs">Graphs</a>
</li>
<li>
<a href="/concepts/metadata-tags/tags">Tags</a>
</li>
<li>
<a href="/concepts/assets/software-defined-assets#grouping-assets">
Asset groups
</a>
</li>
</ul>
</td>
<td>
Dagster provides rich, searchable{" "}
<a href="/concepts/metadata-tags">metadata</a> and{" "}
<a href="/concepts/metadata-tags/tags">tagging</a> support beyond what’s
offered by Airflow.
</td>
</tr>
<tr>
<td>
<code>task_concurrency</code>
</td>
<td>
<a href="/guides/limiting-concurrency-in-data-pipelines#limiting-opasset-concurrency-across-runs">
Asset/op-level concurrency limits
</a>
</td>
<td></td>
</tr>
<tr>
<td>Trigger</td>
<td>
<a href="/concepts/webserver/ui">Dagster UI Launchpad</a>
</td>
<td>
Triggering and configuring ad-hoc runs is easier in Dagster, which allows
them to be initiated through the{" "}
<a href="/concepts/webserver/ui">Dagster UI</a>, the{" "}
<a href="/concepts/webserver/graphql">GraphQL API</a>, or the CLI.
</td>
</tr>
<tr>
<td>XComs</td>
<td>
<a href="/concepts/io-management/io-managers">I/O managers</a>
</td>
<td>
I/O managers are more powerful than XComs and allow passing large
datasets between jobs.
</td>
</tr>
</tbody>
</table>
