[CT-3142] [Feature] Make the current git branch (if any) available in the dbt Jinja context #8690

Open
3 tasks done
b-per opened this issue Sep 22, 2023 · 7 comments · May be fixed by #8693
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors

Comments

@b-per
Contributor

b-per commented Sep 22, 2023

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

I would like to be able to access the current git branch name from the dbt context in order to be able to run some Jinja code depending on its value.

If we had current_branch available as a Jinja variable, we could potentially generate models in schemas/database (by adding some logic in generate_schema_name()) that depend on this branch name.

This could be useful for longer-lived branches where multiple developers work on the same feature.

We would need to handle the case where dbt is run on code that is not a git repo/branch and maybe return an empty value then.
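As a rough illustration of the fallback described above, here is a minimal Python sketch that shells out to git and returns an empty string when there is no branch to report. The function name `current_branch` and the injectable `run` parameter are hypothetical and are not part of dbt; the sketch only assumes that `git symbolic-ref --short -q HEAD` exits non-zero outside a repository or on a detached HEAD.

```python
import subprocess
from typing import Callable


def current_branch(
    project_dir: str = ".",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Return the checked-out branch name, or "" when there is none."""
    try:
        result = run(
            ["git", "symbolic-ref", "--short", "-q", "HEAD"],
            cwd=project_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except (FileNotFoundError, NotADirectoryError):
        # git is not installed, or project_dir does not exist.
        return ""
    if result.returncode != 0:
        # Not a git repository, or HEAD is detached (no branch name).
        return ""
    return result.stdout.strip()
```

Passing the runner as a parameter keeps the function easy to test without a real repository; a caller would normally rely on the default.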

Describe alternatives you've considered

People could potentially define a var in dbt_project.yml to hard code what branch they are on. This variable would then have the same value as the current branch but the drawbacks are that:

  • people need to remember to set this variable every time
  • there will often be merge conflicts to solve on the line where the var is defined as each branch would define its own value

Who will this benefit?

  • people who work on multiple Feature Branches at the same time
  • teams where multiple people work on the same feature branches
  • people with long-lived feature branches for a specific purpose

Are you interested in contributing this feature?

Yes!

Anything else?

No response

@b-per b-per added enhancement New feature or request triage labels Sep 22, 2023
@github-actions github-actions bot changed the title [Feature] Make the current git branch (if any) available in the dbt Jinja context [CT-3142] [Feature] Make the current git branch (if any) available in the dbt Jinja context Sep 22, 2023
@jtcohen6
Contributor

Thanks @b-per! There's been previous discussion about this:

The most recent comment (from February) has the exact same request & use case, and I think I buy it. When you have multiple developers working on a single "feature" over the course of a few weeks, you may want them to:

  • work in a common git branch & dev schema (dev_feature_xyz)
  • add the git branch to their personalized target.schema, so it doesn't clobber their concurrent work on other features (e.g. dev_jerco_feature_xyz)
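The naming scheme in the two bullets above could be sketched as follows. In a real project this logic would live in a Jinja `generate_schema_name` macro; the Python below only mirrors the string handling. The helper names `sanitize` and `dev_schema` are hypothetical, and the sanitization rule (lowercase, anything outside `[a-z0-9_]` becomes `_`) is an assumption, since identifier rules differ between warehouses.

```python
import re


def sanitize(branch: str) -> str:
    """Lowercase the branch and replace runs of other characters with '_'."""
    cleaned = re.sub(r"[^a-z0-9_]+", "_", branch.lower())
    return cleaned.strip("_")


def dev_schema(target_schema: str, branch: str) -> str:
    """Append the sanitized branch to the developer's schema, if there is one."""
    suffix = sanitize(branch)
    return f"{target_schema}_{suffix}" if suffix else target_schema


if __name__ == "__main__":
    print(dev_schema("dev_jerco", "feature_xyz"))
```

When the branch is empty (for example, outside a git repository), the developer's own schema is returned unchanged.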

What are the risks?

dbt <> git interaction. dbt doesn't install git as a Python package dependency; it just uses the git available in the OS, and shells out to it inside a subprocess. This can get pretty gross. Right now, all git interactions are limited to dbt deps, and quite unrelated to all other dbt functionality. But if we were to start running git commands as part of resolving Jinja context methods... There is pygit2, which might be a better / lighter-weight way to do something as simple as "tell me the current branch name"?

Partial parsing. I think this could have some wacky interactions with partial parsing. If you change your git branch, and you use the git_branch variable in your custom generate_schema_name macro (which is resolved at parse time) — in order to re-resolve all those schema configs, either:

  • you need to trigger a full re-parse yourself (--no-partial-parse)
  • dbt detects that your generate_schema_name macro depends on this variable, that the variable has been modified, and then triggers a full re-parse accordingly
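The second option can be illustrated with a small change-detection sketch. This is not dbt's actual partial-parsing code: the state file and the function names `fingerprint` and `needs_full_reparse` are hypothetical. It only shows the general idea of fingerprinting context values between runs and forcing a full parse when they differ.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(context_values: dict) -> str:
    """Hash the context values in a stable, order-independent way."""
    payload = json.dumps(context_values, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def needs_full_reparse(state_file: Path, context_values: dict) -> bool:
    """Return True when the values differ from the previous run, then record them."""
    new = fingerprint(context_values)
    old = state_file.read_text().strip() if state_file.exists() else None
    state_file.write_text(new)
    return old != new
```

The first call always reports a change because no state has been recorded yet, which matches the fact that a first parse is a full parse anyway.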

@jtcohen6 jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed triage labels Sep 22, 2023
@b-per b-per linked a pull request Sep 22, 2023 that will close this issue
7 tasks
@b-per
Contributor Author

b-per commented Sep 25, 2023

I am a bit stuck on the partial parsing side of things but the rest (feature and testing) should be OK.

I have also added a git_sha variable for the latest sha based on git log. I think that it could be useful in query_comment etc...

Thinking about it now, would we want to add this info in run_results.json and/or manifest.json as well? Would we want to have the branch information in the Metadata API for example?

@aranke
Member

aranke commented Sep 26, 2023

I'm commenting here since I was tagged on the PR.
First off, thanks for opening this discussion and a corresponding PR, @b-per!

The big question I have from reading this issue is:

Why is saving the Git branch in an env_var (similar to https://stackoverflow.com/a/10915331) not a viable alternative?
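For reference, the env_var route might look like the wrapper below. The variable name `DBT_GIT_BRANCH` and both helper functions are hypothetical; the only dbt feature assumed is the existing `env_var` Jinja function, which a macro could call as `env_var('DBT_GIT_BRANCH', '')` to fall back to an empty string.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def env_with_branch(branch: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and add the (hypothetical) DBT_GIT_BRANCH variable."""
    env = dict(os.environ if base_env is None else base_env)
    env["DBT_GIT_BRANCH"] = branch
    return env


def run_dbt(args: Sequence[str], branch: str) -> int:
    """Invoke the dbt CLI with the branch exposed through the environment."""
    completed = subprocess.run(["dbt", *args], env=env_with_branch(branch), check=False)
    return completed.returncode
```

The trade-off is that every invocation has to go through the wrapper (or a shell equivalent), which is the "remember to set it" burden the original request wants to avoid.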

A few other considerations from an engineer's perspective (more for @jtcohen6):

  1. Do we want to add another dependency to dbt-core and be responsible for vendoring and distributing it?
    • Especially if we only use this dependency in one place?
  2. What's the impact on performance, especially in giant Git projects?
  3. Philosophically, do we want to align dbt-core and git closer together, or do we want to keep them more independent?

And one more thing: can we experiment with using Dulwich in the PR? From the homepage:

Dulwich is a Python implementation of the Git file formats and protocols, which does not depend on Git itself.

All functionality is available in pure Python. Optional C extensions can be built for improved performance.

This might be a way to mitigate the dbt <> git interaction called out above, but will probably be much slower (and maybe that's ok?).
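For comparison, the branch name alone can be read without git or any library, because `.git/HEAD` is a plain text file that contains `ref: refs/heads/<branch>` when a branch is checked out. The sketch below is neither Dulwich nor pygit2; the function name is hypothetical, and it deliberately ignores worktrees and submodules, where `.git` is a file rather than a directory.

```python
from pathlib import Path


def branch_from_head_file(repo_root: str) -> str:
    """Return the branch named in .git/HEAD, or "" if it cannot be determined."""
    head = Path(repo_root) / ".git" / "HEAD"
    try:
        content = head.read_text(encoding="utf-8").strip()
    except OSError:
        # No .git directory here, or .git is a file (worktree/submodule).
        return ""
    prefix = "ref: refs/heads/"
    if content.startswith(prefix):
        return content[len(prefix):]
    # Detached HEAD: the file holds a commit hash instead of a ref.
    return ""
```

This costs a single small file read regardless of repository size, though a maintained library would cover the edge cases skipped here.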

I'm excited to see where this discussion goes and what solution we come up with!

@b-per
Contributor Author

b-per commented Sep 26, 2023

My personal takes:

  • making it available in an env_var
    • if this was handled by dbt Cloud, I think that this would be the way to go
    • in dbt-core I feel like the default behaviour to make data available is the Jinja context and not env_vars
  • getting git info from Python
    • pygit2 seems to have a very lightweight requirements list on its own so I don't think that this would create a dependency mess
    • my guess is also that a library based on C bindings will be faster than a Python one but I have no data to back it up

@KarolinaGojny

Hi There,
I'm looking for the exact feature described here.
The goal is to generate models in schemas named after the current git branch, without setting the variable manually after switching to another branch.
@b-per Did you figure anything out on this topic? I would be grateful for any clue or information about the status of this enhancement.

@mahiki

mahiki commented Sep 1, 2024

Hi I'm here from searching for "dbt run set target to current git branch".

Looks like there isn't a native way to do this interactively. In github actions CICD the pull step is going to select a branch and that will be accessible as a variable for dbt commands, so yeah my use-case is also REPL development workflow.

Here's how I solve it with the just.systems taskrunner:

I have a just environment that computes the current git branch name, which also matches the names of my targets.

# ./justfile
set positional-arguments := true
git_branch := `git symbolic-ref --short HEAD`

# select dev branch if not on main or stable (ie feature branches, etc)
branch := if (git_branch) == "main" { "main" } else if (git_branch) == "stable" { "stable" } else { "dev" }
current_git_commit := `git rev-parse HEAD | cut -c 1-8`

# just --list
_default:
  @just --list --unsorted

# dbt run --target <git branch name is inserted here> <the rest of your command>
run *args:
  dbt run  --target {{branch}} "$@"

The just commands insert the target flag and branch name ahead of my dbt run commands.

# currently 'dev' branch
just run
    # dbt run --target dev "$@"

@vbgcwood

vbgcwood commented Jan 10, 2025

@mahiki I considered using CI/CD, but there's a gap left in the workflow with that route. Developers / Analytic Engineers, they might be running their dbt project locally prior to a commit--to make sure it all runs at the very least. When they do so, the idea is that a schema is automatically created based on the git branch (e.g., curated_sales_{git-branch} -> curated_sales_feature_xyz123). This should ideally allow the engineers to work on several features independently without clobbering each other.

Asking for my own curiosity; do you only deploy to the development branch of your analytical infrastructure via CI/CD, or can your engineers write directly to it?

Edit: Maybe one approach that gets both benefits: Engineers get more static "user driven branches" in the dev branch of the analytical infrastructure (e.g., curated_sales_usr_{user-name}) which allows them to test/deploy things without needing to synchronize the schema with the feature name. Then, after git push origin my_new_feature, CI/CD pipeline can adjust the schema name to be "feature driven" (e.g., curated_sales_feat_{git-branch}). User driven branches can be private while feature driven branches can be public, allowing for collaboration.
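The split between user-driven and feature-driven schemas could be sketched like this. The function name and the `in_ci` flag are hypothetical, and the inputs are assumed to already be valid identifier fragments for the warehouse in use.

```python
def schema_name(base: str, user: str, branch: str, in_ci: bool) -> str:
    """Use a feature schema in CI and a per-user schema for local runs."""
    if in_ci and branch:
        return f"{base}_feat_{branch}"
    return f"{base}_usr_{user}"


if __name__ == "__main__":
    print(schema_name("curated_sales", "alice", "feature_xyz123", in_ci=False))
    print(schema_name("curated_sales", "alice", "feature_xyz123", in_ci=True))
```

How `in_ci` is determined is left open; checking an environment variable such as `CI`, which GitHub Actions sets, is one possibility.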

6 participants