Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added plugin for dbt CLI #328

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

guru-pochineni
Copy link

@guru-pochineni guru-pochineni commented Jun 29, 2023

Overview

Added support for using 1Password to supply credentials when connecting to Redshift and other sources via dbt CLI.

dbt Redshift prefers password-based authentication with passwords stored in plain text file in project root folder or in ~/.dbt/profiles.yml file. This plugin removes the need for storing DB passwords in plain text and injects them via env variables as and when required. This also works with other data sources in dbt that allow auth via password.

Type of change

  • Created a new plugin
  • Improved an existing plugin
  • Fixed a bug in an existing plugin
  • Improved contributor utilities or experience

Related Issue(s)

NA

Description

dbt is a popular data transformation tool. It allows you to connect to multiple data sources (data warehouses and data lakes) to run SQL queries for analytics. Since it can connect to multiple sources, connection setting varies for each source, and it is encapsulated in profiles.yml file.

We cannot use a single shell plugin to serve all sources as connection schema varies by source. This plugin attempts to provide a way to authenticate to AWS Redshift and other sources that support password-based authentication.

How To Test (Redshift)

Pre-requisites

  1. Redshift Cluster. (If you don't have redshift cluster, postgres db also works)
  2. Install dbt-core and dbt-redshift (dbt-postgres if using postgres) packages. Guide
  3. Create a new entry in your 1Password with credentials for connecting to Redshift. Fill host, port, database, user and password. Type can be 'postgres'.
  4. Create new project or clone existing dbt sample project
  5. In the projects.yml file add the content below.
config:
1pass_redshift:
  target: dev
  outputs:
    dev:
      type: redshift
      host: "{{ env_var('DBT_HOST') }}"
      port: "{{ env_var('DBT_PORT') | as_number }}"
      user: "{{ env_var('DBT_USER') }}"
      pass: "{{ env_var('DBT_PASSWORD') }}"
      dbname: "{{ env_var('DBT_DB') }}"
      schema: "some_schema_name"
  1. Run op init dbt and use the item created in step 3.
  2. You can now run dbt commands to connect to Redshift without having passwords exposed in the profiles.yml file.

Run log


<user_name>@ test-folder % dbt run --select model_1 model_2 --profile 1pass_redshift --target dev
###################################################################################################
# WARNING: 'dbt' is not from the official registry.                                       #
# Only proceed if you are the developer of 'dbt'.                                         #
# Otherwise, delete the file at /Users/<user_name>/.config/op/plugins/local/dbt. #
###################################################################################################
13:51:41  Running with dbt=1.5.2
13:51:43  Found 2 models, 0 tests, 0 snapshots, 0 analyses, 572 macros, 0 operations, 0 seed files, 0 sources, 0 exposures, 0 metrics
13:51:43   
13:52:33  Concurrency: 1 threads (target='dev')
13:52:33  
13:52:33  1 of 2 START sql table model some_schema_name.model_1 ........................................... [RUN]
13:52:42  1 of 2 OK created sql table model some_schema_name.model_1 ...................................... [SELECT in 8.65s]
13:52:42  2 of 2 START sql table model some_schema_name.model_2 ........................................................ [RUN]
13:52:58  2 of 2 OK created sql table model some_schema_name.model_2 ................................................... [SELECT in 16.78s] 
13:53:02  
13:53:02  Finished running 2 table models in 0 hours 1 minutes and 18.38 seconds (78.38s).
13:53:02  
13:53:02  Completed successfully
13:53:02  
13:53:02  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

Testing this plugin is an involved process as we need to have DB, some data in it along with dbt project setup to fully run it.

Supported Data Sources via dbt

  1. AWS Redshift
  2. Postgres
  3. Apache Spark (ODBC, HTTP connections)
  4. Databricks
  5. Snowflake (Username-password auth only)
  6. Starbust/Trino (LDAP)
  7. Azure Synapse and Fabric Synapse
  8. Dremio

We cover password based auth for all [data platforms supported by dbt except BigQuery via this extension. Naming may be little different for few sources, like in Databricks we configure token in password field.

References

  1. AWS workshop for dbt
  2. dbt-core GitHub repo
  3. dbt-redshift GitHub repo

PS: Sharing this plugin via GitHub for hackathon as I use this some form of this daily for work. Not endorsing dbt, AWS Redshift or my employer with these changes.

Additional information

Accompanied blog post: Running dbt cli with 1Password 🚀

@techcraver
Copy link
Contributor

Hello! Don't forget to also write an accompanying blog post on Hashnode with the tags 1Password and BuildWith1Password (not just putting #1Password in the text - but use the Hashnode tags in the CMS. :)

The full instructions are here.

@arunsathiya arunsathiya self-requested a review June 29, 2023 16:36
@arunsathiya arunsathiya added waiting-on-reviewer signals that a certain PR is waiting for a review from a 1Password team member hashnode hackathon Ideas and inspiration for the hackathon running from June 1st - June 30th labels Jun 29, 2023
@guru-pochineni guru-pochineni changed the title Added plugin for DBT Redshift Added plugin for DBT CLI Jun 29, 2023
@guru-pochineni
Copy link
Author

@techcraver & @arunsathiya Thank you for promptly assigning reviewers and correct tags. I have updated the plugin to expand from Redshift to almost all major data platforms for password auth by DBT by making few fields optional.

I use Redshift version of this for my daily workflow. Regardless of hackathon, I hope you are fine with this contribution going to prod?

I have only tested in Redshift, but verified via docs that it can expand to other sources as well.


Are there any plans to support dynamically choosing 1password item say based on arguments from CLI in shell plugin? I currently create multiple empty folders say for dev and prod connections to achieve this. In the dbt CLI, one can pass profile and target as args for selecting connection.

Combining this with allowing multiple connection templates for a single plugin will make the plugin usage very elegant. Happy to contribute PRs if something similar is on our roadmap.

@arunsathiya
Copy link
Contributor

I have updated the plugin to expand from Redshift to almost all major data platforms for password auth by DBT by making few fields optional.

Love it!

I use Redshift version of this for my daily workflow. Regardless of hackathon, I hope you are fine with this contribution going to prod?

Very much, thank you for this valuable contribution @guru-pochineni!

We'll review this work soon, but please be informed that given that the functionality of the plugin has increased, it may take some time to review it in detail.

Are there any plans to support dynamically choosing 1password item say based on arguments from CLI in shell plugin? I currently create multiple empty folders say for dev and prod connections to achieve this. In the dbt CLI, one can pass profile and target as args for selecting connection.

Could you share some examples here? What arguments are different between the platforms (Redshift and others)? And, how do you envision the 1Password item selection to happen -- at the time of command usage or something else?

At this time though, what's possible is op plugin clear dbt and the next run of dbt or op plugin init dbt will prompt for choosing the 1Password item again.

@techcraver
Copy link
Contributor

This is so cool @guru-pochineni ! I come from the data world and I know all my dbt using friends are going to LOOVE this.

@techcraver
Copy link
Contributor

hey @guru-pochineni - one note for your Overview section. The branding/capitalization on dbt. It's never DBT - always 'dbt'. :)

Copy link
Member

@hculea hculea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question, for now 😄 Thank you for the contribution!

Comment on lines +48 to +54
var defaultEnvVarMapping = map[string]sdk.FieldName{
"DBT_HOST": fieldname.Host,
"DBT_PORT": fieldname.Port,
"DBT_USER": fieldname.User,
"DBT_PASSWORD": fieldname.Password,
"DBT_DB": fieldname.Database,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I gather from the documentation, these environment variables are only used to be referenced in a configuration file, is that correct?

With these provisioned, a config file such as the one documented here: https://docs.getdbt.com/reference/dbt-jinja-functions/profiles-yml-context

is still necessary for the plugin to work. Is there some documentation that I have missed?

Copy link
Member

@hculea hculea Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this is indeed what you are documenting in the PR description as well.

Rather than creating an additional requirement for the user to write their own config file, I would prefer if we were to provision a configuration file directly, using provision.TempFile, as in plugins such as mysql or akamai.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the suggestion. Users would still need to configure profiles.yml file as lot of other attributes can be tweaked in it. I will add changes to incorporate provision.TempFile and keep env variables as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thank you for the clarification.

@libutcher do you know of any other plugins that have a similar behaviour? Asking you since you tested/documented their vast majority, I wonder if there is already any occurrence of a plugin that requires additional setup on top of the 1Password CLI-related stuff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up here, it doesn't look like there are any plugins with that behaviour. @guru-pochineni, are you still planning to add changes to incorporate provision.TempFile?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't got the chance to make changes. Let me take a look into this next weekend.

@guru-pochineni guru-pochineni changed the title Added plugin for DBT CLI Added plugin for dbt CLI Jun 30, 2023
@hculea hculea requested review from hculea and removed request for arunsathiya July 5, 2023 07:33
@accraw accraw added in-progress this PR is being worked on/comments are in the process of being addressed by the contributor and removed waiting-on-reviewer signals that a certain PR is waiting for a review from a 1Password team member labels Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hashnode hackathon Ideas and inspiration for the hackathon running from June 1st - June 30th in-progress this PR is being worked on/comments are in the process of being addressed by the contributor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants