Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug/renamed columns #46

Closed
wants to merge 7 commits into from
Closed

Bug/renamed columns #46

wants to merge 7 commits into from

Conversation

fivetran-catfritz
Copy link
Contributor

PR Overview

This PR will address the following Issue/Feature:

This PR will result in the following new package version:

Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:

PR Checklist

Basic Validation

Please acknowledge that you have successfully performed the following commands locally:

  • dbt run –full-refresh && dbt test
  • dbt run (if incremental models are present) && dbt test

Before marking this PR as "ready for review" the following have been applied:

  • The appropriate issue has been linked, tagged, and properly assigned
  • All necessary documentation and version upgrades have been applied
  • docs were regenerated (unless this PR does not include any code or yml updates)
  • BuildKite integration tests are passing
  • Detailed validation steps have been provided below

Detailed Validation

Please share any and all of your validation steps:

If you had to summarize this PR in an emoji, which would it be?

💃

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-catfritz great work on this so far! I really like the creative solution you have applied here to seamlessly coalesce the fields. I was able to do a first pass and just have a few comments that jumped out at me when taking a look.

Let me know if you would like to sync on any of my comments in more detail. Thanks!

{% macro coalesce_w_renamed_col(original_column_name, datatype=dbt.type_string()) %}
{# This macro accomodates Fivetran connectors that keep the original salesforce field naming conventions without underscores #}

{%- set renamed_col = original_column_name.replace('_', '') -%}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we 100% sure this is always the behavior? Maybe we run this by the eng team members in the Height ticket to double check and confirm this is a safe assumption.

Copy link
Contributor

@fivetran-joemarkiewicz fivetran-joemarkiewicz Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example can we be 100% certain that account_source will be the original name and the renamed will be accountsource? I want to be sure there are no exceptions.

It looks like from your individual source macro updates this does seem to be the case, but just double checking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our data, all the examples of cols we use follow this pattern (I'll post the col list in height), which also is supported by what engineering shared. There are exceptions, but these cols tend to end with a double underscore likename__c, but we don't use any columns named like that.

That being said we were missing some columns in our data to be 100%, so it would make sense to be able to provide an alternate to the rule. I made the default behavior to to remove underscores for the renamed name, but now an exact name can be passed as an argument if necessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great solution! This way if we need to make any adjustments in the future we have a seamless route to do so.

billing_state,
billing_state_code,
billing_street,
{{ coalesce_w_renamed_col('account_number') }},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very creative solution!

Comment on lines 6 to 8
coalesce(cast({{ renamed_col }} as {{ datatype }}),
cast({{ original_column_name }} as {{ datatype }}))
as {{ original_column_name }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are your thoughts on possibly allowing some wiggle room if we don't always want the original_column_name to be the final name of the field in the model? For example, what if in the future we have a field is_partner_marketing_action_active (this is a real field that could exist) and the original field name is IsPartnerMarketingActionActive. I don't know if we would exactly want that to be the final field name. Maybe we would want to have the final name be is_pm_action_active.

I know this is not a scenario right now, but maybe something worth future proofing just in case. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a good idea and shouldn't be difficult to implement!

{%- set renamed_col = original_column_name.replace('_', '') -%}

coalesce(cast({{ renamed_col }} as {{ datatype }}),
cast({{ original_column_name }} as {{ datatype }}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know Snowflake provides the ability to enable case-sensitivity (e.g. LastActivityDate). I wonder if this FF does not do the same case-insensitivity approach we see with the standard connector when syncing to Snowflake. I have a small feeling that these original field names may sync like this. Do you know if this could be a problem with the approach we are taking here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll look into it since right now I'm not sure the best way to handle that scenario. In our example data, everything is lowercased, though I do see in the engineering doc that originally it would be camel case. Let me know if you have any suggestions, but otherwise I'll think on this!

Copy link
Contributor Author

@fivetran-catfritz fivetran-catfritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank. you for taking a look @fivetran-joemarkiewicz!

Comment on lines 6 to 8
coalesce(cast({{ renamed_col }} as {{ datatype }}),
cast({{ original_column_name }} as {{ datatype }}))
as {{ original_column_name }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this is a good idea and shouldn't be difficult to implement!

{% macro coalesce_w_renamed_col(original_column_name, datatype=dbt.type_string()) %}
{# This macro accomodates Fivetran connectors that keep the original salesforce field naming conventions without underscores #}

{%- set renamed_col = original_column_name.replace('_', '') -%}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In our data, all the examples of cols we use follow this pattern (I'll post the col list in height), which also is supported by what engineering shared. There are exceptions, but these cols tend to end with a double underscore likename__c, but we don't use any columns named like that.

That being said we were missing some columns in our data to be 100%, so it would make sense to be able to provide an alternate to the rule. I made the default behavior to to remove underscores for the renamed name, but now an exact name can be passed as an argument if necessary.

{%- set renamed_col = original_column_name.replace('_', '') -%}

coalesce(cast({{ renamed_col }} as {{ datatype }}),
cast({{ original_column_name }} as {{ datatype }}))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll look into it since right now I'm not sure the best way to handle that scenario. In our example data, everything is lowercased, though I do see in the engineering doc that originally it would be camel case. Let me know if you have any suggestions, but otherwise I'll think on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants