-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug/renamed columns #46
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-catfritz great work on this so far! I really like the creative solution you have applied here to seamlessly coalesce the fields. I was able to do a first pass and just have a few comments that jumped out at me when taking a look.
Let me know if you would like to sync on any of my comments in more detail. Thanks!
macros/coalesce_w_renamed_col.sql
Outdated
{% macro coalesce_w_renamed_col(original_column_name, datatype=dbt.type_string()) %} | ||
{# This macro accomodates Fivetran connectors that keep the original salesforce field naming conventions without underscores #} | ||
|
||
{%- set renamed_col = original_column_name.replace('_', '') -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we 100% sure this is always the behavior? Maybe we run this by the eng team members in the Height ticket to double check and confirm this is a safe assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example can we be 100% certain that account_source
will be the original name and the renamed will be accountsource
? I want to be sure there are no exceptions.
It looks like from your individual source macro updates this does seem to be the case, but just double checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our data, all the examples of cols we use follow this pattern (I'll post the col list in height), which also is supported by what engineering shared. There are exceptions, but these cols tend to end with a double underscore likename__c
, but we don't use any columns named like that.
That being said we were missing some columns in our data to be 100%, so it would make sense to be able to provide an alternate to the rule. I made the default behavior to to remove underscores for the renamed name, but now an exact name can be passed as an argument if necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great solution! This way if we need to make any adjustments in the future we have a seamless route to do so.
billing_state, | ||
billing_state_code, | ||
billing_street, | ||
{{ coalesce_w_renamed_col('account_number') }}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very creative solution!
macros/coalesce_w_renamed_col.sql
Outdated
coalesce(cast({{ renamed_col }} as {{ datatype }}), | ||
cast({{ original_column_name }} as {{ datatype }})) | ||
as {{ original_column_name }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are your thoughts on possibly allowing some wiggle room if we don't always want the original_column_name to be the final name of the field in the model? For example, what if in the future we have a field is_partner_marketing_action_active
(this is a real field that could exist) and the original field name is IsPartnerMarketingActionActive
. I don't know if we would exactly want that to be the final field name. Maybe we would want to have the final name be is_pm_action_active
.
I know this is not a scenario right now, but maybe something worth future proofing just in case. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is a good idea and shouldn't be difficult to implement!
{%- set renamed_col = original_column_name.replace('_', '') -%} | ||
|
||
coalesce(cast({{ renamed_col }} as {{ datatype }}), | ||
cast({{ original_column_name }} as {{ datatype }})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know Snowflake provides the ability to enable case-sensitivity (e.g. LastActivityDate
). I wonder if this FF does not do the same case-insensitivity approach we see with the standard connector when syncing to Snowflake. I have a small feeling that these original field names may sync like this. Do you know if this could be a problem with the approach we are taking here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll look into it since right now I'm not sure the best way to handle that scenario. In our example data, everything is lowercased, though I do see in the engineering doc that originally it would be camel case. Let me know if you have any suggestions, but otherwise I'll think on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank. you for taking a look @fivetran-joemarkiewicz!
macros/coalesce_w_renamed_col.sql
Outdated
coalesce(cast({{ renamed_col }} as {{ datatype }}), | ||
cast({{ original_column_name }} as {{ datatype }})) | ||
as {{ original_column_name }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree this is a good idea and shouldn't be difficult to implement!
macros/coalesce_w_renamed_col.sql
Outdated
{% macro coalesce_w_renamed_col(original_column_name, datatype=dbt.type_string()) %} | ||
{# This macro accomodates Fivetran connectors that keep the original salesforce field naming conventions without underscores #} | ||
|
||
{%- set renamed_col = original_column_name.replace('_', '') -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our data, all the examples of cols we use follow this pattern (I'll post the col list in height), which also is supported by what engineering shared. There are exceptions, but these cols tend to end with a double underscore likename__c
, but we don't use any columns named like that.
That being said we were missing some columns in our data to be 100%, so it would make sense to be able to provide an alternate to the rule. I made the default behavior to to remove underscores for the renamed name, but now an exact name can be passed as an argument if necessary.
{%- set renamed_col = original_column_name.replace('_', '') -%} | ||
|
||
coalesce(cast({{ renamed_col }} as {{ datatype }}), | ||
cast({{ original_column_name }} as {{ datatype }})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I'll look into it since right now I'm not sure the best way to handle that scenario. In our example data, everything is lowercased, though I do see in the engineering doc that originally it would be camel case. Let me know if you have any suggestions, but otherwise I'll think on this!
PR Overview
This PR will address the following Issue/Feature:
This PR will result in the following new package version:
Please provide the finalized CHANGELOG entry which details the relevant changes included in this PR:
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please share any and all of your validation steps:
If you had to summarize this PR in an emoji, which would it be?
💃