-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bugfix/hubspot-duplicates #5
Conversation
@@ -45,14 +45,15 @@ engagement_emails as ( | |||
engagement_email.email_to_email, | |||
engagement_email.email_cc_email, | |||
engagement_email.email_from_email as commenter_email, | |||
contacts.contact_name as commenter_name | |||
{{ fivetran_utils.string_agg(field_to_agg="contacts.contact_name", delimiter="','") }} as commenter_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a few cases where there are multiple contacts associated with an engagement email. This resulted in fannout. The stringagg will ensure all parties are included in the resulting data, but also not cause any fannout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
engagement_details as ( | ||
select | ||
deal_id, | ||
deal_name, | ||
url_reference, | ||
created_on, | ||
source_relation, | ||
{{ fivetran_utils.string_agg(field_to_agg="distinct engagement_type", delimiter="', '") }} as engagement_type, | ||
{{ fivetran_utils.string_agg(field_to_agg="distinct contact_name", delimiter="', '") }} as contact_name, | ||
{{ fivetran_utils.string_agg(field_to_agg="distinct created_by", delimiter="', '") }} as created_by, | ||
{{ fivetran_utils.string_agg(field_to_agg="distinct company_name", delimiter="', '") }} as company_name | ||
from engagement_detail_prep | ||
group by 1,2,3,4,5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous model. The joins in the prev cte are not 1:1. So we need to do some creative aggregating to make sure we retain all the necessary information, but to not cause any fannouts.
@@ -1,8 +1,8 @@ | |||
{{ | |||
config( | |||
materialized='table' if unified_rag.is_databricks_sql_warehouse() else 'incremental', | |||
partition_by = {'field': 'most_recent_chunk_update', 'data_type': 'date', 'granularity': 'month'} | |||
if target.type not in ['spark', 'databricks'] else ['most_recent_chunk_update'], | |||
partition_by = {'field': 'update_date', 'data_type': 'date'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran into some issues with the incremental logic on BQ. These changes helped address those issues although it did require adding a new field (which has been documented and docs regen'd).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fivetran-joemarkiewicz good catches, lgtm
PR to address some duplicate HubSpot records uncovered when testing the model on a new schema. Additionally, added unique and not null tests to the final unified rag output model.