Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix/hubspot-duplicates #5

Merged
merged 1 commit into from
Oct 22, 2024
Merged

Conversation

fivetran-joemarkiewicz
Copy link
Collaborator

PR to address some duplicate HubSpot records uncovered when testing the model on a new schema. Additionally, added unique and not null tests to the final unified rag output model.

@@ -45,14 +45,15 @@ engagement_emails as (
engagement_email.email_to_email,
engagement_email.email_cc_email,
engagement_email.email_from_email as commenter_email,
contacts.contact_name as commenter_name
{{ fivetran_utils.string_agg(field_to_agg="contacts.contact_name", delimiter="','") }} as commenter_name
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a few cases where there are multiple contacts associated with an engagement email. This resulted in fannout. The stringagg will ensure all parties are included in the resulting data, but also not cause any fannout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Comment on lines +87 to +99
engagement_details as (
select
deal_id,
deal_name,
url_reference,
created_on,
source_relation,
{{ fivetran_utils.string_agg(field_to_agg="distinct engagement_type", delimiter="', '") }} as engagement_type,
{{ fivetran_utils.string_agg(field_to_agg="distinct contact_name", delimiter="', '") }} as contact_name,
{{ fivetran_utils.string_agg(field_to_agg="distinct created_by", delimiter="', '") }} as created_by,
{{ fivetran_utils.string_agg(field_to_agg="distinct company_name", delimiter="', '") }} as company_name
from engagement_detail_prep
group by 1,2,3,4,5
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous model. The joins in the prev cte are not 1:1. So we need to do some creative aggregating to make sure we retain all the necessary information, but to not cause any fannouts.

@@ -1,8 +1,8 @@
{{
config(
materialized='table' if unified_rag.is_databricks_sql_warehouse() else 'incremental',
partition_by = {'field': 'most_recent_chunk_update', 'data_type': 'date', 'granularity': 'month'}
if target.type not in ['spark', 'databricks'] else ['most_recent_chunk_update'],
partition_by = {'field': 'update_date', 'data_type': 'date'}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ran into some issues with the incremental logic on BQ. These changes helped address those issues although it did require adding a new field (which has been documented and docs regen'd).

Copy link
Contributor

@fivetran-avinash fivetran-avinash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fivetran-joemarkiewicz good catches, lgtm

@fivetran-joemarkiewicz fivetran-joemarkiewicz merged commit 956b0d6 into main Oct 22, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants