Creating Causal Identification module #1166

Open
wants to merge 42 commits into
base: main

Conversation

cetagostini
Contributor

@cetagostini cetagostini commented Nov 4, 2024

Description

Short description: Integration of CausalGraphModel in BaseMMM Class

This update integrates a CausalGraphModel into the BaseMMM class, enabling automated causal identification based on the backdoor criterion for a user-supplied Directed Acyclic Graph (DAG).

Summary of Changes

  1. Added Causal Graph Option:

    • The BaseMMM class now accepts an optional dag parameter, which can be provided either as a string (DOT format) or a networkx.DiGraph.
    • If dag is provided, a CausalGraphModel is instantiated to analyze causal relationships and determine necessary adjustment sets.
  2. Automatic Minimal Adjustment Set Handling:

    • The BaseMMM initialization now includes logic to calculate the minimal adjustment set required to estimate the causal effect of the treatment variables (assumed to be media channels) on the outcome.
    • control_columns are automatically updated to include variables from the minimal adjustment set only.
    • If the variable yearly_seasonality is not in the minimal adjustment set, the yearly_seasonality parameter is set to None, effectively disabling it in the model.
  3. Warnings for Missing Adjustment Sets:

    • If a minimal adjustment set cannot be identified, a warning is issued and no modifications are made during initialization.

Code Example

Here's how to initialize BaseMMM with a DAG for causal inference:

dag_str = """
digraph {
    x1 -> y;
    x2 -> y;
    yearly_seasonality -> y;
    event_1 -> y;
    event_2 -> y;
}
"""

mmm = MMM(
    model_config=my_model_config,
    sampler_config=my_sampler_config,
    date_column="date_week",
    adstock=GeometricAdstock(l_max=8),
    saturation=LogisticSaturation(),
    channel_columns=["x1", "x2"],
    control_columns=["event_1", "event_2"],
    yearly_seasonality=2,  # Disabled if 'yearly_seasonality' is not in minimal adjustment set
    dag=dag_str,
    outcome_column="y",
)
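
As a rough illustration of why the controls and seasonality may not be required for this particular DAG, the sketch below parses the DOT string with a regular expression and checks whether any treatment has an incoming edge. This is a simplification and not the PR's implementation: `parse_dot_edges` and `parents_of` are hypothetical helpers, the regex only handles plain `a -> b;` statements, and the check relies on the fact that every backdoor path must start with an arrow pointing into the treatment.

```python
import re


def parse_dot_edges(dot: str) -> list[tuple[str, str]]:
    """Extract (source, target) pairs from simple `a -> b;` DOT statements.

    Hypothetical helper for illustration; it ignores attributes, subgraphs,
    and quoted identifiers that real DOT files may contain.
    """
    return re.findall(r"(\w+)\s*->\s*(\w+)", dot)


def parents_of(node: str, edges: list[tuple[str, str]]) -> set[str]:
    """Return the direct parents of `node` in the edge list."""
    return {src for src, dst in edges if dst == node}


dag_str = """
digraph {
    x1 -> y;
    x2 -> y;
    yearly_seasonality -> y;
    event_1 -> y;
    event_2 -> y;
}
"""

edges = parse_dot_edges(dag_str)
treatments = ["x1", "x2"]

# A backdoor path from a treatment to the outcome must begin with an edge
# pointing INTO the treatment, so a treatment without parents has none.
backdoor_parents = {t: parents_of(t, edges) for t in treatments}
has_backdoor_path = any(backdoor_parents.values())

print(backdoor_parents)
print(has_backdoor_path)
```

In this DAG neither x1 nor x2 has a parent, so there is nothing to block and the minimal adjustment set is empty. Under the behavior described in the summary, that would mean event_1, event_2, and yearly seasonality are not needed for identification.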

Related Issue

  • Closes #
  • Related to #

Checklist

Modules affected

  • MMM
  • CLV

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):

📚 Documentation preview 📚: https://pymc-marketing--1166.org.readthedocs.build/en/1166/

@github-actions github-actions bot added the MMM label Nov 4, 2024
@cetagostini cetagostini requested review from wd60622 and juanitorduz and removed request for wd60622 November 4, 2024 23:35
@wd60622
Contributor

wd60622 commented Nov 4, 2024

What is z in the 2nd body example? Would that be in the model?

@wd60622 wd60622 added causal inference enhancement New feature or request labels Nov 4, 2024
@cetagostini
Contributor Author

What is z in the 2nd body example? Would that be in the model?

That was from an old example; I have corrected it!


@github-actions github-actions bot added the docs Improvements or additions to documentation label Nov 13, 2024

codecov bot commented Nov 13, 2024

Codecov Report

Attention: Patch coverage is 89.09091% with 6 lines in your changes missing coverage. Please review.

Project coverage is 95.27%. Comparing base (39028fe) to head (fb9993e).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pymc_marketing/mmm/causal.py | 84.61% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1166      +/-   ##
==========================================
- Coverage   95.34%   95.27%   -0.07%     
==========================================
  Files          47       48       +1     
  Lines        4963     5018      +55     
==========================================
+ Hits         4732     4781      +49     
- Misses        231      237       +6     


@cetagostini cetagostini marked this pull request as draft November 15, 2024 16:44
@github-actions github-actions bot added the tests label Nov 16, 2024
@cetagostini cetagostini requested a review from wd60622 November 16, 2024 22:22
@cetagostini cetagostini marked this pull request as ready for review November 16, 2024 22:22
@cetagostini
Contributor Author

Something blocking this merge? @juanitorduz @wd60622

Contributor

@wd60622 wd60622 left a comment


We will need some mmm tests as well

pymc_marketing/model_builder.py Outdated Show resolved Hide resolved
@cetagostini
Contributor Author

cetagostini commented Dec 16, 2024

The current error is not a code issue; it comes from the docs build (docs/readthedocs.org:pymc-marketing):

writing output... [ 75%] api/generated/pymc_marketing.mmm.mmm.BaseMMM.get_errors
Command killed due to timeout or excessive memory consumption

@juanitorduz
Collaborator

Something blocking this merge? @juanitorduz @wd60622

I will take another look next week 🙏 . In the meantime I agree we need to add more tests :)

@cetagostini
Contributor Author

@juanitorduz @wd60622

Hey guys, I added a few tests for the causal model initialization in the MMM class. Everything should be ready to merge: the main behavior is covered in test_causal.py and the initialization in test_mmm.py.

@cetagostini cetagostini requested a review from wd60622 December 24, 2024 17:32
@wd60622 wd60622 added this to the 0.11.0 milestone Dec 29, 2024
@juanitorduz
Collaborator

I will do a detailed review this week, now that we have finally pushed the long-awaited customer choice model 🙏

Collaborator

@juanitorduz juanitorduz left a comment


Hey @carlosagostini ! Sorry for the late review #shameonme

I left some comments. One important observation is the missed opportunity for variance reduction with non-minimal sets (please see the comment below). There are also other small comments and suggested changes regarding minor code modularization.

treatment : list[str]
A list of treatment variable names.
outcome : str
The outcome variable name.
Collaborator

Can you please add https://github.com/py-why/dowhy as a reference in the class description?

)

def get_unique_adjustment_nodes(self) -> list[str]:
"""Compute the minimal adjustment set required for backdoor adjustment across all treatments.
Collaborator

Can you expand on the meaning of the minimal set (think about newcomers)? Can you also add references? I suggest

Pearl, J., Glymour, M., and Jewell, N. P. (2016). Causal Inference in Statistics: A Primer.

Comment on lines +40 to +41
Provides methods to analyze causal relationships and determine the minimal adjustment set
for backdoor adjustment between treatment and outcome variables.
Collaborator

Sometimes, external regressors are not in the minimal set but help decrease variance; see https://matheusfacure.github.io/python-causality-handbook/07-Beyond-Confounders.html#good-controls

Concretely:

Anytime we have a control that is a good predictor of the outcome, even if it is not a confounder, adding it to our model is a good idea.

So I am hesitant to remove, for example seasonality, if it is not in the minimal set. WDYT?
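
The variance argument can be checked with a small simulation. The sketch below is an illustration under assumed values, not part of the PR: the data-generating process (`y = 2*x + 3*z + noise`, with `z` independent of `x`), the sample sizes, and the helper names `slope`, `residuals`, and `simulate` are all hypothetical. It estimates the effect of `x` with and without the non-confounding control `z`, using the Frisch-Waugh-Lovell residualization so that only the standard library is needed.

```python
import random
import statistics


def slope(x, y):
    """OLS slope of y on x (with intercept) for a single regressor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residuals(target, control):
    """Residuals of `target` after regressing it on `control` (with intercept)."""
    b = slope(control, target)
    mt, mc = statistics.fmean(target), statistics.fmean(control)
    return [(t - mt) - b * (c - mc) for t, c in zip(target, control)]


def simulate(n_obs=100, n_reps=300, seed=0):
    """Return the spread of the x-effect estimate without and with z."""
    rng = random.Random(seed)
    without_control, with_control = [], []
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        z = [rng.gauss(0, 1) for _ in range(n_obs)]  # predicts y, not a confounder
        y = [2.0 * a + 3.0 * c + rng.gauss(0, 1) for a, c in zip(x, z)]
        # Model 1: y ~ x (z omitted).
        without_control.append(slope(x, y))
        # Model 2: y ~ x + z, via Frisch-Waugh-Lovell residualization on z.
        with_control.append(slope(residuals(x, z), residuals(y, z)))
    return statistics.stdev(without_control), statistics.stdev(with_control)


sd_without, sd_with = simulate()
print(f"sd without z: {sd_without:.3f}, sd with z: {sd_with:.3f}")
```

Both models are unbiased for the effect of `x` in this setup because `z` is not a confounder, but the estimate that includes `z` should vary much less across replications, which is the point of the "good controls" argument.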

Comment on lines +131 to +139
if unique_controls:
    warnings.warn(
        f"Columns {unique_controls} are not in the adjustment set. Controls are being modified.",
        stacklevel=2,
    )

control_columns = list(common_controls - set(channel_columns))

self.minimal_adjustment_set = control_columns + list(channel_columns)
Collaborator

I am hesitant about this step because of my comment on variance reduction above. Maybe we can have an additional parameter, something like minimal or maximal set. WDYT?
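
One way the suggested parameter could look is sketched below. Everything here is an assumption for illustration: the function name `select_controls`, the `mode` argument, and its `"minimal"` / `"maximal"` values do not come from the PR. In `"minimal"` mode it mirrors the filtering shown in the diff above; in `"maximal"` mode it keeps every user-supplied control, leaving it to the user to make sure none of them is a mediator or collider.

```python
import warnings


def select_controls(control_columns, adjustment_set, channel_columns, mode="minimal"):
    """Pick control columns given an adjustment set (hypothetical helper).

    mode="minimal": keep only controls that appear in the adjustment set.
    mode="maximal": keep all user-supplied controls (e.g. for variance reduction).
    """
    channels = set(channel_columns)
    adjustment = set(adjustment_set)

    if mode == "maximal":
        return [c for c in control_columns if c not in channels]

    if mode == "minimal":
        extra = [c for c in control_columns if c not in adjustment]
        if extra:
            warnings.warn(
                f"Columns {extra} are not in the adjustment set. Controls are being modified.",
                stacklevel=2,
            )
        return [c for c in control_columns if c in adjustment and c not in channels]

    raise ValueError(f"Unknown mode: {mode!r}")


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the expected warning in this demo
    minimal = select_controls(["event_1", "z"], ["z"], ["x1", "x2"], mode="minimal")

maximal = select_controls(["event_1", "z"], ["z"], ["x1", "x2"], mode="maximal")
print(minimal, maximal)
```

Using list comprehensions instead of set arithmetic keeps the user's original column order, which makes the resulting model configuration easier to compare against the input.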

Comment on lines +206 to +229
# Initialize causal graph if provided
if self.dag is not None and self.outcome_node is not None:
    if self.treatment_nodes is None:
        self.treatment_nodes = self.channel_columns
        warnings.warn(
            "No treatment nodes provided, using channel columns as treatment nodes.",
            stacklevel=2,
        )
    self.causal_graphical_model = CausalGraphModel.build_graphical_model(
        graph=self.dag,
        treatment=self.treatment_nodes,
        outcome=self.outcome_node,
    )

    self.control_columns = self.causal_graphical_model.compute_adjustment_sets(
        control_columns=self.control_columns,
        channel_columns=self.channel_columns,
    )

    if "yearly_seasonality" not in self.causal_graphical_model.adjustment_set:
        warnings.warn(
            "Yearly seasonality excluded as it's not required for adjustment.",
            stacklevel=2,
        )
Collaborator

This should be split into at least one or two functions and just called at initialization (+ unit test for each function)
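
A possible shape for that split is sketched below, assuming two pure helper functions that the initializer would call. The names `resolve_treatment_nodes` and `resolve_yearly_seasonality` are hypothetical and not taken from the PR; the warning messages are copied from the excerpt above. Keeping the helpers free of `self` makes each one straightforward to unit test.

```python
import warnings


def resolve_treatment_nodes(treatment_nodes, channel_columns):
    """Fall back to the channel columns when no treatment nodes are given."""
    if treatment_nodes is None:
        warnings.warn(
            "No treatment nodes provided, using channel columns as treatment nodes.",
            stacklevel=2,
        )
        return list(channel_columns)
    return list(treatment_nodes)


def resolve_yearly_seasonality(yearly_seasonality, adjustment_set):
    """Disable yearly seasonality when it is not part of the adjustment set."""
    if yearly_seasonality is not None and "yearly_seasonality" not in adjustment_set:
        warnings.warn(
            "Yearly seasonality excluded as it's not required for adjustment.",
            stacklevel=2,
        )
        return None
    return yearly_seasonality


with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # the warnings are expected in this demo
    treatments = resolve_treatment_nodes(None, ["x1", "x2"])
    seasonality = resolve_yearly_seasonality(2, ["event_1"])

print(treatments, seasonality)
```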

Collaborator

Add subtitle like: business problem


Collaborator

Shall we remove the first data points? They are an artifact of the fact that there is little history to adstock at the start of the series.


Collaborator

+1



review-notebook-app bot commented Jan 2, 2025


juanitorduz commented on 2025-01-02T19:39:43Z
----------------------------------------------------------------

Note that "over-controlling" can reduce the variance of the estimate; see https://matheusfacure.github.io/python-causality-handbook/07-Beyond-Confounders.html#good-controls

Anytime we have a control that is a good predictor of the outcome, even if it is not a confounder, adding it to our model is a good idea. It helps lowering the variance of our treatment effect estimates. 

review-notebook-app bot commented Jan 2, 2025

juanitorduz commented on 2025-01-02T19:39:44Z
----------------------------------------------------------------

Line #11.    sns.lineplot(x="date_week", y="competitor_offers", data=df, color="C3", ax=ax);

Add title


review-notebook-app bot commented Jan 2, 2025

juanitorduz commented on 2025-01-02T19:39:45Z
----------------------------------------------------------------

Latex error? $x_1$ or $x_{1t}$


@juanitorduz
Collaborator

I think the previous comments on the notebooks have not been addressed yet ;)

Labels
causal inference docs Improvements or additions to documentation enhancement New feature or request MMM tests

3 participants