Causal-Inference

Causal inference is the process of inferring cause-and-effect relationships from data. In this project, breast cancer diagnosis data is analyzed and causal relationships are inferred from it.

Steps followed

Perform a causal inference task using Pearl’s framework.
Infer the causal graph from observational data and then validate it.
Combine machine learning with causal inference.

Data and Features

The data was obtained from Kaggle. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass.

Attribute Information:

ID number
Diagnosis (M = malignant, B = benign)

Remaining attributes (columns 3-32):

Ten real-valued features are computed for each cell nucleus:
radius (mean of distances from center to points on the perimeter)
texture (standard deviation of gray-scale values)
perimeter
area
smoothness (local variation in radius lengths)
compactness (perimeter^2 / area - 1.0)
concavity (severity of concave portions of the contour)
concave points (number of concave portions of the contour)
symmetry
fractal dimension ("coastline approximation" - 1)
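As a rough illustration of some of the per-nucleus definitions above, the sketch below computes radius, perimeter, area and texture from a nucleus contour given as an ordered list of (x, y) points and a list of gray-scale pixel values. It is a minimal sketch, not the original image-processing pipeline: taking the center as the mean of the contour points, using the shoelace formula for area, and using the population standard deviation for texture are all assumptions, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple

Point = Tuple[float, float]


def contour_center(points: List[Point]) -> Point:
    # Assumption: the "center" is the mean of the contour points.
    xs, ys = zip(*points)
    return statistics.fmean(xs), statistics.fmean(ys)


def radius(points: List[Point]) -> float:
    # Mean distance from the center to the points on the perimeter.
    cx, cy = contour_center(points)
    return statistics.fmean(math.hypot(x - cx, y - cy) for x, y in points)


def perimeter(points: List[Point]) -> float:
    # Sum of segment lengths around the closed contour.
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def area(points: List[Point]) -> float:
    # Shoelace formula for a simple (non-self-intersecting) polygon.
    n = len(points)
    twice_area = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def texture(gray_values: List[float]) -> float:
    # Standard deviation of gray-scale values (population form assumed).
    return statistics.pstdev(gray_values)


# A toy 2 x 2 square contour centered on the origin.
square = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
```

For the toy square, the radius is the distance to each corner (the square root of 2), the perimeter is 8 and the area is 4.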

The mean, standard error, and "worst" or largest value (mean of the three largest values) of each feature were computed for each image, resulting in 30 features (10 measurements × 3 statistics).
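The per-image aggregation described above can be sketched as follows. The hypothetical helper `summarize` turns the per-nucleus values of one measurement into its mean, standard error and "worst" value; computing the standard error as the sample standard deviation divided by the square root of the number of nuclei is an assumption. The `<feature>_<statistic>` column names are also illustrative, so check them against the header of the Kaggle CSV before relying on them.

```python
import math
import statistics
from typing import List, Tuple

# The ten per-nucleus measurements (snake_case names are illustrative).
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry", "fractal_dimension",
]
# The three per-image summary statistics.
STATISTICS = ["mean", "se", "worst"]


def summarize(values: List[float]) -> Tuple[float, float, float]:
    """Return (mean, standard error, worst) for one measurement over all nuclei in an image."""
    if len(values) < 3:
        raise ValueError("need at least three nuclei to compute the 'worst' value")
    mean = statistics.fmean(values)
    # Assumption: standard error = sample standard deviation / sqrt(n).
    se = statistics.stdev(values) / math.sqrt(len(values))
    # "Worst": mean of the three largest values.
    worst = statistics.fmean(sorted(values, reverse=True)[:3])
    return mean, se, worst


def feature_columns() -> List[str]:
    # 10 measurements x 3 statistics = 30 feature columns.
    return [f"{feat}_{stat}" for stat in STATISTICS for feat in BASE_FEATURES]


def encode_diagnosis(label: str) -> int:
    # Binary target: M (malignant) -> 1, B (benign) -> 0.
    mapping = {"M": 1, "B": 0}
    if label not in mapping:
        raise ValueError(f"unexpected diagnosis label: {label!r}")
    return mapping[label]


# ID and diagnosis occupy columns 1-2, so the 30 features fill columns 3-32.
ALL_COLUMNS = ["id", "diagnosis"] + feature_columns()
```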

Data Exploration

An exploratory data analysis was conducted on the data to surface and communicate useful insights. This included identifying and treating missing values and outliers with appropriate methods, and performing feature extraction and scaling. The analysis can be found in the notebooks folder.
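The methods used in the notebooks are not stated here, so the following is only a minimal, dependency-free sketch of one common choice for each step, applied to a single numeric column: median imputation for missing values, clipping to the 1.5 × IQR fences for outliers, and z-score standardization for scaling. The function names are hypothetical. In a real pipeline the medians, quartiles, means and standard deviations would be estimated on the training split only and then reused on the hold-out set, to avoid leaking information.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def standardize(values: List[float]) -> List[float]:
    """Scale to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


# Toy column with one missing value and one extreme outlier.
raw = [1.0, 2.0, None, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0]
clean = standardize(clip_iqr(impute_median(raw)))
```

Clipping rather than dropping outliers keeps every row, which matters for a dataset of this size; dropping or winsorizing at fixed percentiles would be equally reasonable alternatives.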

Causal Learning

Split the data into a training set and a hold-out set.
Create a causal graph using all of the training data and extract insights from it; this graph is treated as the ground truth.
Create new causal graphs using increasing fractions of the data and compare each with the ground-truth graph.
The comparison can be done with the Jaccard similarity index: the number of edges the two graphs share (intersection) divided by the number of distinct edges in either graph (union).
Once the causal graph is stable, select only the variables that point directly to the target variable.
Train one model using all variables and another using only the variables selected from the graph.
Measure how much each model overfits by evaluating it on the hold-out set created in the first step.
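The graph-comparison and variable-selection steps above can be sketched in plain Python once each learned causal graph is represented as a set of directed (cause, effect) edges. The sketch does not learn the graphs itself; it computes the Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B| between edge sets, picks the smallest data fraction whose graph reaches a similarity threshold, selects the direct parents of the target, and measures overfitting as the gap between training and hold-out scores. The 0.9 threshold, the train-minus-hold-out overfitting measure, all function names, and the toy edges (`x1` to `x4`) are hypothetical and are not results from this project.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def jaccard_similarity(edges_a: Iterable[Edge], edges_b: Iterable[Edge]) -> float:
    """Jaccard index of two directed edge sets: |A & B| / |A | B|."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    if not union:
        return 1.0  # two empty graphs are treated as identical
    return len(a & b) / len(union)


def first_stable_fraction(
    ground_truth: Set[Edge],
    graphs_by_fraction: Dict[float, Set[Edge]],
    threshold: float = 0.9,  # hypothetical stability threshold
) -> Optional[float]:
    """Smallest data fraction whose graph reaches `threshold` similarity, else None."""
    for fraction, edges in sorted(graphs_by_fraction.items()):
        if jaccard_similarity(ground_truth, edges) >= threshold:
            return fraction
    return None


def parents_of(edges: Iterable[Edge], target: str) -> List[str]:
    """Variables with an edge pointing directly into `target`."""
    return sorted({cause for cause, effect in edges if effect == target})


def overfit_gap(train_score: float, holdout_score: float) -> float:
    """Training score minus hold-out score; a larger gap suggests more overfitting."""
    return train_score - holdout_score


# Toy graphs, invented for illustration only.
ground_truth = {("x1", "diagnosis"), ("x2", "diagnosis"), ("x1", "x3"), ("x3", "x4")}
graphs_by_fraction = {
    0.25: {("x1", "diagnosis"), ("x4", "diagnosis")},
    0.50: {("x1", "diagnosis"), ("x2", "diagnosis"), ("x1", "x3")},
    0.75: set(ground_truth),
}
stable_at = first_stable_fraction(ground_truth, graphs_by_fraction)
selected = parents_of(ground_truth, "diagnosis")
```

Here the 0.25 and 0.50 graphs score 0.2 and 0.75, so the graph is first considered stable at 0.75, and the selected variables are `x1` and `x2`. Because edges are directed tuples, a reversed edge (B, A) counts as a mismatch against (A, B); if edge orientation is not trusted, comparing `frozenset({A, B})` pairs instead would give an undirected Jaccard index.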

Repository: Doro97/Breast-Cancer-Diagnosis