Final project of the course MATH-517 Statistical Computation and Visualization (EPFL, Fall 2023).
This project studies how well inference performs when using the Expectation-Maximization (EM) algorithm and various imputation methods under different missing-data mechanisms.
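As a rough illustration of this kind of experiment, the sketch below masks values completely at random (MCAR), fills them with the observed mean, and measures the error on the imputed entries. It is a standard-library toy, not the code in src/; the function names make_mcar_mask, mean_impute and imputation_mse are hypothetical and chosen only for this example.

```python
import random
import statistics


def make_mcar_mask(n, missing_rate, rng):
    """Return n booleans; True marks a value as missing (MCAR: independent of the data)."""
    return [rng.random() < missing_rate for _ in range(n)]


def mean_impute(values, mask):
    """Replace masked entries with the mean of the observed entries."""
    observed = [v for v, m in zip(values, mask) if not m]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = statistics.fmean(observed)
    return [fill if m else v for v, m in zip(values, mask)]


def imputation_mse(complete, imputed, mask):
    """Mean squared error restricted to the entries that were masked."""
    errors = [(c - i) ** 2 for c, i, m in zip(complete, imputed, mask) if m]
    return statistics.fmean(errors) if errors else 0.0


# Toy run: standard Gaussian sample with roughly 30% of values masked (assumed settings).
rng = random.Random(0)
complete = [rng.gauss(0.0, 1.0) for _ in range(1000)]
mask = make_mcar_mask(len(complete), 0.3, rng)
imputed = mean_impute(complete, mask)
print(f"missing fraction: {sum(mask) / len(mask):.2f}")
print(f"MSE on imputed entries: {imputation_mse(complete, imputed, mask):.3f}")
```

Mean imputation is used here only because it is the simplest baseline. Other mechanisms (MAR, MNAR) would make the mask depend on the data rather than on independent draws.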
The report is available here.
The requirements are listed in the requirements.txt file. To install them, run the following command in the root directory of the project:
pip install -r requirements.txt
To run the notebooks, first move them into the source folder. Some deprecated notebooks need the library.py file to run.
main-project-l-b-g/
│
├── docs/ # Report documents
│ ├── report.pdf # PDF of the final report
│ ├── report.html # HTML of the final report
│ ├── img/ # Images used in the report and not directly generated by the code
│ └── report_files/ # Ignore, files for quarto rendering
|
├── src/ # Source code directory
│ ├── mask_utils.py # Code from public repo to generate masks for missing data
│ ├── produce_NA.py # Code from public repo to generate missing values in a complete dataset
│ ├── utils.py # Functions to create masks for missing data (from public repository)
│ ├── data_generation.py # Code to generate both complete and incomplete data (Gaussian and Student-t)
│ ├── updated_impyute.py # Code from public repo
│ ├── imputation.py # Code to impute data and perform inference given missing data.
│ │ # Also computes MSEs given complete data
│ ├── visualization.py # Functions to plot results
│ ├── stat_utils.py # General utility functions in statistics
│ |
| ├── data/ # Data directory
| │ └── winequality-white.csv # Real data where the underlying model is unknown
| │
| ├── notebooks/ # Jupyter notebooks used for observations and testing
| │ └── library.py # Old and deprecated library needed to run some of the notebooks.
| |
| ├── img/ # Folder with saved plots
| |
| └── misc/ # Ignore
|
├── requirements.txt # List of project required libraries
|
├── .gitignore # Specifies intentionally untracked files to ignore
|
├── Makefile # Makefile to render the report
|
└── README.md # Detailed description of the project
Rendering the report with Quarto is a one-liner (quarto render src --to html or quarto render src --to pdf), but the provided Makefile makes it even easier:
make html
make pdf
make # both pdf and html
The resulting webpage is in docs/index.html, which can be used directly with GitHub Pages. The PDF is at docs/report.pdf.
Last edited: 2024-01-07