House Price Analysis in Madrid

This project analyzes house prices in Madrid, Spain, using Python and several machine learning libraries. It assumes a basic understanding of data analysis and machine learning concepts. Follow the steps below to install and use it.

Installation

Create a Python environment using your preferred method (e.g. conda, virtualenv, etc.).
Activate the environment and navigate to the project directory.
Install the required packages using pip and the requirements.txt file:

pip install -r requirements.txt

Install the utils module by running the following command from the project directory:

pip install -e src/

Start a JupyterLab server by running the following command:

jupyter lab

Alternatively, with the ipykernel package installed in the environment, you can select that environment as the notebook kernel directly in VS Code.
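
Before opening the notebooks, it can help to confirm that the environment really contains what the steps above installed. The following is a minimal sketch and not part of the repository: the helper name missing_modules is hypothetical, and the module names it checks (utils, as installed from src/, plus jupyterlab and ipykernel) are assumptions based on the steps above, so adjust them to match requirements.txt.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumed module names; adjust them to match requirements.txt.
    required = ["utils", "jupyterlab", "ipykernel"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If utils is reported as missing, the editable install (pip install -e src/) most likely ran in a different environment than the one currently active.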

Usage

Navigate to the notebooks directory and open the desired notebook.
Execute the cells in the notebook to preprocess the data, perform exploratory data analysis, and build and evaluate machine learning models.
The data is stored in the data directory, which contains five subfolders:
- raw: contains the raw training and prediction data in CSV format (train.csv and predict.csv).
- processed: contains the processed data in CSV format.
- models: contains the trained machine learning models as pickle files.
- metrics: contains the performance metrics of each model as JSON files.
- submission: contains the submission files in CSV format.
The src directory contains a Python module with the necessary sklearn transformers for ETL and utility functions.
The notebooks directory contains the notebooks that reproduce, step by step, the analysis of house prices in Madrid.
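
The saved artifacts can also be inspected outside the notebooks. The sketch below is an illustration rather than code from the repository: the helper names load_model and load_metrics are hypothetical, and the file names are the ones shown in the Directory Structure listing (data/models/model_1.pkl and data/metrics/model_1.json). Adjust the metrics path if your JSON files sit next to the models instead.

```python
import json
import pickle
from pathlib import Path


def load_model(path):
    """Unpickle a trained model. Only unpickle files from a source you trust."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def load_metrics(path):
    """Read a metrics JSON file into a Python object."""
    with Path(path).open("r", encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Paths assumed from the directory listing; run from the project root.
    model_path = Path("data/models/model_1.pkl")
    metrics_path = Path("data/metrics/model_1.json")
    if model_path.exists() and metrics_path.exists():
        model = load_model(model_path)
        print(type(model).__name__, load_metrics(metrics_path))
    else:
        print("Model or metrics file not found; run the modeling notebook first.")
```

Unpickling a model requires the libraries it was trained with (for example scikit-learn) to be installed in the active environment, ideally in the same versions.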

Directory Structure

house_price_analysis/
├── data/
│   ├── raw/
│   │   ├── train.csv
│   │   └── predict.csv
│   ├── processed/
│   │   ├── train.csv
│   │   └── test.csv
│   ├── models/
│   │   └── model_1.pkl
│   ├── metrics/
│   │   └── model_1.json
│   └── submission/
│       ├── submission_1.csv
│       └── submission_2.csv
├── src/
│   ├── utils/
│   │   ├── transformers.py
│   │   ├── paths.py
│   │   ├── functions.py
│   │   └── __init__.py
│   ├── pyproject.toml
│   ├── setup.cfg
│   └── setup.py
└── notebooks/
    ├── 01_EDA.ipynb
    └── 02_Modeling.ipynb

In summary, the data directory holds the raw and processed data together with the trained models, their metrics, and the submission files. The src directory holds the installable utils package with the transformers and utility functions. The notebooks directory holds the EDA and modeling notebooks.
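
One common way to keep notebooks independent of the current working directory is to resolve every data location from a single project root. The sketch below illustrates the idea for the layout listed above. It is hypothetical: the contents of src/utils/paths.py are not shown here, and the function name project_paths and the dictionary keys are assumptions made for this example.

```python
from pathlib import Path


def project_paths(root):
    """Map short names to the directories of the project layout."""
    root = Path(root)
    data = root / "data"
    return {
        "raw": data / "raw",
        "processed": data / "processed",
        "models": data / "models",
        "metrics": data / "metrics",
        "submission": data / "submission",
        "notebooks": root / "notebooks",
    }


if __name__ == "__main__":
    # Assumes the script is run from the project root.
    for name, path in project_paths(Path.cwd()).items():
        status = "ok" if path.is_dir() else "missing"
        print(f"{name:<10} {path} [{status}]")
```

Returning Path objects rather than strings lets callers join file names with the / operator, for example project_paths(root)["raw"] / "train.csv".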

Data

The data comes from the Kaggle competition "Machine Learning Avanzado I - Hands-on" and is split into two files: train.csv and predict.csv. The train.csv file contains the training data, including the target variable buy_price_by_area. The predict.csv file contains the houses to score for the submission and does not include the target variable. The goal of the project is to predict buy_price_by_area for the houses in predict.csv.
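
The split between features and target described above can be sketched as follows. This is an illustration under stated assumptions, not the project's own loading code: only the column name buy_price_by_area comes from the description above, while the helper names read_rows and split_features_target, the feature column names, and the sample values are placeholders. The sketch uses only the standard library so that it runs without the project's dependencies.

```python
import csv
import io

TARGET = "buy_price_by_area"  # target column named in the data description


def read_rows(handle):
    """Read a CSV file object into a list of row dictionaries."""
    return list(csv.DictReader(handle))


def split_features_target(rows, target=TARGET):
    """Separate each row into a feature dict and a float target value."""
    features, targets = [], []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        targets.append(float(row.pop(target)))
        features.append(row)
    return features, targets


if __name__ == "__main__":
    # Placeholder data; the real columns live in data/raw/train.csv.
    sample = io.StringIO(
        "feature_a,feature_b,buy_price_by_area\n"
        "10,x,1500.0\n"
        "20,y,2300.5\n"
    )
    X, y = split_features_target(read_rows(sample))
    print(X)
    print(y)
```

Rows read from predict.csv have no target column, so they can be passed to a trained model as features without this split.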
