CarlosUziel/airbnb-spain

Airbnb locations

A look at Airbnb data in Spain

Table of Contents
  1. About The Project
  2. Getting Started
  3. Additional Notes
  4. License
  5. Contact
  6. Acknowledgments

About The Project

In this mini-project, I use the CRISP-DM (Cross-Industry Standard Process for Data Mining) process to answer several business questions about Airbnb locations and reservations across Spain, using publicly available Airbnb data. Get to know the main insights by reading my post on Medium.

Premise

We will take the role of a private investor who has decided to purchase a property in Spain to rent out through Airbnb. After careful examination, we have selected 9 Spanish regions where such a purchase would be interesting. Naturally, we want to maximize our return on investment (ROI), so we need to understand the competition in each region as well as the main price drivers for each location.

After having a brief look at the available data, we have selected a few questions that will aid us in making our investment decisions:

  1. What is the average price of each location type per neighbourhood? What are the most expensive neighbourhoods on average?
  2. What is the average host acceptance rate per location type and neighbourhood? In which neighbourhoods is it the lowest?
  3. How strong is the competition in each neighbourhood? What number and proportion of listings belong to hosts owning different numbers of locations? In which neighbourhoods is the concentration lowest?
  4. What is the expected average profit per room type and neighbourhood when looking at the reservations for the next 6 months? Which neighbourhood is expected to be the most profitable in that period?
  5. What listings' factors affect the expected profit for the next 6 months? Can we use them to forecast the expected profit over that period?

We will compare the answers to those questions across the Spanish regions of Madrid, Barcelona, Girona, Valencia, Mallorca, Menorca, Sevilla, Málaga and Euskadi. Hopefully, this will help us make a more informed investment decision.
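As an illustration of the first question, the following minimal sketch groups listings by neighbourhood and room type and averages their prices. The sample rows and the field names (`neighbourhood`, `room_type`, `price`) are assumptions made for this example, not the project's actual schema. It uses only the Python standard library so that it runs without the project environment; in the notebook itself, a pandas `groupby` would be the natural choice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; field names are assumptions, not the real schema.
listings = [
    {"neighbourhood": "Centro", "room_type": "Entire home/apt", "price": 120.0},
    {"neighbourhood": "Centro", "room_type": "Entire home/apt", "price": 80.0},
    {"neighbourhood": "Centro", "room_type": "Private room", "price": 40.0},
    {"neighbourhood": "Retiro", "room_type": "Private room", "price": 50.0},
]


def average_price(rows):
    """Return a dict mapping (neighbourhood, room_type) to the mean price."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["neighbourhood"], row["room_type"])].append(row["price"])
    return {key: mean(prices) for key, prices in groups.items()}


averages = average_price(listings)

# The most expensive (neighbourhood, room_type) pair on average.
most_expensive = max(averages, key=averages.get)
print(most_expensive, averages[most_expensive])
```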

Execution plan

In order to answer our questions, we will follow the CRISP-DM process. Our list of questions is already the result of the first two steps (Business Understanding and Data Understanding). We will then prepare the data as needed to answer them. This includes pre-processing steps such as data cleaning and handling missing values. For our final question, we will also model the data and try to predict the number of reservations for each location.

All processing is done in Python with widely used libraries such as pandas, NumPy and scikit-learn.
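To give a flavour of the data preparation step, here is a minimal sketch of two typical operations: parsing prices stored as text and filling missing values. Both the price format (`"$1,250.00"`) and the choice of median imputation are assumptions made for this example; the notebook may handle these cases differently. The sketch uses only the standard library so it can be run on its own.

```python
from statistics import median


def parse_price(raw):
    """Convert a string such as "$1,250.00" to a float; return None if missing."""
    if raw is None or raw.strip() == "":
        return None
    return float(raw.replace("$", "").replace(",", ""))


def fill_missing(values):
    """Replace None entries with the median of the observed values.

    Median imputation is an assumption of this sketch, not the project's method.
    """
    observed = [v for v in values if v is not None]
    fallback = median(observed)
    return [fallback if v is None else v for v in values]


# Hypothetical raw values, including an empty string and a missing entry.
raw_prices = ["$80.00", "", "$1,250.00", None, "$100.00"]
prices = fill_missing([parse_price(p) for p in raw_prices])
print(prices)
```

The median is used here instead of the mean because a few very expensive listings would otherwise pull the fill value upwards.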

Data

This project uses publicly available Airbnb data for 9 Spanish regions (the September 2022 version of each region). For each region, we have two different datasets:

  • Listings: Contains information about each Airbnb listing, such as its location, the host it belongs to and its type. The complete data dictionary can be found in data/airbnb/listings_schema.csv.
  • Calendar: Contains reservations for all listings and the price at which they were reserved.
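The calendar data is what makes a question such as the expected profit over the next 6 months answerable. The sketch below sums the price of reserved days per listing within a window of roughly six months. The rows, the field names (`listing_id`, `date`, `reserved`, `price`) and the 183-day window are all assumptions made for this example, and the result is revenue from reservations used as a simple proxy, not necessarily the profit definition used in the notebook.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical calendar rows; field names and the "reserved" flag are assumptions.
calendar = [
    {"listing_id": 1, "date": date(2022, 9, 20), "reserved": True, "price": 100.0},
    {"listing_id": 1, "date": date(2022, 9, 21), "reserved": False, "price": 100.0},
    {"listing_id": 1, "date": date(2023, 6, 1), "reserved": True, "price": 150.0},
    {"listing_id": 2, "date": date(2022, 10, 5), "reserved": True, "price": 60.0},
    {"listing_id": 2, "date": date(2022, 10, 6), "reserved": True, "price": 60.0},
]


def expected_revenue(rows, start, days=183):
    """Sum the price of reserved days per listing within [start, start + days)."""
    end = start + timedelta(days=days)
    totals = defaultdict(float)
    for row in rows:
        if row["reserved"] and start <= row["date"] < end:
            totals[row["listing_id"]] += row["price"]
    return dict(totals)


revenue = expected_revenue(calendar, start=date(2022, 9, 15))
print(revenue)
```

Joining these per-listing totals with the listings data (for example on the listing identifier) would then allow averaging by room type and neighbourhood.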


Getting Started

To make use of this project, I recommend managing the required dependencies with Anaconda.

Setting up a conda environment

Install miniconda:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Install mamba:

conda install -n base -c conda-forge mamba

Install environment using provided file:

mamba env create -f environment.yml # alternatively, use environment_hist.yml if the base system is not Debian
mamba activate airbnb_spain

Finally, follow along with the main notebook: notebooks/main.ipynb.

File descriptions

The project files are structured as follows:

  • data/airbnb: Where all data is located.
  • notebooks/main.ipynb: The Jupyter notebook that runs the complete project.
  • src: Contains the source code of helper functions used in the data wrangling and analysis.


Additional Notes

Source files were formatted using the following commands:

isort .
autoflake -r --in-place --remove-unused-variables --remove-all-unused-imports --ignore-init-module-imports .
black .

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Carlos Uziel Pérez Malla

GitHub - Google Scholar - LinkedIn - Twitter

Acknowledgments

This project was done as part of the Data Science Nanodegree Program at Udacity.
