- This project is the result of my final thesis at the Data Science Institute, by Fabian Rappert.
- Scraped almost 300,000 places and over 2.7 million comments from park4night using Python.
- Created a MySQL database.
- Created a Streamlit app for searching and filtering places.
- Performed a data analysis with Tableau on the scraped data.
Python Version: 3.11
Packages: requests, BeautifulSoup, pandas
Every place has a unique place ID and thus its own web address, accessible via the base URL https://park4night.com/en/place/ followed by the place ID. For example: https://park4night.com/en/place/88726. The following picture shows where to find the scraped data.
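The URL scheme above can be sketched as a tiny helper (the function name is illustrative, not part of the project):

```python
# Sketch of the URL scheme described above: base URL plus place ID.
BASE_URL = "https://park4night.com/en/place/"

def place_url(place_id: int) -> str:
    """Return the web address for a given place ID."""
    return f"{BASE_URL}{place_id}"

print(place_url(88726))  # → https://park4night.com/en/place/88726
```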
The web scraping uses requests and BeautifulSoup. The scraped data is saved in Parquet format. You can select how many pages to scrape with the variable 'pages_to_scrape'. By default, the main program resumes at the last scraped place ID if a 'data_base.parquet' file already exists; otherwise, a new pandas DataFrame is created.
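A minimal sketch of this flow, assuming a `place_id` column in the Parquet file; the CSS selector and field names inside `scrape_place` are placeholders, not the real page structure:

```python
import os

import pandas as pd
import requests
from bs4 import BeautifulSoup

PARQUET_FILE = "data_base.parquet"
pages_to_scrape = 100  # how many place pages to fetch in one run

def next_place_id(parquet_file: str = PARQUET_FILE) -> int:
    """Resume one past the last scraped place ID, or start at 1
    if no Parquet file exists yet."""
    if os.path.exists(parquet_file):
        df = pd.read_parquet(parquet_file)
        if not df.empty:
            return int(df["place_id"].max()) + 1
    return 1

def scrape_place(place_id: int) -> dict:
    """Fetch one place page and extract a field or two.
    The <h1> selector is a hypothetical example."""
    resp = requests.get(f"https://park4night.com/en/place/{place_id}", timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    title = soup.select_one("h1")
    return {"place_id": place_id,
            "title": title.get_text(strip=True) if title else None}

def main() -> None:
    """One scraping pass: resume, fetch, append, save."""
    start = next_place_id()
    rows = [scrape_place(pid) for pid in range(start, start + pages_to_scrape)]
    new_df = pd.DataFrame(rows)
    if os.path.exists(PARQUET_FILE):
        new_df = pd.concat([pd.read_parquet(PARQUET_FILE), new_df],
                           ignore_index=True)
    new_df.to_parquet(PARQUET_FILE)
```

Calling `main()` runs one scraping pass; re-running it picks up where the previous run stopped, which is what makes the 'data_base.parquet' resume behaviour work.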
The DataFrame looks like this:
After scraping the data, I needed to clean it up so that I could create a MySQL database. For this I created a Jupyter Notebook.
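A cleaning step of this kind might look as follows; the column names and cleaning rules here are illustrative assumptions, not the actual notebook contents:

```python
import pandas as pd

def clean_places(df: pd.DataFrame) -> pd.DataFrame:
    """Minimal cleaning sketch: drop duplicate place IDs, drop rows
    without a title, and normalise the ID column to integers."""
    out = df.drop_duplicates(subset="place_id").dropna(subset=["title"]).copy()
    out["place_id"] = out["place_id"].astype(int)
    return out

# Writing the cleaned frame into MySQL could then use SQLAlchemy, e.g.:
# from sqlalchemy import create_engine
# engine = create_engine("mysql+pymysql://user:password@localhost/park4night")
# clean_places(df).to_sql("places", engine, if_exists="replace", index=False)
```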
...to be continued