⏰ Time's short?
👉 Click here to check my presentation, hosted by GitHub Pages ♡
Shark Attacks – data cleaning and manipulation with Pandas – is my first project at Ironhack's Data Analytics Bootcamp (2021). The given dataset was extremely messy, so the main Pythonic challenge was to get it clean and usable. Before turning the beast into a beauty, I was also challenged to develop a story around a business question. The question I chose, "Where in Australia to build a 'shark-free' family resort?", ties in with my background in the construction industry and makes for a huge coding job: lots to clean, supporting datasets to search for and, of course, fun! 👨🏻💻
▫️ Use storytelling with data to answer a 'business' question.
▫️ Apply different cleaning and manipulation techniques to make a messy dataset usable.
▫️ Shark-free Hotels & Resorts is a 'worldwide-to-be' hotel chain: it is still missing a branch in Australia.
▫️ To date, all of its other houses are built on "safe" beaches, with no sharks in sight.
▫️ Main clientèle – all kinds of families, with/without kids.
▫️ Cleaned columns include:
`year`
`type (provoked/unprovoked)`
`fatal (y/n)`
`area`
`location`
`sex`
`age`
▫️ Developed a cleaning strategy for column `location` to get coordinates, applying `GeoPy`.
▫️ To support the analysis, 5 extra datasets were used:
1. Hotels in Australia, key findings:
- The top 3 States by number of hotels are New South Wales, Queensland and Victoria.
- The mean accommodation rate for Australia is around 65%, and almost all states are close to that mean.
2. Short-term visitors in Australia, key findings:
- Australia has seen incredible growth in short-term visitors over the last 40 years.
- Over 200% rise from 1990 to 1997 and almost 170% rise from 2010 to 2018.
- Tourist growth in relation to total shark attacks: refer to the Jupyter notebook file for more details.
3. Australia cities database.
4. Top 20 beaches in Australia (self-made dataset).
5. List of beaches in Australia (self-made dataset).
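The `location` clean-up and geocoding step mentioned above could look roughly like the sketch below. It is not the notebook's actual code: the helper names `clean_location` and `add_coordinates`, the query format and the regex rules are assumptions made for illustration. The geocoder is passed in as a plain callable, so the logic can be tried offline with a stub before hitting a real service.

```python
import re

import pandas as pd


def clean_location(raw):
    """Normalise a free-text location; return None when nothing usable is left."""
    if not isinstance(raw, str):
        return None
    text = re.sub(r"\(.*?\)", " ", raw)       # drop parenthetical notes
    text = re.sub(r"[^\w\s,'-]", " ", text)    # drop stray punctuation such as '?'
    text = re.sub(r"\s+", " ", text).strip(" ,")
    return text or None


def add_coordinates(df, geocode, country="Australia"):
    """Return a copy of df with `latitude`/`longitude` columns added.

    `geocode` is any callable taking a query string and returning an object
    with `.latitude` and `.longitude`, or None when nothing is found
    (this is how geopy geocoders behave).
    """
    out = df.copy()
    lats, lons = [], []
    for loc, area in zip(out["location"], out["area"]):
        place = clean_location(loc)
        hit = None
        if place is not None:
            # Build "place, area, country", skipping missing parts.
            query = ", ".join(p for p in (place, area, country) if isinstance(p, str))
            hit = geocode(query)
        lats.append(hit.latitude if hit is not None else None)
        lons.append(hit.longitude if hit is not None else None)
    out["latitude"] = lats
    out["longitude"] = lons
    return out


# With GeoPy installed, the callable could come from Nominatim (needs network access;
# the user_agent string here is a placeholder):
#   from geopy.geocoders import Nominatim
#   from geopy.extra.rate_limiter import RateLimiter
#   geocode = RateLimiter(Nominatim(user_agent="shark-attacks-demo").geocode,
#                         min_delay_seconds=1)
#   shark_au_df = add_coordinates(shark_au_df, geocode)
```

Rows whose location cannot be cleaned or found simply keep missing coordinates, which makes it easy to count afterwards how many incidents were actually located.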
▫️ Based on 230 years of available data, Australia has the second-highest number of shark attacks in the world (1338), behind the USA (2229) and ahead of Mexico (579).
▫️ The top 3 countries account for 65% of all incidents.
▫️ 22% of incidents end up deadly.
▫️ Almost 90% of attacked individuals are male.
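Shares like the ones above come from simple counts on the cleaned columns. The sketch below shows one way to get them with Pandas. The `country` column name, the value mappings and the six sample rows are made up for illustration, so the resulting numbers have nothing to do with the real dataset.

```python
import pandas as pd

# Tiny made-up sample, NOT the real dataset.
df = pd.DataFrame({
    "country": ["USA", "usa ", "AUSTRALIA", "Australia", "Mexico", "Fiji"],
    "fatal (y/n)": ["N", " n", "Y", "UNKNOWN", "y", "N"],
    "sex": ["M", "M ", "F", "M", "m", None],
})

# Normalise text columns: trim whitespace and unify case.
df["country"] = df["country"].str.strip().str.upper()
df["sex"] = df["sex"].str.strip().str.upper()

# Map the fatal flag to booleans; anything unexpected becomes missing.
df["fatal (y/n)"] = (
    df["fatal (y/n)"].str.strip().str.upper().map({"Y": True, "N": False})
)

# Share of incidents per country, then the combined share of the top 3.
country_share = df["country"].value_counts(normalize=True)
top3_share = country_share.head(3).sum()

# Fatality rate and male share, ignoring rows with missing values.
fatal_rate = df["fatal (y/n)"].dropna().astype(bool).mean()
male_share = (df["sex"].dropna() == "M").mean()

print(f"top 3 share: {top3_share:.0%}, fatal: {fatal_rate:.0%}, male: {male_share:.0%}")
```

Normalising the flags first matters: without it, values like `" n"` and `"N"` would be counted as different categories and skew the shares.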
▫️ In 85% of all fatalities, it was possible to locate the coordinates.
▫️ The number of tourists exploded in the last 40 years: over 200% rise from 1990 to 1997 and almost 170% from 2010 to 2018.
▫️ So did shark attacks: the top 10 years include 2016, 2015, 2014, 2009, 2012, 2017 and 2018.
▫️ For a "small" State, Victoria has almost 20% of all hotels in Australia, and the best room occupancy rate of them all, over 70%.
▫️ Among the top 3 States, Victoria is the only one with no deadly shark attacks in the past 40 years.
▫️ Lake Tyers Beach is a top 20 beach in Australia!
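A claim like "no deadly attacks in a State within the past 40 years" can be checked with a filter and a group-by on the cleaned columns. This is only a sketch: the rows below are invented for illustration (they are not taken from `shark_au_df.csv`), `fatal (y/n)` is assumed to be already converted to booleans, and `last_year = 2018` is an assumed end of the analysis window.

```python
import pandas as pd

# Made-up rows for illustration only, NOT taken from shark_au_df.csv.
attacks = pd.DataFrame({
    "year": [1975, 1995, 2005, 2014, 2016, 2018],
    "area": ["Victoria", "New South Wales", "Queensland",
             "Victoria", "New South Wales", "Queensland"],
    "fatal (y/n)": [True, True, True, False, False, True],
})

top_states = ["New South Wales", "Queensland", "Victoria"]
last_year = 2018                          # assumed final year of the window
in_window = attacks["year"] > last_year - 40

# Fatal attacks per state inside the 40-year window; states with none show 0.
fatal_by_state = (
    attacks[in_window & attacks["fatal (y/n)"]]
    .groupby("area")
    .size()
    .reindex(top_states, fill_value=0)
)

states_without_fatalities = fatal_by_state[fatal_by_state == 0].index.tolist()
print(fatal_by_state)
print(states_without_fatalities)
```

The `reindex` call is the important part: a plain group-by drops States with no matching rows, so without it a State with zero fatalities would silently vanish from the result instead of showing up as 0.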
▫️ Despite shark attacks all along the coast, there is a `shark free` area in Victoria State.
▫️ It's named `Lake Tyers Beach` and is also a top 20 beach in Australia! Ranked #16.
▫️ It is therefore a `safe` place for `Shark-free Hotels & Resorts` to start hosting in Australia.
- Cleaned final dataset (./assets): shark_au_df.csv
- Data analysis in a Jupyter notebook: project_01_shark_attack.ipynb
- Formal presentation – Storytelling with data –, done in HTML5, CSS3 and JavaScript: check it here
- Given dataset
  - Global Shark Attacks: @kaggle.com.
- Extra datasets used
  - Number of movements Short-term Visitors arriving in Australia: Australian Bureau of Statistics.
  - Information on the supply of, and demand for, tourist accommodation facilities in Australia: Australian Bureau of Statistics.
  - Australia cities database: @kaggle.com.
- Created datasets based on
- Python @ Jupyter Notebook
- Pandas / Numpy
- Geopy / Nominatim (Python client for geocoding)
- Viz: seaborn / plotly