The Python scripts in this repository process the .csv files obtained from https://github.com/JeffSackmann/tennis_atp.git. Of all the data available there, we used only the files atp_matches_2000.csv through atp_matches_2017.csv.
The goal is to generate a single .csv file with the ATP matches from 2000 to 2017, restricted to levels G, M, F, and A. We also wanted to add the name of each tournament's city and the code of the country it belongs to, so that the matches can be placed on a map. The data processed with these scripts were used in a Tableau visualization project (https://public.tableau.com/views/atp_2000_2017/Story1?:embed=y&:display_count=yes).
- atp_matches_union_updated.py (joins the .csv files of interest and applies some fixes in the process. Note that it merges every .csv file contained in the target directory, so that directory should hold only the files you want joined)
- generate_city_country.py (takes the file generated by the script above and adds city and country information. This script is not yet optimized for performance)
- atp_tourney_locations.csv (utility file, used by generate_city_country.py)
- atp_tour_tournaments_2000_2017.csv (dataset generated by the Python scripts)
- FinalProjectTableau.pdf (describes how the Tableau presentation was assembled)
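
The two processing steps listed above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the actual code of atp_matches_union_updated.py or generate_city_country.py. The function names are hypothetical, and the column names assumed for atp_tourney_locations.csv (`tourney_name`, `city`, `country_code`) are guesses that may not match the real file; `tourney_name` and `tourney_level` are columns of the source match files.

```python
import csv
import glob
import os

# Tournament levels kept in the final dataset.
LEVELS_OF_INTEREST = {"G", "M", "F", "A"}


def merge_matches(input_dir, levels=LEVELS_OF_INTEREST):
    """Read every .csv file in input_dir and keep rows whose level is in levels."""
    rows = []
    # Like the real script, this picks up ALL .csv files in the directory.
    for path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                if row.get("tourney_level") in levels:
                    rows.append(row)
    return rows


def add_locations(rows, locations_path):
    """Add city and country_code to each row, looked up by tourney_name.

    Assumption: the locations file has tourney_name, city and country_code
    columns. Tournaments missing from it get empty strings.
    """
    with open(locations_path, newline="", encoding="utf-8") as handle:
        lookup = {loc["tourney_name"]: loc for loc in csv.DictReader(handle)}
    for row in rows:
        loc = lookup.get(row.get("tourney_name"), {})
        row["city"] = loc.get("city", "")
        row["country_code"] = loc.get("country_code", "")
    return rows


def write_rows(rows, output_path):
    """Write the rows to a single .csv file, using the first row's columns."""
    if not rows:
        return
    fieldnames = list(rows[0].keys())
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        # Columns absent from a row are left blank; unexpected ones are dropped.
        writer = csv.DictWriter(handle, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

With the yearly match files in their own directory (here called `matches/`, a hypothetical name), the pipeline would be run as `write_rows(add_locations(merge_matches("matches"), "atp_tourney_locations.csv"), "atp_tour_tournaments_2000_2017.csv")`. Building the lookup dictionary once keeps the location step to a single pass over the matches.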