Acquires unstructured data from BookMyShow (BMS), cleans it, and feeds it to a MySQL server for remote access.
The scraped data is organized into 4 main tables:
CITY_Table : Contains information about all the cities supported by BookMyShow.
- Columns : "city_id", "city_name"
MOVIE_Table : Contains information about movies screening in cities supported by BMS all over India.
- Columns : "movie_name", "movie_id", "city_id", "booking_links", "lang", "format_"
THEATRE_Table : Contains information (name, latitude, longitude, city) about all theatres in India that support booking through BMS.
- Columns : "theatre_ID", "theatre_name", "theatre_lat", "theatre_long", "city", "city_ID"
SHOW_Table : Contains information about shows (show timings, show dates, percentage of seats available) for movies screening in theatres all over India.
- Columns : "show_ID", "theatre_ID", "movie_ID", "show_date", "show_timing", "seat_percent_available"
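The four tables above could be declared roughly as follows. This is only a sketch: it uses Python's built-in sqlite3 module as a local stand-in for the MySQL server, and the column types and key constraints are assumptions, since only the column names are listed here.

```python
import sqlite3

# Column names follow the table descriptions above.
# The SQL types, primary keys, and foreign keys are assumptions.
SCHEMA = """
CREATE TABLE CITY_Table (
    city_id   INTEGER PRIMARY KEY,
    city_name TEXT NOT NULL
);
CREATE TABLE MOVIE_Table (
    movie_name    TEXT,
    movie_id      TEXT,
    city_id       INTEGER REFERENCES CITY_Table(city_id),
    booking_links TEXT,
    lang          TEXT,
    format_       TEXT
);
CREATE TABLE THEATRE_Table (
    theatre_ID   TEXT PRIMARY KEY,
    theatre_name TEXT,
    theatre_lat  REAL,
    theatre_long REAL,
    city         TEXT,
    city_ID      INTEGER REFERENCES CITY_Table(city_id)
);
CREATE TABLE SHOW_Table (
    show_ID                TEXT PRIMARY KEY,
    theatre_ID             TEXT REFERENCES THEATRE_Table(theatre_ID),
    movie_ID               TEXT,
    show_date              TEXT,
    show_timing            TEXT,
    seat_percent_available REAL
);
"""


def create_schema(conn):
    """Create the four tables on the given connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
create_schema(conn)
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(sorted(tables))
```

On a real MySQL server the same statements would need MySQL column types (for example VARCHAR and DOUBLE) in place of the SQLite ones.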
-
bms_scrapper.py : Extracts movie information for a particular region.
- get_topten_movies : Extracts the list of the top ten movies screening in a particular region.
- get_nearby_theatres : Lists the names of theatres near your location.
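One way a "nearby" lookup can work, given theatre latitude and longitude, is to filter by great-circle distance. The sketch below is not the code in bms_scrapper.py; the function names, the 5 km default radius, and the sample coordinates are all hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearby_theatres(theatres, lat, lon, radius_km=5.0):
    """Return names of theatres within radius_km of (lat, lon), nearest first.

    theatres is an iterable of (name, latitude, longitude) tuples.
    """
    hits = [
        (haversine_km(lat, lon, t_lat, t_lon), name)
        for name, t_lat, t_lon in theatres
    ]
    return [name for dist, name in sorted(hits) if dist <= radius_km]


# Made-up sample rows, for illustration only.
sample = [
    ("Theatre A", 28.60, 77.20),
    ("Theatre B", 28.61, 77.21),
    ("Theatre C", 19.07, 72.87),
]
print(nearby_theatres(sample, 28.60, 77.20))
```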
-
city_class.py : Class to extract the cities supported by BookMyShow.
city_table : city_ID, city_name
-
movie_class.py : Class to extract data on movies showing in all supported cities.
MOVIE table : movie_ID, movie_name, city_ID
-
theatre_show.py :
- Extracts theatre details such as latitude, longitude, and city.
- Extracts show details such as show timings, dates, theatre, and city.
-
sql.py :
This script pushes the collected data to a remote MySQL server.
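A minimal sketch of what such a push step can look like, assuming a batch insert with a parameterized query. It uses Python's built-in sqlite3 as a local stand-in for the remote MySQL server; `push_cities` and the sample rows are hypothetical and not taken from sql.py.

```python
import sqlite3


def push_cities(conn, cities):
    """Insert (city_id, city_name) pairs in one batch with a parameterized query."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO CITY_Table (city_id, city_name) VALUES (?, ?)",
            cities,
        )


conn = sqlite3.connect(":memory:")  # stand-in for the remote MySQL connection
conn.execute("CREATE TABLE CITY_Table (city_id INTEGER PRIMARY KEY, city_name TEXT)")

# Made-up rows, for illustration only.
sample_cities = [(1, "Example City A"), (2, "Example City B")]
push_cities(conn, sample_cities)

rows = conn.execute(
    "SELECT city_id, city_name FROM CITY_Table ORDER BY city_id"
).fetchall()
print(rows)
```

With a MySQL driver such as PyMySQL the call pattern is similar, but the placeholders are written `%s` instead of `?` and the connection is opened with the server's host and credentials.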
To fetch information about a particular region, run bms_scrapper.py:
bms = BMSData("ncr")
print(bms.get_topten_movies())