CityCompaniesDataPythonScript

SCRAPING JSON WEBPAGE

author     = "Dhrumil Amish Shah"
copyright  = "Copyright 2019"
credits    = ["Dhrumil Amish Shah"]
version    = "1.0.0"
maintainer = "Dhrumil Amish Shah"
github     = "https://github.com/DhrumilShah98/"
linkedIn   = "https://linkedin.com/in/dhrumilshah98/"

NOTE: I CANNOT SHARE THE NAME OF THE WEBSITE.

IF YOU WANT THE DATA AS JSON OR AS AN EXCEL SHEET, CONTACT ME :)
THIS SCRIPT CAN FETCH DATA FOR ANY CITY [INDIA ONLY].

FILE 1: CityCompaniesExcelScript.py

  • Fetches the data from the 'website_name' website.
  • I invested a lot of time finding the main data source on "website_name".
  • I found a pattern in that data source (a JSON file): it is requested multiple times, and within it multiple further URLs are called, each returning a chunk of data. [IT CONTAINS A LOT OF DATA, OF WHICH I USED ONLY A SMALL PART]
  • Then I cleaned all the data and decided what to keep and what to drop. [TEDIOUS TASK]
  • The script calls the URLs one by one, parses out the required data, and then builds an Excel structure from it.
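
The fetch-and-clean loop described above can be sketched like this. It is a minimal sketch, not the author's script: the index URL, the `"urls"` and `"items"` keys, and the `KEEP_FIELDS` names are hypothetical, because the real website and its JSON layout are not disclosed. Downloading uses only the standard library (`urllib.request` and `json`), and the fetch function is passed in as a parameter so the parsing logic can be tried without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Hypothetical field names; the real JSON keys of the (undisclosed) site are unknown.
KEEP_FIELDS = ["name", "city", "industry"]


def fetch_json(url: str) -> dict:
    """Download a URL and decode its body as JSON (standard library only)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def collect_records(
    index_url: str,
    fetch: Callable[[str], dict] = fetch_json,
) -> List[Dict[str, str]]:
    """Fetch the index JSON, follow each listed URL one by one, keep only KEEP_FIELDS."""
    index = fetch(index_url)
    records: List[Dict[str, str]] = []
    # Assumption: the index lists the follow-up URLs under a hypothetical "urls" key.
    for chunk_url in index.get("urls", []):
        chunk = fetch(chunk_url)
        # Assumption: each chunk holds a list of company dicts under a hypothetical "items" key.
        for item in chunk.get("items", []):
            # Cleaning step: drop every field we do not need and strip surrounding whitespace;
            # a missing field becomes an empty string.
            records.append(
                {field: str(item.get(field, "")).strip() for field in KEEP_FIELDS}
            )
    return records
```

For instance, passing `fetch={...}.__getitem__` with a dict of canned JSON responses exercises the loop offline; in real use the default `fetch_json` performs the HTTP requests, one URL at a time.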

Summarizing...

  1. Found a data source (in JSON form) and, by analyzing it, called multiple URLs.
  2. From those URLs, I got chunks of data, from which I created a simple Excel structure.
  3. Contact me on LinkedIn if you want this data in either Excel or JSON form.
TOTAL NUMBER OF COLUMNS: 31
```
1) Company Full Name
2) Company Short Name
3) Company Scrip Code
4) Company Category
5) Company SEO URL EQ
6) Company Contact Person Prefix
7) Company Contact Person Firstname
8) Company Contact Person Middlename
9) Company Contact Person Lastname
10) Company Contact Person Designation
11) Company Address
12) Company City
13) Company PIN
14) Company State
15) Company Tele
16) Company Fax
17) Company Email
18) Company URL
19) Company Registrars Name
20) Company Registrars Address
21) Company Registrars Phone
22) Company Registrars Fax
23) Company Registrars Email Id
24) Company Registrars Website
25) Company ISIN
26) Company Industry
27) Company Impact Cost(%)
28) Company BC/RD
29) Company Market Lot
30) Company Listing Date
31) Company CIN
```
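
The last step, turning the cleaned records into a sheet with the 31 columns above, might look like the minimal sketch below. Assumptions: each record arrives as a dict already keyed by these column names (the mapping from the site's raw JSON keys is site-specific and not shown), and the sheet is saved as CSV with Python's standard `csv` module, which Excel can open, rather than as a native .xlsx workbook. `to_row`, `build_sheet`, and `save_as_csv` are hypothetical names, not functions from CityCompaniesExcelScript.py.

```python
import csv
from typing import Dict, Iterable, List

# The 31 column names from the list above, in the same order.
COLUMNS: List[str] = [
    "Company Full Name", "Company Short Name", "Company Scrip Code",
    "Company Category", "Company SEO URL EQ",
    "Company Contact Person Prefix", "Company Contact Person Firstname",
    "Company Contact Person Middlename", "Company Contact Person Lastname",
    "Company Contact Person Designation",
    "Company Address", "Company City", "Company PIN", "Company State",
    "Company Tele", "Company Fax", "Company Email", "Company URL",
    "Company Registrars Name", "Company Registrars Address",
    "Company Registrars Phone", "Company Registrars Fax",
    "Company Registrars Email Id", "Company Registrars Website",
    "Company ISIN", "Company Industry", "Company Impact Cost(%)",
    "Company BC/RD", "Company Market Lot", "Company Listing Date",
    "Company CIN",
]


def to_row(record: Dict[str, str]) -> List[str]:
    """Order one company's values by COLUMNS; columns missing from the record stay empty."""
    return [record.get(column, "") for column in COLUMNS]


def build_sheet(records: Iterable[Dict[str, str]]) -> List[List[str]]:
    """Return the header row followed by one row per company (the 'Excel structure')."""
    return [list(COLUMNS)] + [to_row(record) for record in records]


def save_as_csv(rows: List[List[str]], path: str) -> None:
    """Write the rows to a CSV file; newline='' is what the csv docs recommend."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
```

For example, `save_as_csv(build_sheet(records), "city_companies.csv")` writes one header row and one row per company, leaving any column a record lacks empty. Writing a real .xlsx file instead would need a third-party package such as openpyxl.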

ALSO, THERE ARE MANY MORE FEATURES THAT I HAVE NOT SCRAPED YET; I WILL ADD THEM IN THE FUTURE IF I GET SOME MORE TIME ;)

THIS PROJECT WAS FUN :)