Internship project to automate work for the marketing and sales teams.
This scraper collects the types, categories, and details of Singapore companies listed on the web.
The scraped and cleaned data is stored in the DataOutput directory: data.json, data.xlsx, extra_cleaned_data.xlsx, and data_sheets.xlsx.
The scraper itself is the ScrapeSgCo package. To run it, execute main.py from the base directory.
main.py writes the cleaned, scraped data to data.json in the DataOutput directory.
To convert data.json to an Excel file, run convert_to_excel.py from the excel_util_scripts directory.
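How convert_to_excel.py works internally isn't described here. As a rough illustration of what a JSON-to-spreadsheet conversion involves, the sketch below flattens a JSON array of records into tabular text using only the standard library. It assumes data.json holds a list of objects (an assumption; the actual layout may differ), and the field names in the usage line are hypothetical. CSV stands in for .xlsx, since writing real Excel files needs a third-party library such as pandas or openpyxl.

```python
import csv
import io
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted column names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of objects into CSV text, one row per object (assumed input shape)."""
    records = [flatten(r) for r in json.loads(json_text)]
    # Union of all keys in first-seen order, so rows with missing fields still line up.
    columns: list[str] = []
    for r in records:
        for k in r:
            if k not in columns:
                columns.append(k)
    buf = io.StringIO()
    # Missing fields are written as empty cells (DictWriter's default restval is "").
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical records; real field names come from the scraper's output.
    sample = '[{"name": "A", "info": {"uen": "1"}}, {"name": "B"}]'
    print(json_records_to_csv(sample))
```

Flattening nested objects into dotted column names keeps every value in its own cell, which is usually what makes scraped JSON usable in a spreadsheet.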
$ git clone https://github.com/shahan007/SGCO-Scraper-Oct8
$ cd SGCO-Scraper-Oct8
$ python -m venv venv
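$ source venv/Scripts/activate
(the path above is for Git Bash on Windows; on Linux/macOS, activate with: source venv/bin/activate)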
(venv) $ pip install -r requirements.txt
(venv) $ python main.py
(venv) $ python ./excel_util_scripts/convert_to_excel.py
(Further cleans data.xlsx to make the data easier to use. Requires data.xlsx, which is produced by running convert_to_excel.py.)
(venv) $ python ./excel_util_scripts/xtra_clean_excel.py
(venv) $ python ./excel_util_scripts/cat_to_sheet.py
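Judging only by its name and the data_sheets.xlsx output, cat_to_sheet.py appears to split the data into one sheet per category; its actual logic is not documented here. The sketch below shows one way such a split could be prepared, assuming each record carries a "category" field (a hypothetical key). It also sanitizes sheet names, since Excel limits them to 31 characters and forbids the characters : \ / ? * [ ].

```python
from collections import defaultdict
from typing import Any


def group_by_category(
    records: list[dict[str, Any]], key: str = "category"
) -> dict[str, list[dict[str, Any]]]:
    """Bucket records by the value of `key` (hypothetical field); missing values go under 'Uncategorized'."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[str(record.get(key) or "Uncategorized")].append(record)
    return dict(groups)


def safe_sheet_name(name: str) -> str:
    """Replace characters Excel forbids in sheet names and truncate to 31 characters."""
    cleaned = "".join("_" if ch in ':\\/?*[]' else ch for ch in name)
    return cleaned[:31] or "Sheet"


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    rows = [
        {"name": "A", "category": "Food/Beverage"},
        {"name": "B", "category": "Retail"},
        {"name": "C"},
    ]
    for category, members in group_by_category(rows).items():
        print(safe_sheet_name(category), len(members))
```

Each resulting group could then be written to its own sheet with an Excel library (for example, pandas' `ExcelWriter` with one `to_excel` call per group).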
This project is licensed under the GNU General Public License v3.0; see the LICENSE.md file for details.