SGCO-Scraper-Oct8

Internship project to automate work for the marketing and sales team.
The scraper collects the types, categories, and details of Singapore companies listed on the web.
The scraped and cleaned data is located in DataOutput: data.json, data.xlsx, extra_cleaned_data.xlsx, and data_sheets.xlsx.

Note !

The ScrapeSgCo package is the scraper.
The scraping code resides in the ScrapeSgCo package; to run it, you only need to execute main.py, located in the base directory.
main.py writes the cleaned scraped data to data.json in the DataOutput directory.
To convert data.json to an Excel file, run convert_to_excel.py, located in excel_util_scripts.

How to run ?

Clone the repo

$ git clone https://github.com/shahan007/SGCO-Scraper-Oct8

Setting up the environment (the activate path below is for Windows, e.g. Git Bash; on Linux or macOS use venv/bin/activate)

$ python -m venv venv
$ source venv/Scripts/activate
(venv) $ pip install -r requirements.txt

Run the Scraper

(venv) $ python main.py
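
After the run finishes, a quick sanity check on the output can be useful. The following is a minimal sketch and is not part of the repo: summarize_output is a hypothetical helper, and it assumes data.json holds a JSON list or dict at the top level, which this README does not confirm.

```python
import json
from pathlib import Path


def summarize_output(path):
    """Return (top-level type name, item count) for a JSON file.

    Assumption: the file holds a list or a dict at the top level;
    the real layout of data.json is not documented here.
    """
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    count = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, count


if __name__ == "__main__":
    target = Path("DataOutput") / "data.json"
    if target.exists():
        kind, count = summarize_output(target)
        print(f"{target}: top-level {kind} with {count} item(s)")
    else:
        print(f"{target} not found; run main.py first")
```

Run it from the base directory so that the relative DataOutput path resolves.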

Optional (convert data.json to an Excel file, for Excel users)

(venv) $ python ./excel_util_scripts/convert_to_excel.py
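
To illustrate the general idea of turning JSON records into a table, here is a rough sketch. It does not reproduce convert_to_excel.py, whose implementation is not shown here. It writes CSV rather than .xlsx so that it needs only the standard library, and it assumes the records are flat dictionaries. The helper records_to_csv and all sample field names other than WebCategory are hypothetical.

```python
import csv
import io


def records_to_csv(records):
    """Render a list of flat dicts as CSV text, one column per key seen."""
    # Collect column names in first-seen order so the header is stable.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    # restval fills in cells for keys that a given record lacks.
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample records, for illustration only.
sample = [
    {"Name": "Example Pte Ltd", "WebCategory": "Retail"},
    {"Name": "Sample Co", "WebCategory": "Logistics", "Phone": "n/a"},
]
print(records_to_csv(sample))
```

Records that lack a field simply get an empty cell, which keeps every row the same width.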

Optional (further clean the generated Excel file)

This step further cleans data.xlsx to make the data easier to use. It requires the data.xlsx file produced by convert_to_excel.py.

(venv) $ python ./excel_util_scripts/xtra_clean_excel.py

Optional (split the cleaned Excel file into sheets by the WebCategory field)

(venv) $ python ./excel_util_scripts/cat_to_sheet.py
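
The grouping behind a split like this can be sketched as follows. This is an illustration under assumptions, not the logic of cat_to_sheet.py: split_by_category and safe_sheet_name are hypothetical helpers, the "Uncategorized" fallback is an invented default, and the sample records are made up. The sheet-name cleanup reflects Excel's own limits (at most 31 characters, and none of the characters [ ] : * ? / \).

```python
import re
from collections import defaultdict


def split_by_category(records, field="WebCategory"):
    """Group records by the value of `field`, keeping input order."""
    groups = defaultdict(list)
    for record in records:
        # Assumption: records missing the field go into a fallback group.
        groups[record.get(field, "Uncategorized")].append(record)
    return dict(groups)


def safe_sheet_name(name):
    """Make a string usable as an Excel sheet name."""
    # Replace characters Excel forbids in sheet names, then cap the length.
    cleaned = re.sub(r"[\[\]:*?/\\]", "_", str(name)).strip() or "Sheet"
    return cleaned[:31]


# Made-up sample records, for illustration only.
sample = [
    {"Name": "Example Pte Ltd", "WebCategory": "Food/Beverage"},
    {"Name": "Sample Co", "WebCategory": "Logistics"},
    {"Name": "Another Co", "WebCategory": "Food/Beverage"},
]
for category, rows in split_by_category(sample).items():
    print(safe_sheet_name(category), len(rows))
```

Each group would then be written to its own sheet, using the cleaned name as the sheet title.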

License

This project is licensed under the GNU General Public License v3.0; see the LICENSE file for details.

