
Three Python scripts for: 1. getting all the routes of a URL, 2. getting the website content of all the routes you scraped, 3. merging all the content into one single file.


JW-Rami/documentation-scrapper-for-GPT-Builder


Concept

I created these scripts to scrape documentation websites, merge all of the data into a single file, and provide that file as context when creating a GPT Builder assistant specialized in a specific technology.

Before you begin, install the dependencies:

  • pip install requests
  • pip install beautifulsoup4
  • pip install selenium
  • pip install webdriver-manager

Setup

First Step:

  • Choose the website URL you want to scrape.

Second Step:

  • Run python scrap.py in the terminal to collect all the routes from the URL you chose. It will create a .txt file (nextjs_doc_links.txt in the Next.js example) listing all the route links. A minimal sketch of this step follows below.
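
The actual scrap.py is not reproduced here, but a minimal sketch of the idea, using requests and BeautifulSoup to collect same-domain links from a documentation start page, could look like this. The start URL and output filename are assumptions based on the Next.js example used in the next step:

```python
# Sketch of a route scraper: collect same-domain links from a start page.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

START_URL = "https://nextjs.org/docs"  # assumption: replace with the site you want to scrape

response = requests.get(START_URL, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

base_domain = urlparse(START_URL).netloc
links = set()
for anchor in soup.find_all("a", href=True):
    url = urljoin(START_URL, anchor["href"])
    if urlparse(url).netloc == base_domain:  # keep only same-site routes
        links.add(url.split("#")[0])         # drop fragment anchors

with open("nextjs_doc_links.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sorted(links)))

print(f"Saved {len(links)} routes to nextjs_doc_links.txt")
```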

Third Step:

  • Run python scrapwebsiteDataSelenium.py in the terminal to fetch the content of every link listed in nextjs_doc_links.txt. It will create a file with the website content for each link you scraped in the previous step. See the sketch below.
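
As a rough sketch rather than the actual script: scrapwebsiteDataSelenium.py presumably opens each link from nextjs_doc_links.txt in a Selenium-driven headless browser and saves the rendered text of every page to its own file. The output folder and file naming below are assumptions:

```python
# Sketch: render each scraped link with Selenium and save its visible text.
import os

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

options = Options()
options.add_argument("--headless=new")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

os.makedirs("scraped_pages", exist_ok=True)  # assumption: output folder name

with open("nextjs_doc_links.txt", encoding="utf-8") as f:
    links = [line.strip() for line in f if line.strip()]

for i, url in enumerate(links):
    driver.get(url)
    text = driver.find_element(By.TAG_NAME, "body").text  # rendered page text
    with open(f"scraped_pages/page_{i:04d}.txt", "w", encoding="utf-8") as out:
        out.write(f"{url}\n\n{text}")

driver.quit()
print(f"Saved {len(links)} pages to scraped_pages/")
```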

Fourth Step:

  • Run python mergeFiles.py in the terminal to merge all the files into a single file, as sketched below.
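
A minimal sketch of such a merge step, assuming the per-page files were written to a scraped_pages folder as in the previous sketch; the output filename is arbitrary:

```python
# Sketch: concatenate all scraped page files into one context file.
import glob

output_path = "merged_documentation.txt"  # assumption: output filename

with open(output_path, "w", encoding="utf-8") as merged:
    for path in sorted(glob.glob("scraped_pages/*.txt")):
        with open(path, encoding="utf-8") as f:
            merged.write(f.read())
        merged.write("\n\n" + "-" * 80 + "\n\n")  # separator between pages

print(f"Merged files into {output_path}")
```

The resulting single file can then be uploaded as context for the GPT Builder.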
