AI Assisted Web Scraper

Built with Streamlit

Powered by Ollama

ABOUT:

This web scraper uses an LLM run through Ollama to assist in web scraping. It prompts the user to supply a URL and a description of what they want scraped from the specified website. These instructions are then read by the LLM and parsed by the Python code, and the results are returned to the user.
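
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `split_text` and `build_prompt`, the chunk size, and the prompt wording are all assumptions made for the example. It shows how scraped page text might be cut into pieces small enough for a model and paired with the user's description.

```python
def split_text(text: str, max_chars: int = 6000) -> list[str]:
    """Cut scraped page text into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def build_prompt(chunk: str, description: str) -> str:
    """Pair one chunk of page text with the user's description.

    The wording here is hypothetical; any instruction text would do.
    """
    return (
        "Extract only the information that matches this description: "
        f"{description}\n"
        "Return an empty string if nothing matches.\n\n"
        f"Page content:\n{chunk}"
    )


if __name__ == "__main__":
    page_text = "Widget A costs $10. Widget B costs $12. " * 3
    for chunk in split_text(page_text, max_chars=60):
        print(build_prompt(chunk, "all product prices"))
        print("---")
```

Each prompt would then be sent to the model, and the per-chunk answers joined into the result shown to the user.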

How to use:

You will need to download the Chrome Driver and place it in the same directory as the Python files/scripts: https://googlechromelabs.github.io/chrome-for-testing/#stable

Clone the repository and install the requirements:

pip install -r requirements.txt

Then, from the command line, in the /AIWebScraper directory, run:

streamlit run main.py
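
If the app fails to open a browser, the Chrome Driver placement from the first step is a likely cause. The sketch below checks for the driver next to the scripts and starts Chrome from it. It assumes the scraper drives Chrome through Selenium 4 and that the binary keeps its default name (`chromedriver`, or `chromedriver.exe` on Windows); neither is confirmed by this README, and the helper names are hypothetical.

```python
import platform
from pathlib import Path


def chromedriver_path(project_dir: str = ".") -> Path:
    """Return where the driver is expected: next to the Python scripts."""
    name = "chromedriver.exe" if platform.system() == "Windows" else "chromedriver"
    return Path(project_dir) / name


def make_driver(project_dir: str = "."):
    """Start Chrome through Selenium using the local driver binary."""
    # Imported here so the path check above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver_file = chromedriver_path(project_dir)
    if not driver_file.is_file():
        raise FileNotFoundError(f"Chrome Driver not found at {driver_file}")
    return webdriver.Chrome(service=Service(executable_path=str(driver_file)))


if __name__ == "__main__":
    path = chromedriver_path()
    print(f"{path}: {'found' if path.is_file() else 'missing'}")
```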

Quickstart

To run and chat with Llama 3.1:

ollama run llama3.1

To pull a model:

ollama pull llama3.1
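
Once a model has been pulled, it can also be queried from Python through Ollama's local HTTP API. The sketch below uses only the standard library and assumes Ollama is running on its default address, `localhost:11434`. The helper names `build_request` and `ask` are made up for this example, and nothing here implies the scraper itself talks to Ollama this way.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Prepare a non-streaming generate request for a locally pulled model."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3.1") -> str:
    """Send the prompt and return the model's reply text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask("Reply with the single word: ready")` needs the Ollama server running and the model already pulled; otherwise the request fails with a connection or HTTP error.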

Before pulling a model, keep in mind the memory constraints and limitations of your machine. Check the table below.

The following information is from Ollama's GitHub and is relevant to its use in this web scraper.

Model library

Ollama supports a list of models available on ollama.com/library

Here are some example models that can be downloaded:

Model	Parameters	Size	Download
Llama 3.1	8B	4.7GB	`ollama run llama3.1`
Llama 3.1	70B	40GB	`ollama run llama3.1:70b`
Llama 3.1	405B	231GB	`ollama run llama3.1:405b`
Phi 3 Mini	3.8B	2.3GB	`ollama run phi3`
Phi 3 Medium	14B	7.9GB	`ollama run phi3:medium`
Gemma 2	2B	1.6GB	`ollama run gemma2:2b`
Gemma 2	9B	5.5GB	`ollama run gemma2`
Gemma 2	27B	16GB	`ollama run gemma2:27b`
Mistral	7B	4.1GB	`ollama run mistral`
Moondream 2	1.4B	829MB	`ollama run moondream`
Neural Chat	7B	4.1GB	`ollama run neural-chat`
Starling	7B	4.1GB	`ollama run starling-lm`
Code Llama	7B	3.8GB	`ollama run codellama`
Llama 2 Uncensored	7B	3.8GB	`ollama run llama2-uncensored`
LLaVA	7B	4.5GB	`ollama run llava`
Solar	10.7B	6.1GB	`ollama run solar`

Note

You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
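
The note above can be turned into a quick check before choosing a model. This is only a sketch: the tiers come straight from the note, but rounding a model size up to the next listed tier (so an 8B model is treated like a 13B one) is a conservative assumption of this example, and the function names are hypothetical.

```python
# (model size in billions of parameters, minimum RAM in GB), from the note above
RAM_GUIDE = [(7, 8), (13, 16), (33, 32)]


def min_ram_gb(params_billion: float):
    """Return the RAM guideline for the smallest listed tier that covers the model.

    Returns None when the model is larger than any tier the note covers.
    """
    for tier_params, ram_gb in RAM_GUIDE:
        if params_billion <= tier_params:
            return ram_gb
    return None


def fits(params_billion: float, available_ram_gb: float) -> bool:
    """True if the machine meets the guideline for this model size."""
    needed = min_ram_gb(params_billion)
    return needed is not None and available_ram_gb >= needed


if __name__ == "__main__":
    for size in (3.8, 7, 13, 70):
        print(f"{size}B -> guideline: {min_ram_gb(size)} GB")
```

Models beyond the largest tier, such as the 70B and 405B entries in the table, return `None` because the note gives no figure for them.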

Customize a model

Import from GGUF

Ollama supports importing GGUF models in the Modelfile:

Create a file named Modelfile, with a FROM instruction with the local filepath to the model you want to import.
```
FROM ./vicuna-33b.Q4_0.gguf
```
Create the model in Ollama
```
ollama create example -f Modelfile
```
Run the model
```
ollama run example
```
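
The same three steps can be scripted. The sketch below writes the Modelfile and builds the `ollama create` command as an argument list. The helper names are hypothetical, and `import_model` assumes the `ollama` CLI is on the PATH and the GGUF file exists locally.

```python
import subprocess
from pathlib import Path


def write_modelfile(gguf_path: str, directory: str = ".") -> Path:
    """Write a Modelfile whose FROM instruction points at a local GGUF file."""
    modelfile = Path(directory) / "Modelfile"
    modelfile.write_text(f"FROM {gguf_path}\n", encoding="utf-8")
    return modelfile


def create_command(name: str, modelfile: Path) -> list[str]:
    """Build the `ollama create` command shown above as an argument list."""
    return ["ollama", "create", name, "-f", str(modelfile)]


def import_model(name: str, gguf_path: str) -> None:
    """Write the Modelfile, then register the model with Ollama."""
    modelfile = write_modelfile(gguf_path)
    # check=True raises CalledProcessError if ollama exits with an error.
    subprocess.run(create_command(name, modelfile), check=True)
```

For the example above, `import_model("example", "./vicuna-33b.Q4_0.gguf")` would be followed by `ollama run example` in a terminal.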

LinuxUser255/AIWebScraper
