This project contains a Python script that scrapes educational content from the website testkolik.com. It extracts questions and their answer options for a given grade, lesson, and topic.
Features:
- Scrapes questions and answer options from testkolik.com
- Supports different grade levels, lessons, and topics
- Handles Turkish-language content
- Uses BeautifulSoup for HTML parsing
- Provides an API built with FastAPI
Requirements:
- Python 3.6+
- requests
- beautifulsoup4
- google-generativeai
- fastapi
- uvicorn
Installation:
- Clone this repository:
  git clone https://github.com/code-alchemist01/intellifist-ai.git
  cd intellifist-ai
- Install the required packages:
  pip install requests beautifulsoup4 google-generativeai fastapi uvicorn
Usage:
The main script is `src/data/bs_scraper.py`. You can run it directly:
python src/data/bs_scraper.py
Replace `GRADE`, `LESSON`, and `TOPIC` with your desired values.
The main function `scrape_through_hs(grade: str, lesson: str, topic: str)` takes three parameters:
- `grade`: the grade level, from 1 to 12, passed as a string
- `lesson`: the lesson name (`"matematik"`, `"fizik"`, `"kimya"`, `"biyoloji"`, `"tarih"`, `"din-kulturu"`, `"ingilizce"`, `"cografya"`)
- `topic`: the specific topic slug (e.g., `"allah-insan-iliskisi"`); you can find topic slugs in the page URLs on testkolik.com
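Because bad arguments only surface once a request fails, it can help to check them up front. The following is a minimal sketch of a hypothetical `validate_args` helper that is not part of the repository. It assumes the grade is a plain number string, takes the lesson set from the list above (the site may support more), and assumes topic slugs are lowercase hyphen-separated words like `"allah-insan-iliskisi"`.

```python
import re

# Hypothetical helper, not part of bs_scraper.py: checks arguments
# before they are passed to scrape_through_hs().

# Lesson slugs listed in this README; the site may support others.
KNOWN_LESSONS = {
    "matematik", "fizik", "kimya", "biyoloji",
    "tarih", "din-kulturu", "ingilizce", "cografya",
}

# Assumption: topic slugs are lowercase ASCII words joined by hyphens.
TOPIC_SLUG = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def validate_args(grade: str, lesson: str, topic: str) -> None:
    """Raise ValueError if an argument falls outside the documented values."""
    # Assumption: grade is a plain number string such as "10".
    if not grade.isdigit() or not 1 <= int(grade) <= 12:
        raise ValueError(f"grade must be '1' to '12', got {grade!r}")
    if lesson not in KNOWN_LESSONS:
        raise ValueError(f"unknown lesson {lesson!r}")
    if TOPIC_SLUG.fullmatch(topic) is None:
        raise ValueError(f"topic does not look like a URL slug: {topic!r}")


if __name__ == "__main__":
    validate_args("10", "din-kulturu", "allah-insan-iliskisi")  # passes silently
```

Failing fast this way gives a clear error message before any network request is made.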
It returns a list of dictionaries, each containing:
- `number`: the question number
- `full_text`: the full text of the question
- `options`: a list of answer options
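Because each question is a plain dictionary, the results can be printed or saved with the standard library alone. The sketch below uses a made-up sample question in place of real scraper output and two hypothetical helpers, `format_question` and `to_json`. It assumes `options` is a list of strings in display order without letter prefixes. Passing `ensure_ascii=False` to `json.dumps` keeps Turkish letters such as ş, ğ, and ı readable instead of escaping them as `\uXXXX` sequences.

```python
import json
from string import ascii_uppercase
from typing import Any, Dict, List

# One scraped question in the documented shape:
# {"number": ..., "full_text": ..., "options": [...]}
Question = Dict[str, Any]


def format_question(q: Question) -> str:
    """Render one question with lettered options (A, B, C, ...)."""
    lines = [f"{q['number']}. {q['full_text']}"]
    # Assumption: options carry no letter prefix, so letters are added here.
    for letter, option in zip(ascii_uppercase, q["options"]):
        lines.append(f"   {letter}) {option}")
    return "\n".join(lines)


def to_json(questions: List[Question]) -> str:
    """Serialize questions to JSON without escaping non-ASCII characters."""
    return json.dumps(questions, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Made-up sample standing in for the output of scrape_through_hs().
    sample = [
        {
            "number": 1,
            "full_text": "Aşağıdakilerden hangisi bir asal sayıdır?",
            "options": ["4", "6", "7", "9"],
        }
    ]
    print(format_question(sample[0]))
    print(to_json(sample))
```

To save the JSON, write the string to a file opened with `encoding="utf-8"` so the Turkish characters are stored correctly on every platform.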
This script is for educational purposes only. Make sure you have the right to scrape content from the target website and comply with their terms of service.