Given a starting page, this project will:
- Crawl the contents of all child pages
- Build indexes in the database
- Answer search queries using the indexes
- Provide a frontend that accepts user queries
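
The crawl, index, and search steps above can be illustrated with a minimal in-memory sketch. It is not the project's implementation: the real backends store their data in SQLite, MongoDB, and Redis and use libraries such as sentence_transformers, whereas this sketch uses only the Python standard library and plain keyword matching. The names `crawl`, `build_index`, `search`, and the `fetch` callable are hypothetical and chosen for illustration only.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text nodes from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) is assumed to return HTML or None."""
    pages, queue, seen = {}, deque([start_url]), {start_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        pages[url] = " ".join(parser.text_parts)
        for href in parser.links:
            child = urljoin(url, href)  # resolve relative links
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return pages


def build_index(pages):
    """Inverted index: lowercase word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)
```

Passing `fetch` in as a parameter keeps the sketch testable without network access: a dictionary's `get` method can stand in for an HTTP client, for example `crawl("http://x/a", site.get)` where `site` maps URLs to HTML strings.
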
| | Frontend | Python Backend | Java Backend |
|---|---|---|---|
| URL | https://search.johnnyip.com/ | - | - |
| Libraries | React.js | Flask | Spring Boot |
| | Mantine (UI) | pymongo | htmlparser |
| | Axios | sqlite3 | gson |
| | | sentence_transformers | spring-boot-starter-data-redis |
| | | NLTK | jsoup |
| | | numpy | sqlite-jdbc |
!!! Crawling is much slower under Docker (~30 minutes) than in a local environment (~2 minutes).
!!! The DB file in backend-java is a blank template. It serves as the target for data updates during initialization.
If any error occurs, remove all files and run `docker compose up -d` again.
- Before you begin, make sure you have the Docker client installed.
- Make sure the `compose.yaml` file is inside the folder.
- Open Terminal (Mac/Linux) or cmd (Windows) and enter the following commands:

      cd <path_to_your_folder>
      docker compose up -d

- After the necessary Docker images are downloaded, the services will be up and running.
- Three folders will be created:
  - `db` folder contains the SQLite file
  - `mongodb` folder contains data of MongoDB
  - `redis` folder contains data of Redis DB
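
To confirm that the setup produced the three folders listed above, a small check like the following could be run from the project folder. This is a sketch rather than part of the project: the helper name `missing_data_folders` is hypothetical, and it assumes the folders are created directly inside the folder that holds `compose.yaml`.

```python
from pathlib import Path

# Folder names taken from the list above.
EXPECTED_FOLDERS = ("db", "mongodb", "redis")


def missing_data_folders(project_dir):
    """Return the expected data folders that do not exist in project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_folders(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("All data folders are present.")
```

An empty list from `missing_data_folders(".")` means all three folders are present.
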