This project implements cross-lingual information retrieval techniques using BERT (Bidirectional Encoder Representations from Transformers) for the English-Turkish language pair. It compares several approaches: Latent Semantic Indexing (LSI), LSI with translation, and BERT-based methods.
The main goal of this project is to evaluate and compare different techniques for cross-lingual information retrieval between English and Turkish. The project explores:
- Latent Semantic Indexing (LSI)
- LSI with Translation
- BERT-based approach
For each approach, various similarity metrics are used:
- Cosine Similarity
- Jaccard Similarity
- Dice Similarity
- Overlap Similarity
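
As a rough illustration of the four metrics listed above, here is a minimal sketch using their standard textbook definitions. It assumes cosine similarity is computed on numeric vectors (for example LSI or BERT embeddings) and that Jaccard, Dice, and overlap are computed on token sets. The function names are hypothetical and may not match how this project implements them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| over two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice_similarity(a, b):
    """2 * |A ∩ B| / (|A| + |B|) over two collections treated as sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def overlap_similarity(a, b):
    """|A ∩ B| / min(|A|, |B|) over two collections treated as sets."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


# Hypothetical example inputs, not taken from the project's data.
query_tokens = ["information", "retrieval", "bert"]
doc_tokens = ["information", "retrieval", "lsi", "turkish"]

jaccard = jaccard_similarity(query_tokens, doc_tokens)
dice = dice_similarity(query_tokens, doc_tokens)
overlap = overlap_similarity(query_tokens, doc_tokens)
cosine = cosine_similarity([1.0, 0.0], [1.0, 0.0])
```

Returning 0.0 for empty sets or zero vectors is a design choice made for this sketch so that the functions never raise a division error; other conventions are possible.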
The BERT-based method shows a significant improvement over both traditional LSI and LSI with translation.
The project utilizes the following technologies: