This repository contains code and resources from the master’s thesis “Decoding Spatial Semantics: A Comparative Analysis of the Performance of Open-source LLMs against NMT Systems in Translating EN-PT-br”.
The study compares how open-source Large Language Models (LLMs) and traditional Neural Machine Translation (NMT) systems handle spatial language. It focuses on the translation of spatial prepositions such as ACROSS, INTO, ONTO, and THROUGH from English into Brazilian Portuguese (PT-br).
- Code: Includes scripts for data preprocessing, running experiments, and evaluating results.
- Datasets: A bilingual EN-PT-br dataset of TED Talks subtitles, focused on spatial prepositions.
- Evaluation Metrics: Scripts for computing BLEU, METEOR, BERTScore, COMET, and TER.
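
As a rough illustration of the kind of preprocessing involved, the sketch below selects sentence pairs whose English side contains one of the four prepositions named above. It is not the repository's actual script: the function names (`find_prepositions`, `filter_pairs`), the whole-word matching rule, and the example sentences are all assumptions made for this example.

```python
import re

# Prepositions named in this README; the matching rule itself is an assumption.
SPATIAL_PREPOSITIONS = ("across", "into", "onto", "through")

# Whole-word, case-insensitive match, so "throughout" does not count as "through".
_PATTERN = re.compile(
    r"\b(" + "|".join(SPATIAL_PREPOSITIONS) + r")\b", re.IGNORECASE
)


def find_prepositions(sentence):
    """Return the target prepositions found in a sentence, lowercased, in order."""
    return [match.group(1).lower() for match in _PATTERN.finditer(sentence)]


def filter_pairs(pairs):
    """Keep (english, portuguese) pairs whose English side has a target preposition."""
    return [(en, pt) for en, pt in pairs if find_prepositions(en)]


# Hypothetical example pairs, not taken from the dataset.
example_pairs = [
    ("She walked across the bridge.", "Ela atravessou a ponte."),
    ("Thank you very much.", "Muito obrigado."),
    ("He ran into the room.", "Ele entrou correndo na sala."),
]
kept = filter_pairs(example_pairs)
```

Matching on surface forms alone would also keep non-spatial uses (for example, "through hard work"), so a real pipeline would likely need a further filtering or annotation step.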