Supervisor: Judith Bütepage
Collaborators: Fredrik Segerhammar & Mariya Lazarova (Statistical Method), Tianzong Wang (Deep Learning Method)
Locating the best-matching paragraph in a document given a search query is a well-studied problem. For podcast data, however, the problem is newer and has received little research attention. We attempt to retrieve the best jump-in point for relevant segments of podcast episodes given arbitrary user search queries, using the dataset provided by the TREC 2020 Podcasts Track. We propose two methods targeting the track's first task, Ad-hoc Segment Retrieval: one based on traditional statistical methods using TF-IDF and Okapi BM25, and another based on deep learning embeddings from Sentence-Transformers. A detailed project report and presentation will be released later or made available upon request.
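
To give a rough idea of the statistical approach, the following is a minimal sketch of Okapi BM25 scoring over timestamped transcript segments. It is not the project's actual implementation. The whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the function names `bm25_scores` and `best_jump_in`, and the example segments are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system would
    # likely also strip punctuation and remove stop words.
    return text.lower().split()


def bm25_scores(query, segments, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per segment for the given query."""
    docs = [tokenize(s) for s in segments]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n

    # Document frequency: number of segments containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def best_jump_in(query, timed_segments):
    """Return (start_time_seconds, score) of the highest-scoring segment."""
    scores = bm25_scores(query, [text for _, text in timed_segments])
    best = max(range(len(scores)), key=scores.__getitem__)
    return timed_segments[best][0], scores[best]


# Hypothetical segments: (start time in seconds, transcript text).
segments = [
    (0.0, "welcome to the show today we talk about running shoes"),
    (120.0, "marathon training plans and how to avoid injury"),
    (240.0, "a word from our sponsor about mattresses"),
]

start, score = best_jump_in("marathon training", segments)
print(f"jump in at {start:.0f}s (score {score:.3f})")
```

The embedding-based method would replace `bm25_scores` with a similarity measure between query and segment embeddings, while the selection of the jump-in point could stay the same.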