This repository contains a Sinhala song lyrics corpus, the web crawler used to collect it, and a search engine built on Elasticsearch.
To start the search engine, run an Elasticsearch instance on port 9200.
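Before running any queries it can help to confirm that the instance is reachable. The following is a minimal sketch using only the Python standard library; it assumes the instance runs on localhost over plain HTTP without authentication (only the port is given in this README), and the helper name `elasticsearch_info` is hypothetical.

```python
import json
import urllib.error
import urllib.request

# Assumption: Elasticsearch runs locally over plain HTTP; only the port is stated in the README.
ES_URL = "http://localhost:9200"


def elasticsearch_info(base_url=ES_URL, timeout=3.0):
    """Return the cluster info served at the root endpoint, or None if it cannot be read."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error (e.g. 401), or a non-JSON body.
        return None


if __name__ == "__main__":
    info = elasticsearch_info()
    if info is None:
        print("Elasticsearch is not reachable at", ES_URL)
    else:
        print("Elasticsearch version:", info["version"]["number"])
```

If the instance has security enabled, the request is rejected and the helper returns None, so a None result does not always mean that nothing is listening on the port.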
The repository is organized as follows:
- lyrics crawler: source code for the data scraper
- original data: the scraped data files, in Unicode
- processed data: the data with English metadata translated to Sinhala
- queries: Elasticsearch queries
The corpus was scraped from https://sinhalasongbook.com/. It contains 1094 unique Sinhala songs. Each song has 8 metadata fields, as follows:
- title - Sinhala
- artist - Sinhala
- lyricist - Sinhala
- musicComposer - Sinhala
- genre - Sinhala
- views - Number
- shares - Number
- lyrics - Sinhala
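The eight fields above could be described to Elasticsearch with an index mapping along these lines. This is a sketch, not the mapping used in this repository: the field names come from the list, while the Elasticsearch types, the `keyword` sub-fields and the function name `build_mapping` are assumptions. The body uses the typeless mapping format of Elasticsearch 7 and later.

```python
import json

# Field names are taken from the metadata list; the types below are assumptions.
FACET_FIELDS = ["title", "artist", "lyricist", "musicComposer", "genre"]
NUMERIC_FIELDS = ["views", "shares"]


def build_mapping():
    """Build an index-creation body covering the eight song fields."""
    properties = {}
    for name in FACET_FIELDS:
        # Full-text search on the field, plus an exact-value sub-field
        # that can be used for filtering and aggregations.
        properties[name] = {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    # Lyrics are long free text, so no exact-value sub-field is added.
    properties["lyrics"] = {"type": "text"}
    for name in NUMERIC_FIELDS:
        properties[name] = {"type": "integer"}
    return {"mappings": {"properties": properties}}


if __name__ == "__main__":
    print(json.dumps(build_mapping(), indent=2))
```

Mapping `views` and `shares` as numbers is what makes numeric range filters and sorting possible; the `keyword` sub-fields serve the same purpose for exact matches on names and genres.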
Metadata fields that were in English were translated to Sinhala using the mtranslate Python library.
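The translation step might look roughly like the sketch below. Which fields were originally in English is not stated here, so the `ENGLISH_FIELDS` tuple is an assumption, and `translate_metadata` and `to_sinhala` are hypothetical helper names. The translator is passed in as a function so the logic can be tried without network access; `to_sinhala` wraps mtranslate's `translate(text, to_language, from_language)` call.

```python
# Assumption: these are the fields that arrive in English from the scraper.
ENGLISH_FIELDS = ("artist", "lyricist", "musicComposer", "genre")


def translate_metadata(song, translate_fn, fields=ENGLISH_FIELDS):
    """Return a copy of `song` with the selected string fields translated."""
    translated = dict(song)
    for field in fields:
        value = translated.get(field)
        # Skip missing, empty, or non-string values such as view counts.
        if isinstance(value, str) and value.strip():
            translated[field] = translate_fn(value)
    return translated


def to_sinhala(text):
    """Translate English text to Sinhala with mtranslate (needs network access)."""
    from mtranslate import translate  # third-party; imported lazily

    return translate(text, "si", "en")


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    song = {"title": "...", "artist": "Example Artist", "genre": "Classics", "views": 1200}
    print(translate_metadata(song, to_sinhala))
```

Passing the translator as an argument also makes it easy to cache results, since the same artist and genre names recur across many songs.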
The search engine supports the following:
- Search a song by any field
- Faceted search
- Range queries
- Support for synonyms
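The features listed above map onto standard Elasticsearch query DSL constructs. The sketch below builds one request body per feature as plain Python dictionaries; it is an illustration, not the queries shipped in this repository. It assumes that text fields have a `keyword` sub-field for exact matching, and the function names, the filter and analyzer names, and the sample synonym pair are all hypothetical.

```python
import json

# Numeric fields are left out because a free-text query cannot be matched against them.
SEARCH_FIELDS = ["title", "artist", "lyricist", "musicComposer", "genre", "lyrics"]


def search_any_field(text, fields=None):
    """Search by any field: match the text against all textual fields."""
    return {"query": {"multi_match": {"query": text, "fields": fields or SEARCH_FIELDS}}}


def faceted_search(text, genre=None, artist=None):
    """Faceted search: filter on exact values and return counts per facet."""
    filters = []
    if genre:
        filters.append({"term": {"genre.keyword": genre}})
    if artist:
        filters.append({"term": {"artist.keyword": artist}})
    return {
        "query": {"bool": {"must": [search_any_field(text)["query"]], "filter": filters}},
        "aggs": {
            "genres": {"terms": {"field": "genre.keyword"}},
            "artists": {"terms": {"field": "artist.keyword"}},
        },
    }


def views_range(min_views=None, max_views=None):
    """Range query: songs whose view count lies within the given bounds."""
    bounds = {}
    if min_views is not None:
        bounds["gte"] = min_views
    if max_views is not None:
        bounds["lte"] = max_views
    return {"query": {"range": {"views": bounds}}, "sort": [{"views": "desc"}]}


def synonym_settings(synonyms):
    """Synonym support: index settings defining an analyzer with a synonym filter."""
    return {
        "settings": {
            "analysis": {
                "filter": {"song_synonyms": {"type": "synonym", "synonyms": synonyms}},
                "analyzer": {
                    "song_text": {"tokenizer": "standard", "filter": ["song_synonyms"]}
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(views_range(min_views=1000), indent=2))
    # Illustrative synonym pair (two words for "song"); not taken from the repository.
    print(json.dumps(synonym_settings(["ගීතය, සින්දුව"]), ensure_ascii=False, indent=2))
```

Each dictionary can be sent as the JSON body of a search or index-creation request. The analyzer defined by `synonym_settings` only takes effect on fields whose mapping refers to it by name.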