In this repository, we provide the source code for the UI of our dataset search engine DataHunter.
The number of datasets is steadily rising, making it increasingly difficult for researchers and practitioners in the various scientific disciplines to be aware of all datasets, particularly of the most relevant datasets for a given research problem. To this end, dataset search engines have been proposed. However, they are based on the users' keywords and thus have difficulties in determining precisely fitting datasets for complex research problems. In this paper, we propose the system at http://data-hunter.io that recommends suitable datasets to users based on given research problem descriptions. It is based on fastText for the text representation and text classification, the Data Set Knowledge Graph (DSKG) with metadata about almost 1,700 unique datasets, as well as 88,000 paper abstracts as research problem descriptions for training the model. Overall, our system demonstrates that recommending datasets facilitates data provisioning and reuse according to the FAIR principles and that dataset recommendation is a promising future research direction.
The dataset search UI is based on Django and SQLite.
We recommend to setup a Python virtual environment with SQLite and Django. Detailled instructions follow soon.
To run the UI on the Django server, execute within the main directory
python manage.py runserver 0:{{PORT}}
Typical port numbers are 8000, 8080 or 9000. If you do not specify any port number, the port 8000 will be used automatically. To access the application go to https://localhost:{{PORT}}. In case you are running the application on a remote server, replace localhost by the remote server's IP address.
Usually, you need to migrate all changes such that they are reflected in your database. For this reason, run the following commands:
- python manage.py makemigrations
- python manage.py migrate
The system has been designed and implemented by Michael Färber and Ann-Kathrin Leisinger. Feel free to reach out to us:
Michael Färber, michael.faerber@kit.edu
Please cite our paper (published at RecSys'21) as follows:
@inproceedings{Faerber2021RecSys,
author = {Michael F{\"{a}}rber and
Ann-Kathrin Leisinger},
title = "{DataHunter: A System for Finding Datasets Based on Scientific Problem Descriptions}",
booktitle = "{Proceedings of the 15th ACM Conference on Recommender Systems}",
location = "{Amsterdam, The Netherlands}",
year = {2021}
}