A collection of open datasets published by Simula Research Laboratory and SimulaMet.
Currently, we have published the following datasets:
Medical and Biology Datasets
- Depresjon, The Depresjon Dataset. [ publication ]
- HyperKvasir, The Largest Gastrointestinal Dataset. [ publication ]
- HYPERAKTIV, A Motor Activity Database of Patients with ADHD. [ publication ]
- KvasirCapsule SEG, A Capsule Endoscopy Segmentation Dataset. [ publication ]
- Cellular, A cell autophagy dataset. [ publication ]
- GastroVision, A multicenter dataset. [ publication ]
- Nerthus, A Bowel Preparation Quality Video Dataset. [ publication ]
- Kvasir-VQA, A Text-Image Pair GI Tract Dataset.
- Kvasir Capsule, The largest gastrointestinal PillCAM dataset. [ publication ]
- Kvasir Instrument, A gastrointestinal instrument Dataset. [ publication ]
- Kvasir SEG, Segmented Polyp Dataset for Computer Aided Gastrointestinal Disease Detection. [ publication ]
- Kvasir, A Multi-Class Image-Dataset for Computer Aided Gastrointestinal Disease Detection. [ publication ]
- Psykose, A Motor Activity Database of Patients with Schizophrenia. [ publication ]
- VISEM QC, A sperm quality control dataset.
- VISEM, A Multimodal Video Dataset of Human Spermatozoa. [ publication ]
Sport Datasets
- Alfheim, Soccer video and player position dataset. [ publication ]
- ARX, A Text-Classification Dataset Consisting of Norwegian Soccer Articles from VG and TV2. [ publication ]
- Heimdallr, A Dataset For Sport Analysis.
- ScopeSense, An 8.5-month sport, nutrition, and lifestyle lifelogging dataset.
- Soccer Summarization, Soccer game captions and summary in English for game summarization. [ publication ]
- SoccerMon, Subjective and objective data collected over two years from two different elite women's soccer teams.
- SoccerSum, The SoccerSum Dataset for Automated Detection, Segmentation, and Tracking of Objects on the Soccer Pitch [ publication ]
- SoccerNet-Echoes, SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset [ publication ]
- PMData, A lifelogging dataset of 16 persons during 5 months using Fitbit, Google Forms and PMSys.
- TACDEC, TACDEC: Dataset of Tackle Events in Soccer Game Videos [ publication ]
Other Datasets
- Anarchy Online, Server-side Network Traffic from Anarchy Online: Analysis, Statistics and Applications. [ publication ]
- European Cloud Cover, A dataset containing reanalysis data from ERA5 and satellite retrievals from Meteosat Second Generation. [ publication ]
- Eye Tracker, A Serious Game Based Dataset. [ publication ]
- HSDPA, HSDPA-bandwidth logs for mobile HTTP streaming scenarios.
- HTAD, A Home-Tasks Activities Dataset with Wrist-accelerometer and Audio Features. [ publication ]
- Image Sentiment, A dataset for image sentiment analysis. [ publication ]
- Njord, A fishing boat dataset.
- Right Inflight, A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation.
- THREAT, A Large Annotated Corpus for Detection of Violent Threats.
- Toadstool, A Dataset for Training Emotional and Intelligent Machines Playing Super Mario Bros. [ publication ]
- WICO Graph Dataset, A Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets. [ publication ]
- WICO Text, A labeled dataset of conspiracy theory and 5G-corona misinformation tweets. [ publication ]
To add a new dataset, follow these steps:
- Fork the Repository: Fork this repository to your GitHub account.
- Create a Markdown File: In your forked repository, navigate to the datasets folder and create a new Markdown file (.md) for your dataset. The file name should be descriptive of the dataset.
- Add Dataset Information: Copy and paste the following template into your Markdown file:

      ---
      title: <dataset name>
      desc: <dataset description>
      thumbnail: <dataset thumbnail>
      publication: <link to publication>
      github: <link to github>
      tags:
      - <list of tags>
      ---

  Fill in the template with the appropriate information about your dataset.
- Add a Dataset Thumbnail: Add a thumbnail for the dataset, which will be displayed on the main page. The thumbnail should use a 16:9 aspect ratio, such as 320 x 180 or 640 x 360 pixels, and be placed under public/thumbnails.
- Update the README: Update this README by adding the new dataset under one of the categories above. Add links to the publication, code, or other resources that may be useful.
- Create a Pull Request: Once you have added the Markdown file and filled in the dataset information, commit your changes. Push the changes to your forked repository. Create a pull request to merge your changes into the main repository.
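The file-creation and thumbnail steps above can be sketched in Python. This is a minimal, unofficial helper and not part of the repository: the function names, the example dataset, and the example field values are all hypothetical, and the exact value the site expects in the thumbnail field is an assumption. Only the template keys, the datasets folder, and the 16:9 requirement come from the steps above.

```python
from pathlib import Path


def render_front_matter(title, desc, thumbnail, publication, github, tags):
    """Fill in the front matter template from the steps above.

    Values are written as-is. A value containing a colon or other
    YAML-special characters may need quoting by hand.
    """
    lines = [
        "---",
        f"title: {title}",
        f"desc: {desc}",
        f"thumbnail: {thumbnail}",
        f"publication: {publication}",
        f"github: {github}",
        "tags:",
    ]
    lines += [f"- {tag}" for tag in tags]
    lines.append("---")
    return "\n".join(lines) + "\n"


def is_16_by_9(width, height):
    """Return True when width:height is exactly 16:9."""
    return width > 0 and height > 0 and width * 9 == height * 16


def write_dataset_file(repo_root, file_stem, front_matter):
    """Write datasets/<file_stem>.md under repo_root and return its path."""
    target = Path(repo_root) / "datasets" / f"{file_stem}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(front_matter, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Hypothetical dataset, for illustration only.
    text = render_front_matter(
        title="Example Dataset",
        desc="A placeholder description.",
        thumbnail="example-dataset.png",  # assumed format, check existing entries
        publication="<link to publication>",
        github="<link to github>",
        tags=["example"],
    )
    print(write_dataset_file(".", "example-dataset", text))
    print(is_16_by_9(320, 180), is_16_by_9(640, 480))
```

Run from the root of your fork, this writes datasets/example-dataset.md. Comparing the result against an existing entry in the datasets folder is the safest way to confirm the expected field values before opening the pull request.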
If you have any questions or need assistance, please open an issue in the repository or contact steven@simula.no.