datasets.simula.no

A collection of open datasets published by Simula Research Laboratory and SimulaMet.

Currently, we have published the following datasets:

Medical and Biology Datasets

  • Depresjon, The Depresjon Dataset. [ publication ]
  • HyperKvasir, The Largest Gastrointestinal Dataset. [ publication ]
  • HYPERAKTIV, A Motor Activity Database of Patients with ADHD. [ publication ]
  • KvasirCapsule SEG, A Capsule Endoscopy Segmentation Dataset. [ publication ]
  • Cellular, A cell autophagy dataset. [ publication ]
  • GastroVision, A multicenter dataset. [ publication ]
  • Nerthus, A Bowel Preparation Quality Video Dataset. [ publication ]
  • Kvasir-VQA, A Text-Image Pair GI Tract Dataset.
  • Kvasir Capsule, The largest gastrointestinal PillCAM dataset. [ publication ]
  • Kvasir Instrument, A gastrointestinal instrument Dataset. [ publication ]
  • Kvasir SEG, Segmented Polyp Dataset for Computer Aided Gastrointestinal Disease Detection. [ publication ]
  • Kvasir, A Multi-Class Image-Dataset for Computer Aided Gastrointestinal Disease Detection. [ publication ]
  • Psykose, A Motor Activity Database of Patients with Schizophrenia. [ publication ]
  • VISEM QC, A sperm quality control dataset.
  • VISEM, A Multimodal Video Dataset of Human Spermatozoa. [ publication ]

Sport Datasets

  • Alfheim, Soccer video and player position dataset. [ publication ]
  • ARX, A Text-Classification Dataset Consisting of Norwegian Soccer Articles from VG and TV2. [ publication ]
  • Heimdallr, A Dataset For Sport Analysis.
  • ScopeSense, An 8.5-month sport, nutrition, and lifestyle lifelogging dataset.
  • Soccer Summarization, Soccer game captions and summary in English for game summarization. [ publication ]
  • SoccerMon, Subjective and objective data collected over two years from two different elite women's soccer teams.
  • SoccerSum, The SoccerSum Dataset for Automated Detection, Segmentation, and Tracking of Objects on the Soccer Pitch. [ publication ]
  • SoccerNet-Echoes, A Soccer Game Audio Commentary Dataset. [ publication ]
  • PMData, A lifelogging dataset of 16 persons over 5 months using Fitbit, Google Forms, and PMSys.
  • TACDEC, Dataset of Tackle Events in Soccer Game Videos. [ publication ]

Other Datasets

  • Anarchy Online, Server-side Network Traffic from Anarchy Online: Analysis, Statistics and Applications. [ publication ]
  • European Cloud Cover, A dataset containing reanalysis data from ERA5 and satellite retrievals from Meteosat Second Generation. [ publication ]
  • Eye Tracker, A Serious Game Based Dataset. [ publication ]
  • HSDPA, HSDPA-bandwidth logs for mobile HTTP streaming scenarios.
  • HTAD, A Home-Tasks Activities Dataset with Wrist-accelerometer and Audio Features. [ publication ]
  • Image Sentiment, A dataset for image sentiment analysis. [ publication ]
  • Njord, A fishing boat dataset.
  • Right Inflight, A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation.
  • THREAT, A Large Annotated Corpus for Detection of Violent Threats.
  • Toadstool, A Dataset for Training Emotional and Intelligent Machines Playing Super Mario Bros. [ publication ]
  • WICO Graph Dataset, A Labeled Dataset of Twitter Subgraphs based on Conspiracy Theory and 5G-Corona Misinformation Tweets. [ publication ]
  • WICO Text, A labeled dataset of conspiracy theory and 5G-corona misinformation tweets. [ publication ]

How to contribute

To add a new dataset, follow these steps:

  1. Fork the Repository: Fork this repository to your GitHub account.
  2. Create a Markdown File: In your forked repository, navigate to the datasets folder and create a new Markdown file (.md) for your dataset. The file name should be descriptive of the dataset.
  3. Add Dataset Information: Copy and paste the following template into your Markdown file:
    ---
    title: <dataset name>
    desc: <dataset description>
    thumbnail: <dataset thumbnail>
    publication: <link to publication>
    github: <link to github>
    tags:
      - <list of tags>
    ---
    Fill in the template with the appropriate information about your dataset.
  4. Add a Dataset Thumbnail: Add a thumbnail for the dataset; it will be displayed on the main page. The thumbnail should have a 16:9 aspect ratio (for example, 320 x 180 or 640 x 360 pixels) and be placed under public/thumbnails.
  5. Update the README: Add the new dataset to this README under one of the categories above. Include links to the publication, code, or other useful resources.
  6. Create a Pull Request: Once you have added the Markdown file and filled in the dataset information, commit your changes and push them to your forked repository. Then create a pull request to merge your changes into the main repository.
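
Before opening a pull request, it can help to sanity-check the new file locally. The following is a minimal sketch and not part of this repository's tooling: it looks for the template fields from step 3 in a Markdown file's front matter and checks whether a thumbnail size is 16:9. The function names, the example values, and the choice to treat every template field as required are assumptions made for illustration.

```python
import re

# Keys taken from the template in step 3.
# Assumption: all of them are treated as required here.
EXPECTED_KEYS = ("title", "desc", "thumbnail", "publication", "github", "tags")


def front_matter_keys(markdown_text):
    """Return the top-level keys found between the opening and closing '---' lines."""
    lines = markdown_text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter at the top of the file
    keys = []
    for line in lines[1:]:
        if line.strip() == "---":
            return keys  # closing delimiter reached
        # Only unindented "key:" lines count; indented list items are skipped.
        match = re.match(r"^([A-Za-z_][\w-]*):", line)
        if match:
            keys.append(match.group(1))
    return []  # the closing '---' is missing


def missing_keys(markdown_text):
    """List the template keys that do not appear in the front matter."""
    found = front_matter_keys(markdown_text)
    return [key for key in EXPECTED_KEYS if key not in found]


def is_16_by_9(width, height):
    """Check the aspect ratio with integer arithmetic to avoid rounding."""
    return width * 9 == height * 16


# Hypothetical dataset entry, used only to exercise the checks.
example = """---
title: Example Dataset
desc: A placeholder description.
thumbnail: example.png
publication: https://example.org/paper
github: https://example.org/code
tags:
  - example
---
"""

print(missing_keys(example))
print(is_16_by_9(640, 360))
```

An empty list from `missing_keys` means every template field is present. The sketch does not validate the field values or read the image file itself, so the thumbnail's pixel dimensions need to be supplied by hand.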

Contact

If you have any questions or need assistance, please open an issue in the repository or contact steven@simula.no.