A curated list of papers that released datasets along with their work

voxel51/papers-with-data

Papers with Data

Data reigns supreme 🥇

Every day it becomes more evident that data is the limiting factor for state-of-the-art 📈 machine learning. Your model architecture may be revolutionary, but without high-quality data 📊 to train on, it will be doomed to mediocrity.

Pair idea with execution and use top-notch data in your next project!

NeurIPS 2023

We've combed through the 2384 papers accepted to NeurIPS in 2023 and compiled a shortlist of papers introducing exciting new datasets.

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data | perceptual similarity, image, synthetic, diffusion, JND, 2AFC | arXiv | FiftyOne | GitHub |
| Visual Instruction Tuning | vision-language, llm, instruction-tuning, image, multimodal | arXiv | FiftyOne | GitHub |
| ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation | reward-model, image, text-to-image, synthetic, human-preference, alignment | arXiv | FiftyOne | GitHub |
| MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing | image-editing, synthetic, image, instruction | arXiv | FiftyOne | GitHub |
| REAL3D-AD | 3D, point-cloud, anomaly-detection | arXiv | FiftyOne | GitHub |

WACV 2024

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| dacl10k: Benchmark for Semantic Bridge Damage Segmentation | image, semantic segmentation, classification, construction, defect | arXiv | FiftyOne | GitHub |

ICCV 2023

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| Satlas: A Large-Scale, Multi-Task Dataset for Remote Sensing Image Understanding | image, SAR, satellite, detection, climate | arXiv | FiftyOne | GitHub |
| Building3D: An Urban-Scale Dataset and Benchmarks for Learning Roof Structures from Point Clouds | 3D, point cloud | arXiv | FiftyOne | |
| EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding | image, object, ego | arXiv | FiftyOne | GitHub |
| Equivariant Similarity for Vision-Language Foundation Models | image, similarity, caption | arXiv | FiftyOne | GitHub |
| MOSE: A New Dataset for Video Object Segmentation in Complex Scenes | video, segmentation, tracking | arXiv | FiftyOne | |
| SportsMOT: A Large Multi-Object Tracking Dataset in Multiple Sports Scenes | multi-object tracking, sports | arXiv | FiftyOne | GitHub |

CVPR 2023

We've combed through the 2359 papers accepted to CVPR in 2023 and compiled a shortlist of papers introducing exciting new datasets.

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| MVImgNet: A Large-scale Dataset of Multi-view Images | multi-view, image | arXiv | FiftyOne | GitHub |
| GeoNet: Benchmarking Unsupervised Adaptation across Geographies | geolocation, image | arXiv | FiftyOne | |
| Joint HDR Denoising and Fusion: A Real-World Mobile HDR Image Dataset | denoising, image | | FiftyOne | GitHub |
| Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo | optical flow, stereo, image | arXiv | FiftyOne | |
| ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing | image, editing | arXiv | FiftyOne | GitHub |
| ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data | RGB-D, segmentation, video | arXiv | FiftyOne | GitHub |
| Diverse Embedding Expansion Network and Low-Light Cross-Modality Benchmark for Visible-Infrared Person Re-identification | low-light, cross-modal, IR | arXiv | FiftyOne | GitHub |
| JRDB-Pose: A Large-scale Dataset for Multi-Person Pose Estimation and Tracking | pose estimation, image, keypoint, tracking | arXiv | FiftyOne | |
| A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation | synthetic, domain adaptation, supervised | arXiv | FiftyOne | GitHub |

Papers from 2022

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| Calving fronts and where to find them: a benchmark dataset and methodology for automatic glacier calving front extraction from synthetic aperture radar imagery | glacier, climate, SAR, satellite, image, semantic segmentation | Paper | FiftyOne | Code |
| The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting | conservation, detection, SONAR, video, tracking, counting | arXiv | FiftyOne | GitHub |

Classics

| Title | Tags | Paper | Dataset | Code |
| --- | --- | --- | --- | --- |
| ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases | x-ray, image, healthcare, detection | arXiv | FiftyOne | |

Contributing 👋

We would love your help in making this repository even better! If we missed a paper that introduced a new dataset, or if you can think of any ways to improve the repository, feel free to open an issue or a pull request.

Note

This repository is inspired by paperswithcode, and the template was adapted from top-cvpr-2023-papers.