CVPR 2021 Papers and Open-Source Projects (Papers with Code)

A collection of CVPR 2021 papers and open-source projects (papers with code)!

List of accepted CVPR 2021 papers: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt

Note 1: You are welcome to open an issue to share CVPR 2021 papers and open-source projects!

Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

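A WeChat group for authors of accepted CVPR 2021 papers has been set up! If your paper was accepted, add the WeChat ID CVer9999 with the note: CVPR2021 accepted + name + school/company. Requests must follow this format; you will then be added to the group to discuss conference attendance and related matters.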

[CVPR 2021 Papers with Code: Table of Contents]

Backbone
NAS
GAN
VAE
Visual Transformer
Regularization
SLAM
Long-Tailed Distribution
Data Augmentation
Unsupervised/Self-Supervised Learning
Semi-Supervised Learning
Capsule Network
Image Classification
2D Object Detection
Single/Multi-Object Tracking
Semantic Segmentation
Instance Segmentation
Panoptic Segmentation
Medical Image Segmentation
Video Object Segmentation
Interactive Video Object Segmentation
Saliency Detection
Camouflaged Object Detection
Co-Salient Object Detection
Image Matting
Person Re-identification
Person Search
Video Understanding/Action Recognition
Face Recognition
Face Detection
Face Anti-Spoofing
Deepfake Detection
Facial Age Estimation
Facial Expression Recognition
Deepfakes
Human Parsing
2D/3D Human Pose Estimation
Animal Pose Estimation
Human Volumetric Capture
Scene Text Recognition
Image Compression
Model Compression/Pruning/Quantization
Knowledge Distillation
Super-Resolution
Dehazing
Image Restoration
Image Inpainting
Image Editing
Image Captioning
Font Generation
Image Matching
Image Blending
Reflection Removal
3D Point Cloud Classification
3D Object Detection
3D Semantic Segmentation
3D Panoptic Segmentation
3D Object Tracking
3D Point Cloud Registration
3D Point Cloud Completion
3D Reconstruction
6D Pose Estimation
Camera Pose Estimation
Depth Estimation
Stereo Matching
Optical Flow Estimation
Lane Detection
Trajectory Prediction
Crowd Counting
Adversarial Examples
Image Retrieval
Video Retrieval
Cross-modal Retrieval
Zero-Shot Learning
Federated Learning
Video Frame Interpolation
Visual Reasoning
Image Synthesis
View Synthesis
Style Transfer
Layout Generation
Domain Generalization
Domain Adaptation
Open-Set
Adversarial Attack
Human-Object Interaction (HOI) Detection
Shadow Removal
Virtual Try-On
Label Noise
Datasets
Others
To Be Added (TODO)
Acceptance Not Confirmed (Not Sure)

Backbone

BCNet: Searching for Network Width with Bilaterally Coupled Network

Paper: https://arxiv.org/abs/2105.10533
Code: None

Decoupled Dynamic Filter Networks

Homepage: https://thefoxofsky.github.io/project_pages/ddf
Paper: https://arxiv.org/abs/2104.14107
Code: https://github.com/thefoxofsky/DDF

Lite-HRNet: A Lightweight High-Resolution Network

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

Paper: https://arxiv.org/abs/2104.04382
Code: https://github.com/jianghaojun/CondenseNetV2

Diverse Branch Block: Building a Convolution as an Inception-like Unit

Paper: https://arxiv.org/abs/2103.13425
Code: https://github.com/DingXiaoH/DiverseBranchBlock

Scaling Local Self-Attention For Parameter Efficient Visual Backbones

Paper(Oral): https://arxiv.org/abs/2103.12731
Code: None

ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network

Paper: https://arxiv.org/abs/2007.00992
Code: https://github.com/clovaai/rexnet

Involution: Inverting the Inherence of Convolution for Visual Recognition

Paper: https://arxiv.org/abs/2103.06255
Code: https://github.com/d-li14/involution

Coordinate Attention for Efficient Mobile Network Design

Paper: https://arxiv.org/abs/2103.02907
Code: https://github.com/Andrew-Qibin/CoordAttention

Inception Convolution with Efficient Dilation Search

Paper: https://arxiv.org/abs/2012.13587
Code: https://github.com/yifan123/IC-Conv

RepVGG: Making VGG-style ConvNets Great Again

Paper: https://arxiv.org/abs/2101.03697
Code: https://github.com/DingXiaoH/RepVGG

NAS

BCNet: Searching for Network Width with Bilaterally Coupled Network

Paper: https://arxiv.org/abs/2105.10533
Code: None

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

Paper: https://arxiv.org/abs/2105.10154
Code: None

Combined Depth Space based Architecture Search For Person Re-identification

Paper: https://arxiv.org/abs/2104.04163
Code: None

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

Paper(Oral): https://arxiv.org/abs/2103.15954
Code: None

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

Paper(Oral): None
Code: https://github.com/dingmyu/HR-NAS

Neural Architecture Search with Random Labels

Paper: https://arxiv.org/abs/2101.11834
Code: None

Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search

Paper: https://arxiv.org/abs/2101.11342
Code: None

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

Paper: https://arxiv.org/abs/2105.12971
Code: None

Prioritized Architecture Sampling with Monto-Carlo Tree Search

Paper: https://arxiv.org/abs/2103.11922
Code: https://github.com/xiusu/NAS-Bench-Macro

Contrastive Neural Architecture Search with Neural Architecture Comparators

Paper: https://arxiv.org/abs/2103.05471
Code: https://github.com/chenyaofo/CTNAS

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

Paper: https://arxiv.org/abs/2011.09011
Code: None

ReNAS: Relativistic Evaluation of Neural Architecture Search

Paper: https://arxiv.org/abs/1910.01523
Code: None

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

Paper: https://arxiv.org/abs/2005.14446
Code: None

Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator

Paper: https://arxiv.org/abs/2103.07289
Code: https://github.com/eric8607242/SGNAS

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

Paper: https://arxiv.org/abs/2103.04507
Code: https://github.com/VDIGPKU/OPANAS

Inception Convolution with Efficient Dilation Search

Paper: https://arxiv.org/abs/2012.13587
Code: https://github.com/yifan123/IC-Conv

GAN

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Paper: https://arxiv.org/abs/2105.09188
Code: https://github.com/csjliang/LPTN
Dataset: https://github.com/csjliang/LPTN

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

Paper: https://arxiv.org/abs/2104.03064
Code: https://github.com/ecnuycxie/DG-Font

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

Paper: https://arxiv.org/abs/2105.02201
Code: https://github.com/KumapowerLIU/PD-GAN

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

Paper: https://arxiv.org/abs/2104.14754
Code: https://github.com/naver-ai/StyleMapGAN
Demo Video: https://youtu.be/qCapNyRA_Ng

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

Paper: https://arxiv.org/abs/2104.05376
Code: https://github.com/PaddlePaddle/PaddleGAN/

Regularizing Generative Adversarial Networks under Limited Data

Homepage: https://hytseng0509.github.io/lecam-gan/
Paper: https://faculty.ucmerced.edu/mhyang/papers/cvpr2021_gan_limited_data.pdf
Code: https://github.com/google/lecam-gan

Towards Real-World Blind Face Restoration with Generative Facial Prior

Paper: https://arxiv.org/abs/2101.04061
Code: None

TediGAN: Text-Guided Diverse Image Generation and Manipulation

Homepage: https://xiaweihao.com/projects/tedigan/
Paper: https://arxiv.org/abs/2012.03308
Code: https://github.com/weihaox/TediGAN

Generative Hierarchical Features from Synthesizing Images

Homepage: https://genforce.github.io/ghfeat/
Paper(Oral): https://arxiv.org/abs/2007.10379
Code: https://github.com/genforce/ghfeat

Teachers Do More Than Teach: Compressing Image-to-Image Models

Paper: https://arxiv.org/abs/2103.03467
Code: https://github.com/snap-research/CAT

HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms

Paper: https://arxiv.org/abs/2011.11731
Code: https://github.com/mahmoudnafifi/HistoGAN

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

Homepage: https://marcoamonteiro.github.io/pi-GAN-website/
Paper(Oral): https://arxiv.org/abs/2012.00926
Code: None

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

Paper: https://arxiv.org/abs/2103.07893
Code: None

Diverse Semantic Image Synthesis via Probability Distribution Modeling

Paper: https://arxiv.org/abs/2103.06878
Code: https://github.com/tzt101/INADE.git

LOHO: Latent Optimization of Hairstyles via Orthogonalization

Paper: https://arxiv.org/abs/2103.03891
Code: None

PISE: Person Image Synthesis and Editing with Decoupled GAN

Paper: https://arxiv.org/abs/2103.04023
Code: https://github.com/Zhangjinso/PISE

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Paper: http://raywzy.com/
Code: http://raywzy.com/

Efficient Conditional GAN Transfer with Knowledge Propagation across Classes

Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

Paper: https://arxiv.org/abs/2011.14107
Code: None

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

Homepage: https://eladrich.github.io/pixel2style2pixel/
Paper: https://arxiv.org/abs/2008.00951
Code: https://github.com/eladrich/pixel2style2pixel

A 3D GAN for Improved Large-pose Facial Recognition

Paper: https://arxiv.org/abs/2012.10545
Code: None

HumanGAN: A Generative Model of Humans Images

Paper: https://arxiv.org/abs/2103.06902
Code: None

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis

Paper: https://arxiv.org/abs/2103.02264
Code: https://github.com/MingyuY/Iterative-view-synthesis

CoMoGAN: continuous model-guided image-to-image translation

Paper(Oral): https://arxiv.org/abs/2103.06879
Code: https://github.com/cv-rits/CoMoGAN

Training Generative Adversarial Networks in One Stage

Paper: https://arxiv.org/abs/2103.00430
Code: None

Closed-Form Factorization of Latent Semantics in GANs

Homepage: https://genforce.github.io/sefa/
Paper(Oral): https://arxiv.org/abs/2007.06600
Code: https://github.com/genforce/sefa

Anycost GANs for Interactive Image Synthesis and Editing

Paper: https://arxiv.org/abs/2103.03243
Code: https://github.com/mit-han-lab/anycost-gan

Image-to-image Translation via Hierarchical Style Disentanglement

Paper: https://arxiv.org/abs/2103.01456
Code: https://github.com/imlixinyang/HiSD

VAE

Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders

Homepage: https://taldatech.github.io/soft-intro-vae-web/
Paper: https://arxiv.org/abs/2012.13253
Code: https://github.com/taldatech/soft-intro-vae-pytorch

Visual Transformer

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

Paper: https://arxiv.org/abs/2012.09760
Code: https://github.com/microsoft/MeshTransformer

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

Paper: https://arxiv.org/abs/2101.06184
Code: https://github.com/tobyperrett/trx

3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Paper: https://arxiv.org/abs/2103.16110
Code: https://github.com/mczhuge/Kaleido-BERT

4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

Paper: https://arxiv.org/abs/2104.13682
Code: https://github.com/kakaobrain/HOTR

5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

Paper: https://arxiv.org/abs/2104.09224
Code: https://github.com/autonomousvision/transfuser

6. Pose Recognition with Cascade Transformers

Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR

7. Variational Transformer Networks for Layout Generation

Paper: https://arxiv.org/abs/2104.02416
Code: None

8. LoFTR: Detector-Free Local Feature Matching with Transformers

Homepage: https://zju3dv.github.io/loftr/
Paper: https://arxiv.org/abs/2104.00680
Code: https://github.com/zju3dv/LoFTR

9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Paper: https://arxiv.org/abs/2012.15840
Code: https://github.com/fudan-zvg/SETR

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Paper: https://arxiv.org/abs/2103.16553
Code: None

11. Transformer Tracking

Paper: https://arxiv.org/abs/2103.15436
Code: https://github.com/chenxin-dlut/TransT

12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

Paper(Oral): None
Code: https://github.com/dingmyu/HR-NAS

13. MIST: Multiple Instance Spatial Transformer

Paper: https://arxiv.org/abs/1811.10725
Code: None

14. Multimodal Motion Prediction with Stacked Transformers

Paper: https://arxiv.org/abs/2103.11624
Code: https://decisionforce.github.io/mmTransformer

15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack

17. Pre-Trained Image Processing Transformer

Paper: https://arxiv.org/abs/2012.00364
Code: None

18. End-to-End Video Instance Segmentation with Transformers

Paper(Oral): https://arxiv.org/abs/2011.14503
Code: https://github.com/Epiphqny/VisTR

19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Paper(Oral): https://arxiv.org/abs/2011.09094
Code: https://github.com/dddzg/up-detr

20. End-to-End Human Object Interaction Detection with HOI Transformer

Paper: https://arxiv.org/abs/2103.04503
Code: https://github.com/bbepoch/HoiTransformer

21. Transformer Interpretability Beyond Attention Visualization

22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

Paper: None
Code: None

23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

Paper: None
Code: None

24. Line Segment Detection Using Transformers without Edges

Paper(Oral): https://arxiv.org/abs/2101.01909
Code: None

25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Paper(Oral): https://arxiv.org/abs/2101.08833
Code: https://github.com/dukebw/SSTVOS

27. Facial Action Unit Detection With Transformers

Paper: None
Code: None

28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition

Paper: None
Code: None

29. Lesion-Aware Transformers for Diabetic Retinopathy Grading

Paper: None
Code: None

30. Topological Planning With Transformers for Vision-and-Language Navigation

Paper: https://arxiv.org/abs/2012.05292
Code: None

31. Adaptive Image Transformer for One-Shot Object Detection

Paper: None
Code: None

32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

Paper: None
Code: None

33. Taming Transformers for High-Resolution Image Synthesis

Homepage: https://compvis.github.io/taming-transformers/
Paper(Oral): https://arxiv.org/abs/2012.09841
Code: https://github.com/CompVis/taming-transformers

34. Self-Supervised Video Hashing via Bidirectional Transformers

Paper: None
Code: None

35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
Code: None

36. Gaussian Context Transformer

Paper: None
Code: None

37. General Multi-Label Image Classification With Transformers

Paper: https://arxiv.org/abs/2011.14027
Code: None

38. Bottleneck Transformers for Visual Recognition

Paper: https://arxiv.org/abs/2101.11605
Code: None

39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

Paper(Oral): https://arxiv.org/abs/2011.13922
Code: https://github.com/YicongHong/Recurrent-VLN-BERT

40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Paper(Oral): https://arxiv.org/abs/2102.06183
Code: https://github.com/jayleicn/ClipBERT

41. Self-attention based Text Knowledge Mining for Text Detection

Paper: None
Code: https://github.com/CVI-SZU/STKM

42. SSAN: Separable Self-Attention Network for Video Representation Learning

Paper: None
Code: None

43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

Paper(Oral): https://arxiv.org/abs/2103.12731
Code: None

Regularization

Regularizing Neural Networks via Adversarial Model Perturbation

Paper: https://arxiv.org/abs/2010.04925
Code: https://github.com/hiyouga/AMP-Regularizer

SLAM

Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation

Paper: https://arxiv.org/abs/2105.07593
Code: None

Generalizing to the Open World: Deep Visual Odometry with Online Adaptation

Paper: https://arxiv.org/abs/2103.15279
Code: https://arxiv.org/abs/2103.15279

Long-Tailed Distribution

Adversarial Robustness under Long-Tailed Distribution

Paper(Oral): https://arxiv.org/abs/2104.02703
Code: https://github.com/wutong16/Adversarial_Long-Tail

Distribution Alignment: A Unified Framework for Long-tail Visual Recognition

Paper: https://arxiv.org/abs/2103.16370
Code: https://github.com/Megvii-BaseDetection/DisAlign

Adaptive Class Suppression Loss for Long-Tail Object Detection

Paper: https://arxiv.org/abs/2104.00885
Code: https://github.com/CASIA-IVA-Lab/ACSL

Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification

Paper: https://arxiv.org/abs/2103.14267
Code: None

Data Augmentation

Scale-aware Automatic Augmentation for Object Detection

Paper: https://arxiv.org/abs/2103.17220
Code: https://github.com/Jia-Research-Lab/SA-AutoAug

Unsupervised/Self-Supervised Learning

Domain-Specific Suppression for Adaptive Object Detection

Paper: https://arxiv.org/abs/2105.03570
Code: None

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

Paper: https://arxiv.org/abs/2104.14558
Code: https://github.com/facebookresearch/SlowFast

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Paper: https://arxiv.org/abs/2104.12961
Code: None

Self-supervised Video Representation Learning by Context and Motion Decoupling

Paper: https://arxiv.org/abs/2104.00862
Code: None

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
Paper: https://arxiv.org/abs/2009.05769
Code: https://github.com/FingerRec/BE

Spatially Consistent Representation Learning

Paper: https://arxiv.org/abs/2103.06122
Code: None

VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

Paper: https://arxiv.org/abs/2103.05905
Code: https://github.com/tinapan-pt/VideoMoCo

Exploring Simple Siamese Representation Learning

Paper(Oral): https://arxiv.org/abs/2011.10566
Code: None

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

Paper(Oral): https://arxiv.org/abs/2011.09157
Code: https://github.com/WXinlong/DenseCL

Semi-Supervised Learning

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

Paper: https://arxiv.org/abs/2103.11402
Code: None

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

Capsule Network

Capsule Network is Not More Robust than Convolutional Network

Paper: https://arxiv.org/abs/2103.15459
Code: None

Image Classification

Correlated Input-Dependent Label Noise in Large-Scale Image Classification

Paper(Oral): https://arxiv.org/abs/2105.10305
Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet

2D Object Detection

2D Object Detection

Dynamic Head: Unifying Object Detection Heads with Attentions

Paper: https://arxiv.org/abs/2106.08322
Code: https://github.com/microsoft/DynamicHead

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

Paper: https://arxiv.org/abs/2105.12971
Code: None

PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery

Paper: https://arxiv.org/abs/2105.12990
Code: None

Domain-Specific Suppression for Adaptive Object Detection

Paper: https://arxiv.org/abs/2105.03570
Code: None

IQDet: Instance-wise Quality Distribution Sampling for Object Detection

Paper: https://arxiv.org/abs/2104.06936
Code: None

Multi-Scale Aligned Distillation for Low-Resolution Detection

Adaptive Class Suppression Loss for Long-Tail Object Detection

Paper: https://arxiv.org/abs/2104.00885
Code: https://github.com/CASIA-IVA-Lab/ACSL

VarifocalNet: An IoU-aware Dense Object Detector

Paper(Oral): https://arxiv.org/abs/2008.13367
Code: https://github.com/hyz-xmaster/VarifocalNet

Scale-aware Automatic Augmentation for Object Detection

Paper: https://arxiv.org/abs/2103.17220
Code: https://github.com/Jia-Research-Lab/SA-AutoAug

OTA: Optimal Transport Assignment for Object Detection

Paper: https://arxiv.org/abs/2103.14259
Code: https://github.com/Megvii-BaseDetection/OTA

Distilling Object Detectors via Decoupled Features

Paper: https://arxiv.org/abs/2103.14475
Code: https://github.com/ggjy/DeFeat.pytorch

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

Paper: https://arxiv.org/abs/2011.12450
Code: https://github.com/PeizeSun/SparseR-CNN

Positive-Unlabeled Data Purification in the Wild for Object Detection

Paper: None
Code: None

Instance Localization for Self-supervised Detection Pretraining

Paper: https://arxiv.org/abs/2102.08318
Code: https://github.com/limbo0000/InstanceLoc

MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

Paper: https://arxiv.org/abs/2103.04224
Code: None

End-to-End Object Detection with Fully Convolutional Network

Paper: https://arxiv.org/abs/2012.03544
Code: https://github.com/Megvii-BaseDetection/DeFCN

Robust and Accurate Object Detection via Adversarial Learning

Paper: https://arxiv.org/abs/2103.13886
Code: None

I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors

Paper: https://arxiv.org/abs/2103.13757
Code: None

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

Paper: https://arxiv.org/abs/2103.11402
Code: None

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

Paper: https://arxiv.org/abs/2103.04507
Code: https://github.com/VDIGPKU/OPANAS

YOLOF: You Only Look One-level Feature

Paper: https://arxiv.org/abs/2103.09460
Code: https://github.com/megvii-model/YOLOF

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Paper(Oral): https://arxiv.org/abs/2011.09094
Code: https://github.com/dddzg/up-detr

General Instance Distillation for Object Detection

Paper: https://arxiv.org/abs/2103.02340
Code: None

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Paper: https://arxiv.org/abs/2011.12885
Code: https://github.com/implus/GFocalV2

Multiple Instance Active Learning for Object Detection

Paper: https://github.com/yuantn/MIAL/raw/master/paper.pdf
Code: https://github.com/yuantn/MIAL

Towards Open World Object Detection

Paper(Oral): https://arxiv.org/abs/2103.02603
Code: https://github.com/JosephKJ/OWOD

Few-Shot Object Detection

Adaptive Image Transformer for One-Shot Object Detection

Paper: None
Code: None

Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection

Paper: https://arxiv.org/abs/2103.17115
Code: https://github.com/hzhupku/DCNet

Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection

Paper: https://arxiv.org/abs/2103.01903
Code: None

Few-Shot Object Detection via Contrastive Proposal Encoding

Paper: https://arxiv.org/abs/2103.05950
Code: https://github.com/MegviiDetection/FSCE

Rotated Object Detection

Dense Label Encoding for Boundary Discontinuity Free Rotation Detection

Paper: https://arxiv.org/abs/2011.09670
Code1: https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow
Code2: https://github.com/yangxue0827/RotationDetection

ReDet: A Rotation-equivariant Detector for Aerial Object Detection

Paper: https://arxiv.org/abs/2103.07733
Code: https://github.com/csuhan/ReDet

Single/Multi-Object Tracking

Single-Object Tracking

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

Paper: https://arxiv.org/abs/2104.14545
Code: https://github.com/researchmm/LightTrack

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

Homepage: https://sites.google.com/view/langtrackbenchmark/
Paper: https://arxiv.org/abs/2103.16746
Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Paper: https://arxiv.org/abs/2103.14938
Code: https://github.com/VISION-SJTU/IoUattack

Graph Attention Tracking

Paper: https://arxiv.org/abs/2011.11204
Code: https://github.com/ohhhyeahhh/SiamGAT

Rotation Equivariant Siamese Networks for Tracking

Paper: https://arxiv.org/abs/2012.13078
Code: None

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Paper(Oral): https://arxiv.org/abs/2103.11681
Code: https://github.com/594422814/TransformerTrack

Transformer Tracking

Paper: https://arxiv.org/abs/2103.15436
Code: https://github.com/chenxin-dlut/TransT

Multi-Object Tracking

Multiple Object Tracking with Correlation Learning

Paper: https://arxiv.org/abs/2104.03541
Code: None

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Paper: https://arxiv.org/abs/2012.02337
Code: None

Learning a Proposal Classifier for Multiple Object Tracking

Paper: https://arxiv.org/abs/2103.07889
Code: https://github.com/daip13/LPC_MOT.git

Track to Detect and Segment: An Online Multi-Object Tracker

Homepage: https://jialianwu.com/projects/TraDeS.html
Paper: https://arxiv.org/abs/2103.08808
Code: https://github.com/JialianW/TraDeS

Semantic Segmentation

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Paper: https://arxiv.org/abs/2012.05258
Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

Rethinking BiSeNet For Real-time Semantic Segmentation

Paper: https://arxiv.org/abs/2104.13188
Code: https://github.com/MichaelFan01/STDC-Seg

Progressive Semantic Segmentation

Paper: https://arxiv.org/abs/2104.03778
Code: https://github.com/VinAIResearch/MagNet

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Paper: https://arxiv.org/abs/2012.15840
Code: https://github.com/fudan-zvg/SETR

Bidirectional Projection Network for Cross Dimension Scene Understanding

Paper(Oral): https://arxiv.org/abs/2103.14326
Code: https://github.com/wbhu/BPNet

Cross-Dataset Collaborative Learning for Semantic Segmentation

Paper: https://arxiv.org/abs/2103.11351
Code: None

Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations

Paper: https://arxiv.org/abs/2103.06342
Code: None

Capturing Omni-Range Context for Omnidirectional Segmentation

Paper: https://arxiv.org/abs/2103.05687
Code: None

Learning Statistical Texture for Semantic Segmentation

Paper: https://arxiv.org/abs/2103.04133
Code: None

PLOP: Learning without Forgetting for Continual Semantic Segmentation

Paper: https://arxiv.org/abs/2011.11390
Code: None

Weakly Supervised Semantic Segmentation

Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/
Paper: https://arxiv.org/abs/2104.00905
Code: None

Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

Paper: https://arxiv.org/abs/2103.14581
Code: None

BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation

Paper: https://arxiv.org/abs/2103.08907
Code: None

Semi-Supervised Semantic Segmentation

Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

Paper: https://arxiv.org/abs/2106.01226
Code: https://github.com/charlesCXK/TorchSemiSeg

Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation

Paper: https://arxiv.org/abs/2103.04705
Code: None

Domain Adaptive Semantic Segmentation

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

Paper: https://arxiv.org/abs/2105.00097
Code: https://github.com/visinf/da-sac

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

Paper: https://arxiv.org/abs/2103.15597
Code: https://github.com/shachoi/RobustNet

Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization

Paper: https://arxiv.org/abs/2103.13041
Code: None

MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation

Paper: https://arxiv.org/abs/2103.05254
Code: None

Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation

Paper: https://arxiv.org/abs/2103.04717
Code: None

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

Paper: https://arxiv.org/abs/2101.10979
Code: https://github.com/microsoft/ProDA

Video Semantic Segmentation

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Homepage: https://www.vspwdataset.com/
Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
GitHub: https://github.com/sssdddwww2/vspw_dataset_download

Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Paper: https://arxiv.org/abs/2011.09876
Code: https://github.com/aliyun/DCT-Mask

Incremental Few-Shot Instance Segmentation

Paper: https://arxiv.org/abs/2105.05312
Code: https://github.com/danganea/iMTFA

A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation

Paper: https://arxiv.org/abs/2105.03186
Code: None

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

Paper: https://arxiv.org/abs/2104.08569
Code: https://github.com/zhanggang001/RefineMask/

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

Paper: https://arxiv.org/abs/2104.05239
Code: https://github.com/tinyalpha/BPR

Multi-Scale Aligned Distillation for Low-Resolution Detection

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Homepage: https://bowenc0221.github.io/boundary-iou/
Paper: https://arxiv.org/abs/2103.16562
Code: https://github.com/bowenc0221/boundary-iou-api

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

Paper: https://arxiv.org/abs/2103.12340
Code: https://github.com/lkeab/BCNet

Zero-shot instance segmentation (Not Sure)

Paper: None
Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395

Video Instance Segmentation

STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
Code: https://github.com/MinghanLi/STMask

End-to-End Video Instance Segmentation with Transformers

Paper(Oral): https://arxiv.org/abs/2011.14503
Code: https://github.com/Epiphqny/VisTR

Panoptic Segmentation

Part-aware Panoptic Segmentation

Paper: https://arxiv.org/abs/2106.06351
Code: https://github.com/tue-mps/panoptic_parts
Dataset: https://github.com/tue-mps/panoptic_parts

Exemplar-Based Open-Set Panoptic Segmentation Network

Homepage: https://cv.snu.ac.kr/research/EOPSN/
Paper: https://arxiv.org/abs/2105.08336
Code: https://github.com/jd730/EOPSN

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

Panoptic Segmentation Forecasting

Paper: https://arxiv.org/abs/2104.03962
Code: https://github.com/nianticlabs/panoptic-forecasting

Fully Convolutional Networks for Panoptic Segmentation

Paper: https://arxiv.org/abs/2012.00720
Code: https://github.com/yanwei-li/PanopticFCN

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

Paper: https://arxiv.org/abs/2103.02584
Code: None

Medical Image Segmentation

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

Paper: https://arxiv.org/abs/2103.06030
Code: https://github.com/liuquande/FedDG-ELCFS

3D Medical Image Segmentation

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

Paper(Oral): https://arxiv.org/abs/2103.15954
Code: None

Video Object Segmentation

Learning Position and Target Consistency for Memory-based Video Object Segmentation

Paper: https://arxiv.org/abs/2104.04329
Code: None

SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

Paper(Oral): https://arxiv.org/abs/2101.08833
Code: https://github.com/dukebw/SSTVOS

Interactive Video Object Segmentation

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

Homepage: https://hkchengrex.github.io/MiVOS/
Paper: https://arxiv.org/abs/2103.07941
Code: https://github.com/hkchengrex/MiVOS
Demo: https://hkchengrex.github.io/MiVOS/video.html#partb

Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

Paper: https://arxiv.org/abs/2103.10391
Code: https://github.com/svip-lab/IVOS-W

Saliency Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

Paper: https://arxiv.org/abs/2104.02628
Code: https://github.com/JingZhang617/Joint_COD_SOD

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion

Paper(Oral): https://arxiv.org/abs/2103.11832
Code: https://github.com/sunpeng1996/DSA2F

Camouflaged Object Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

Paper: https://arxiv.org/abs/2104.02628
Code: https://github.com/JingZhang617/Joint_COD_SOD

Co-Salient Object Detection

Group Collaborative Learning for Co-Salient Object Detection

Paper: https://arxiv.org/abs/2104.01108
Code: https://github.com/fanq15/GCoNet

Image Matting

Semantic Image Matting

Person Re-identification

Generalizable Person Re-identification with Relevance-aware Mixture of Experts

Paper: https://arxiv.org/abs/2105.09156
Code: None

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

Paper: https://arxiv.org/abs/2104.12961
Code: None

Combined Depth Space based Architecture Search For Person Re-identification

Paper: https://arxiv.org/abs/2104.04163
Code: None

Person Search

Anchor-Free Person Search

Paper: https://arxiv.org/abs/2103.11617
Code: https://github.com/daodaofr/AlignPS
Interpretation: The First Anchor-Free Person Search Framework | CVPR 2021

Video Understanding / Action Recognition

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

Paper: https://arxiv.org/abs/2101.06184
Code: https://github.com/tobyperrett/trx

FrameExit: Conditional Early Exiting for Efficient Video Recognition

Paper(Oral): https://arxiv.org/abs/2104.13400
Code: None

No frame left behind: Full Video Action Recognition

Paper: https://arxiv.org/abs/2103.15395
Code: None

Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

Paper: https://arxiv.org/abs/2103.13137
Code: None

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Paper: https://arxiv.org/abs/2103.13141
Code: None
Interpretation: CVPR 2021 | TCANet: The Strongest Temporal Action Proposal Refinement Network

ACTION-Net: Multipath Excitation for Action Recognition

Paper: https://arxiv.org/abs/2103.07372
Code: https://github.com/V-Sense/ACTION-Net

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
Paper: https://arxiv.org/abs/2009.05769
Code: https://github.com/FingerRec/BE

TDN: Temporal Difference Networks for Efficient Action Recognition

Paper: https://arxiv.org/abs/2012.10071
Code: https://github.com/MCG-NJU/TDN

Face Recognition

A 3D GAN for Improved Large-pose Facial Recognition

Paper: https://arxiv.org/abs/2012.10545
Code: None

MagFace: A Universal Representation for Face Recognition and Quality Assessment

Paper(Oral): https://arxiv.org/abs/2103.06627
Code: https://github.com/IrvingMeng/MagFace

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

Homepage: https://www.face-benchmark.org/
Paper: https://arxiv.org/abs/2103.04098
Dataset: https://www.face-benchmark.org/

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Paper(Oral): https://arxiv.org/abs/2103.01520
Code: https://github.com/Hzzone/MTLFace
Dataset: https://github.com/Hzzone/MTLFace

Face Detection

HLA-Face: Joint High-Low Adaptation for Low Light Face Detection

Homepage: https://daooshee.github.io/HLA-Face-Website/
Paper: https://arxiv.org/abs/2104.01984
Code: https://github.com/daooshee/HLA-Face-Code

CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement

Paper: https://arxiv.org/abs/2103.07017
Code: None

Face Anti-Spoofing

Cross Modal Focal Loss for RGBD Face Anti-Spoofing

Paper: https://arxiv.org/abs/2103.00948
Code: None

Deepfake Detection

Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

Paper: https://arxiv.org/abs/2103.01856
Code: None

Multi-attentional Deepfake Detection

Paper: https://arxiv.org/abs/2103.02406
Code: None

Face Age Estimation

Continuous Face Aging via Self-estimated Residual Age Embedding

Paper: https://arxiv.org/abs/2105.00020
Code: None

PML: Progressive Margin Loss for Long-tailed Age Classification

Paper: https://arxiv.org/abs/2103.02140
Code: None

Facial Expression Recognition

Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition

Paper: https://arxiv.org/abs/2103.13372
Code: None

Deepfakes

MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes

Paper: https://arxiv.org/abs/2103.14211
Code: None

Human Parsing

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

Paper: https://arxiv.org/abs/2103.04570
Code: https://github.com/tfzhou/MG-HumanParsing

2D/3D Human Pose Estimation

2D Human Pose Estimation

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

Paper: https://arxiv.org/abs/2105.10154
Code: None

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

Paper: https://arxiv.org/abs/2105.06152
Code: None

Pose Recognition with Cascade Transformers

Paper: https://arxiv.org/abs/2104.06976
Code: https://github.com/mlpc-ucsd/PRTR

DCPose: Deep Dual Consecutive Network for Human Pose Estimation

Paper: https://arxiv.org/abs/2103.07254
Code: https://github.com/Pose-Group/DCPose

3D Human Pose Estimation

End-to-End Human Pose and Mesh Reconstruction with Transformers

Paper: https://arxiv.org/abs/2012.09760
Code: https://github.com/microsoft/MeshTransformer

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

Paper(Oral): https://arxiv.org/abs/2105.02465
Code: https://github.com/jfzhang95/PoseAug

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration

Paper: https://arxiv.org/abs/2103.02845
Code: https://github.com/SeanChenxy/HandMesh

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation

Homepage: https://jeffli.site/HybrIK/
Paper: https://arxiv.org/abs/2011.14672
Code: https://github.com/Jeff-sjtu/HybrIK

Animal Pose Estimation

From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation

Paper: https://arxiv.org/abs/2103.14843
Code: None

Human Volumetric Capture

POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture

Homepage: http://www.liuyebin.com/posefusion/posefusion.html
Paper(Oral): https://arxiv.org/abs/2103.15331
Code: None

Scene Text Detection

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

Paper: https://arxiv.org/abs/2104.10442
Code: None

Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Paper: https://arxiv.org/abs/2103.06495
Code: https://github.com/FangShancheng/ABINet

Image Compression

Checkerboard Context Model for Efficient Learned Image Compression

Paper: https://arxiv.org/abs/2103.15306
Code: None

Slimmable Compressive Autoencoders for Practical Neural Image Compression

Paper: https://arxiv.org/abs/2103.15726
Code: None

Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton

Paper: https://arxiv.org/abs/2103.15368
Code: None

Model Compression / Pruning / Quantization

Teachers Do More Than Teach: Compressing Image-to-Image Models

Paper: https://arxiv.org/abs/2103.03467
Code: https://github.com/snap-research/CAT

Model Pruning

Dynamic Slimmable Network

Paper: https://arxiv.org/abs/2103.13258
Code: https://github.com/changlin31/DS-Net

Model Quantization

Network Quantization with Element-wise Gradient Scaling

Paper: https://arxiv.org/abs/2104.00903
Code: None

Zero-shot Adversarial Quantization

Paper(Oral): https://arxiv.org/abs/2103.15263
Code: https://git.io/Jqc0y

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Paper: https://arxiv.org/abs/2103.07156
Code: None

Knowledge Distillation

Distilling Knowledge via Knowledge Review

Paper: https://arxiv.org/abs/2104.09044
Code: https://github.com/Jia-Research-Lab/ReviewKD

Distilling Object Detectors via Decoupled Features

Paper: https://arxiv.org/abs/2103.14475
Code: https://github.com/ggjy/DeFeat.pytorch

Super-Resolution

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

Homepage: http://mepro.bjtu.edu.cn/resource.html
Paper: https://arxiv.org/abs/2104.06174
Code: None

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

Paper: https://arxiv.org/abs/2103.04039
Code: https://github.com/Xiangtaokong/ClassSR

AdderSR: Towards Energy Efficient Image Super-Resolution

Paper: https://arxiv.org/abs/2009.08891
Code: None

Dehazing

Contrastive Learning for Compact Single Image Dehazing

Paper: https://arxiv.org/abs/2104.09367
Code: https://github.com/GlassyWu/AECR-Net

Video Super-Resolution

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

Paper: None
Code: https://github.com/CS-GangXu/TMNet

Image Restoration

Multi-Stage Progressive Image Restoration

Paper: https://arxiv.org/abs/2102.02808
Code: https://github.com/swz30/MPRNet

Image Inpainting

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

Paper: https://arxiv.org/abs/2105.02201
Code: https://github.com/KumapowerLIU/PD-GAN

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

Homepage: https://yzhouas.github.io/projects/TransFill/index.html
Paper: https://arxiv.org/abs/2103.15982
Code: None

Image Editing

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

Paper: https://arxiv.org/abs/2104.14754
Code: https://github.com/naver-ai/StyleMapGAN
Demo Video: https://youtu.be/qCapNyRA_Ng

High-Fidelity and Arbitrary Face Editing

Paper: https://arxiv.org/abs/2103.15814
Code: None

Anycost GANs for Interactive Image Synthesis and Editing

Paper: https://arxiv.org/abs/2103.03243
Code: https://github.com/mit-han-lab/anycost-gan

PISE: Person Image Synthesis and Editing with Decoupled GAN

Paper: https://arxiv.org/abs/2103.04023
Code: https://github.com/Zhangjinso/PISE

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Paper: http://raywzy.com/
Code: http://raywzy.com/

Image Captioning

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

Paper: https://arxiv.org/abs/2105.03236
Code: None

Font Generation

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

Paper: https://arxiv.org/abs/2104.03064
Code: https://github.com/ecnuycxie/DG-Font

Image Matching

LoFTR: Detector-Free Local Feature Matching with Transformers

Homepage: https://zju3dv.github.io/loftr/
Paper: https://arxiv.org/abs/2104.00680
Code: https://github.com/zju3dv/LoFTR

Convolutional Hough Matching Networks

Homepage: http://cvlab.postech.ac.kr/research/CHM/
Paper(Oral): https://arxiv.org/abs/2103.16831
Code: None

Image Blending

Bridging the Visual Gap: Wide-Range Image Blending

Reflection Removal

Robust Reflection Removal with Reflection-free Flash-only Cues

3D Point Cloud Classification

Equivariant Point Network for 3D Point Cloud Analysis

Paper: https://arxiv.org/abs/2103.14147
Code: None

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

Paper: https://arxiv.org/abs/2103.14635
Code: https://github.com/CVMI-Lab/PAConv

3D Object Detection

Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

Paper: https://arxiv.org/abs/2104.06114
Code: https://github.com/cheng052/BRNet

HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection

Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/
Paper: https://arxiv.org/abs/2104.00902
Code: https://github.com/cvlab-yonsei/HVPR

LiDAR R-CNN: An Efficient and Universal 3D Object Detector

Paper: https://arxiv.org/abs/2103.15297
Code: https://github.com/tusimple/LiDAR_RCNN

M3DSSD: Monocular 3D Single Stage Object Detector

Paper: https://arxiv.org/abs/2103.13164
Code: https://github.com/mumianyuxin/M3DSSD

SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud

Paper: None
Code: https://github.com/Vegeta2020/SE-SSD

Center-based 3D Object Detection and Tracking

Paper: https://arxiv.org/abs/2006.11275
Code: https://github.com/tianweiy/CenterPoint

Categorical Depth Distribution Network for Monocular 3D Object Detection

Paper: https://arxiv.org/abs/2103.01100
Code: None

3D Semantic Segmentation

Bidirectional Projection Network for Cross Dimension Scene Understanding

Paper(Oral): https://arxiv.org/abs/2103.14326
Code: https://github.com/wbhu/BPNet

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

Paper: https://arxiv.org/abs/2103.07074
Code: https://github.com/ShiQiu0419/BAAF-Net

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

Paper: https://arxiv.org/abs/2011.10033
Code: https://github.com/xinge008/Cylinder3D

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

Homepage: https://github.com/QingyongHu/SensatUrban
Paper: http://arxiv.org/abs/2009.03137
Code: https://github.com/QingyongHu/SensatUrban
Dataset: https://github.com/QingyongHu/SensatUrban

3D Panoptic Segmentation

Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation

Paper: https://arxiv.org/abs/2103.14962
Code: https://github.com/edwardzhou130/Panoptic-PolarNet

3D Object Tracking

Center-based 3D Object Detection and Tracking

Paper: https://arxiv.org/abs/2006.11275
Code: https://github.com/tianweiy/CenterPoint

3D Point Cloud Registration

ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning

Paper: https://arxiv.org/abs/2103.15231
Code: None

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

Paper: https://arxiv.org/abs/2103.05465
Code: https://github.com/XuyangBai/PointDSC

PREDATOR: Registration of 3D Point Clouds with Low Overlap

Paper: https://arxiv.org/abs/2011.13005
Code: https://github.com/ShengyuH/OverlapPredator

3D Point Cloud Completion

Unsupervised 3D Shape Completion through GAN Inversion

Homepage: https://junzhezhang.github.io/projects/ShapeInversion/
Paper: https://arxiv.org/abs/2104.13366
Code: https://github.com/junzhezhang/shape-inversion

Variational Relational Point Completion Network

Homepage: https://paul007pl.github.io/projects/VRCNet
Paper: https://arxiv.org/abs/2104.10154
Code: https://github.com/paul007pl/VRCNet

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion

Homepage: https://alphapav.github.io/SpareNet/
Paper: https://arxiv.org/abs/2103.02535
Code: https://github.com/microsoft/SpareNet

3D Reconstruction

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

Paper: https://arxiv.org/abs/2104.00858
Code: None

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video

Homepage: https://zju3dv.github.io/neuralrecon/
Paper(Oral): https://arxiv.org/abs/2104.00681
Code: https://github.com/zju3dv/NeuralRecon

6D Pose Estimation

FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

Paper(Oral): https://arxiv.org/abs/2103.07054
Code: https://github.com/DC1991/FS-Net

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

Paper: http://arxiv.org/abs/2102.12145
Code: https://git.io/GDR-Net

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

Paper: https://arxiv.org/abs/2103.02242
Code: https://github.com/ethnhe/FFB6D

Camera Pose Estimation

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

Paper: https://arxiv.org/abs/2103.09213
Code: https://github.com/cvg/pixloc

Depth Estimation

S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Paper(Oral): https://arxiv.org/abs/2104.00877
Code: None

Beyond Image to Depth: Improving Depth Prediction using Echoes

Homepage: https://krantiparida.github.io/projects/bimgdepth.html
Paper: https://arxiv.org/abs/2103.08468
Code: https://github.com/krantiparida/beyond-image-to-depth

S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation

Paper: https://arxiv.org/abs/2103.02396
Code: None

Depth from Camera Motion and Object Detection

Paper: https://arxiv.org/abs/2103.01468
Code: https://github.com/griffbr/ODMD
Dataset: https://github.com/griffbr/ODMD

Stereo Matching

A Decomposition Model for Stereo Matching

Paper: https://arxiv.org/abs/2104.07516
Code: None

Optical Flow Estimation

Self-Supervised Multi-Frame Monocular Scene Flow

Paper: https://arxiv.org/abs/2105.02216
Code: https://github.com/visinf/multi-mono-sf

RAFT-3D: Scene Flow using Rigid-Motion Embeddings

Paper: https://arxiv.org/abs/2012.00726v1
Code: None

Learning Optical Flow From Still Images

FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

Paper: https://arxiv.org/abs/2104.00798
Code: None

Lane Detection

Focus on Local: Detecting Lane Marker from Bottom Up via Key Point

Paper: https://arxiv.org/abs/2105.13680
Code: None

Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

Paper: https://arxiv.org/abs/2010.12035
Code: https://github.com/lucastabelini/LaneATT

Trajectory Prediction

Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction

Paper(Oral): https://arxiv.org/abs/2104.08277
Code: None

Crowd Counting

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd

Adversarial Examples

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

Paper: https://arxiv.org/abs/2103.15571
Code: https://github.com/JHL-HUST/VT

LiBRe: A Practical Bayesian Approach to Adversarial Detection

Paper: https://arxiv.org/abs/2103.14835
Code: None

Natural Adversarial Examples

Paper: https://arxiv.org/abs/1907.07174
Code: https://github.com/hendrycks/natural-adv-examples

Image Retrieval

StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

Paper: https://arxiv.org/abs/2103.15706
Code: None

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

Paper: https://arxiv.org/abs/2103.02927
Code: None

Video Retrieval

On Semantic Similarity in Video Retrieval

Paper: https://arxiv.org/abs/2103.10095
Homepage: https://mwray.github.io/SSVR/
Code: https://github.com/mwray/Semantic-Video-Retrieval

Cross-modal Retrieval

Cross-Modal Center Loss for 3D Cross-Modal Retrieval

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Paper: https://arxiv.org/abs/2103.16553
Code: None

Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

Zero-Shot Learning

Counterfactual Zero-Shot and Open-Set Visual Recognition

Paper: https://arxiv.org/abs/2103.00887
Code: https://github.com/yue-zhongqi/gcm-cf

Federated Learning

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

Paper: https://arxiv.org/abs/2103.06030
Code: https://github.com/liuquande/FedDG-ELCFS

Video Frame Interpolation

CDFI: Compression-Driven Network Design for Frame Interpolation

Paper: None
Code: https://github.com/tding1/CDFI

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

Homepage: https://tarun005.github.io/FLAVR/
Paper: https://arxiv.org/abs/2012.08512
Code: https://github.com/tarun005/FLAVR

Visual Reasoning

Transformation Driven Visual Reasoning

Homepage: https://hongxin2019.github.io/TVR/
Paper: https://arxiv.org/abs/2011.13160
Code: https://github.com/hughplay/TVR

Image Synthesis

Taming Transformers for High-Resolution Image Synthesis

Homepage: https://compvis.github.io/taming-transformers/
Paper(Oral): https://arxiv.org/abs/2012.09841
Code: https://github.com/CompVis/taming-transformers

View Synthesis

Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

Homepage: https://virtualhumans.mpi-inf.mpg.de/srf/
Paper: https://arxiv.org/abs/2104.06935

Self-Supervised Visibility Learning for Novel View Synthesis

Paper: https://arxiv.org/abs/2103.15407
Code: None

NeX: Real-time View Synthesis with Neural Basis Expansion

Homepage: https://nex-mpi.github.io/
Paper(Oral): https://arxiv.org/abs/2103.05606

Style Transfer

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

Paper: https://arxiv.org/abs/2104.05376
Code: https://github.com/PaddlePaddle/PaddleGAN/

Layout Generation

LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

Paper: None
Code: None

Variational Transformer Networks for Layout Generation

Paper: https://arxiv.org/abs/2104.02416
Code: None

Domain Generalization

Generalizable Person Re-identification with Relevance-aware Mixture of Experts

Paper: https://arxiv.org/abs/2105.09156
Code: None

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

Paper: https://arxiv.org/abs/2103.15597
Code: https://github.com/shachoi/RobustNet

Adaptive Methods for Real-World Domain Generalization

Paper: https://arxiv.org/abs/2103.15796
Code: None

FSDR: Frequency Space Domain Randomization for Domain Generalization

Paper: https://arxiv.org/abs/2103.02370
Code: None

Domain Adaptation

Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation

Paper: https://arxiv.org/abs/2104.00808
Code: None

Domain Consensus Clustering for Universal Domain Adaptation

Open-Set

Towards Open World Object Detection

Paper(Oral): https://arxiv.org/abs/2103.02603
Code: https://github.com/JosephKJ/OWOD

Exemplar-Based Open-Set Panoptic Segmentation Network

Homepage: https://cv.snu.ac.kr/research/EOPSN/
Paper: https://arxiv.org/abs/2105.08336
Code: https://github.com/jd730/EOPSN

Learning Placeholders for Open-Set Recognition

Paper(Oral): https://arxiv.org/abs/2103.15086
Code: None

Adversarial Attack

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Paper: https://arxiv.org/abs/2103.14938
Code: https://github.com/VISION-SJTU/IoUattack

Human-Object Interaction (HOI) Detection

HOTR: End-to-End Human-Object Interaction Detection with Transformers

Paper: https://arxiv.org/abs/2104.13682
Code: None

Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

Paper: https://arxiv.org/abs/2103.05399
Code: https://github.com/hitachi-rd-cv/qpic

Reformulating HOI Detection as Adaptive Set Prediction

Paper: https://arxiv.org/abs/2103.05983
Code: https://github.com/yoyomimi/AS-Net

Detecting Human-Object Interaction via Fabricated Compositional Learning

Paper: https://arxiv.org/abs/2103.08214
Code: https://github.com/zhihou7/FCL

End-to-End Human Object Interaction Detection with HOI Transformer

Paper: https://arxiv.org/abs/2103.04503
Code: https://github.com/bbepoch/HoiTransformer

Shadow Removal

Auto-Exposure Fusion for Single-Image Shadow Removal

Virtual Try-On

Parser-Free Virtual Try-on via Distilling Appearance Flows

Paper: https://arxiv.org/abs/2103.04559
Code: https://github.com/geyuying/PF-AFN

Label Noise

A Second-Order Approach to Learning with Instance-Dependent Label Noise

Paper(Oral): https://arxiv.org/abs/2012.11854
Code: https://github.com/UCSC-REAL/CAL

Datasets

Part-aware Panoptic Segmentation

Paper: https://arxiv.org/abs/2106.06351
Code: https://github.com/tue-mps/panoptic_parts
Dataset: https://github.com/tue-mps/panoptic_parts

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

Homepage: https://www.yasamin.page/hdnet_tiktok
Paper(Oral): https://arxiv.org/abs/2103.03319
Code: https://github.com/yasaminjafarian/HDNet_TikTok
Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Paper: https://arxiv.org/abs/2105.09188
Code: https://github.com/csjliang/LPTN
Dataset: https://github.com/csjliang/LPTN

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Paper: https://arxiv.org/abs/2105.02440
Code: https://github.com/VisDrone/DroneCrowd
Dataset: https://github.com/VisDrone/DroneCrowd

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
Paper(Oral): https://arxiv.org/abs/2104.12690
Code: https://github.com/fidler-lab/efficient-annotation-cookbook

Paper download links:

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Paper: https://arxiv.org/abs/2012.05258
Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

Learning To Count Everything

Semantic Image Matting

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

Homepage: http://mepro.bjtu.edu.cn/resource.html
Paper: https://arxiv.org/abs/2104.06174
Code: None

Visual Semantic Role Labeling for Video Understanding

Homepage: https://vidsitu.org/
Paper: https://arxiv.org/abs/2104.00990
Code: https://github.com/TheShadow29/VidSitu
Dataset: https://github.com/TheShadow29/VidSitu

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Homepage: https://www.vspwdataset.com/
Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
GitHub: https://github.com/sssdddwww2/vspw_dataset_download

Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10619

Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

Homepage: https://vap.aau.dk/sewer-ml/
Paper: https://arxiv.org/abs/2103.10895

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

Paper: https://arxiv.org/abs/2103.03375
Dataset: None

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

Homepage: https://github.com/QingyongHu/SensatUrban
Paper: http://arxiv.org/abs/2009.03137
Code: https://github.com/QingyongHu/SensatUrban
Dataset: https://github.com/QingyongHu/SensatUrban

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Paper(Oral): https://arxiv.org/abs/2103.01520
Code: https://github.com/Hzzone/MTLFace
Dataset: https://github.com/Hzzone/MTLFace

Depth from Camera Motion and Object Detection

Paper: https://arxiv.org/abs/2103.01468
Code: https://github.com/griffbr/ODMD
Dataset: https://github.com/griffbr/ODMD

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Dataset: http://rl.uni-freiburg.de/research/multimodal-distill

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Paper: https://arxiv.org/abs/2012.02206
Code: https://github.com/daveredrum/Scan2Cap
Dataset: https://github.com/daveredrum/ScanRefer

Others

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

Homepage: https://www.yasamin.page/hdnet_tiktok
Paper(Oral): https://arxiv.org/abs/2103.03319
Code: https://github.com/yasaminjafarian/HDNet_TikTok
Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v

Omnimatte: Associating Objects and Their Effects in Video

Homepage: https://omnimatte.github.io/
Paper(Oral): https://arxiv.org/abs/2105.06993
Code: https://omnimatte.github.io/#code

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
Paper(Oral): https://arxiv.org/abs/2104.12690
Code: https://github.com/fidler-lab/efficient-annotation-cookbook

Motion Representations for Articulated Animation

Deep Lucas-Kanade Homography for Multimodal Image Alignment

Skip-Convolutions for Efficient Video Processing

Paper: https://arxiv.org/abs/2104.11487
Code: None

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

Homepage: http://tomasjakab.github.io/KeypointDeformer
Paper(Oral): https://arxiv.org/abs/2104.11224
Code: https://github.com/tomasjakab/keypoint_deformer/

Learning To Count Everything

SOLD2: Self-supervised Occlusion-aware Line Description and Detection

Paper(Oral): https://arxiv.org/abs/2104.03362
Code: https://github.com/cvg/SOLD2

Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression

Homepage: https://li-wanhua.github.io/POEs/
Paper: https://arxiv.org/abs/2103.13629
Code: https://github.com/Li-Wanhua/POEs

LEAP: Learning Articulated Occupancy of People

Paper: https://arxiv.org/abs/2104.06849
Code: None

Visual Semantic Role Labeling for Video Understanding

Homepage: https://vidsitu.org/
Paper: https://arxiv.org/abs/2104.00990
Code: https://github.com/TheShadow29/VidSitu
Dataset: https://github.com/TheShadow29/VidSitu

UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

Paper: https://arxiv.org/abs/2104.00946
Code: https://github.com/SUTDCV/UAV-Human

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Paper(Oral): https://arxiv.org/abs/2104.00924
Code: None

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

Paper: https://arxiv.org/abs/2104.00858
Code: None

Towards High Fidelity Face Relighting with Realistic Shadows

Paper: https://arxiv.org/abs/2104.00825
Code: None

BRepNet: A topological message passing system for solid models

Paper(Oral): https://arxiv.org/abs/2104.00706
Code: None

Visually Informed Binaural Audio Generation without Binaural Audios

Homepage: https://sheldontsui.github.io/projects/PseudoBinaural
Paper: None
GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021
Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc

Exploring intermediate representation for monocular vehicle pose estimation

Paper: None
Code: https://github.com/Nicholasli1995/EgoNet

Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB

Paper(Oral): https://arxiv.org/abs/2103.14708
Code: None

Invertible Image Signal Processing

Paper: https://arxiv.org/abs/2103.15061
Code: https://github.com/yzxing87/Invertible-ISP

Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

Paper: https://arxiv.org/abs/2103.14858
Code: None

SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences

Paper: https://arxiv.org/abs/2103.14898
Code: None

Embedding Transfer with Label Relaxation for Improved Metric Learning

Paper: https://arxiv.org/abs/2103.14908
Code: None

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

Paper: https://arxiv.org/abs/2103.15076
Code: https://github.com/hlei-ziyan/Picasso

Meta-Mining Discriminative Samples for Kinship Verification

Paper: https://arxiv.org/abs/2103.15108
Code: None

Cloud2Curve: Generation and Vectorization of Parametric Sketches

Paper: https://arxiv.org/abs/2103.15536
Code: None

TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Paper: https://arxiv.org/abs/2103.15538
Code: https://github.com/SUTDCV/SUTD-TrafficQA

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

Homepage: http://wellyzhang.github.io/project/prae.html
Paper: https://arxiv.org/abs/2103.14230
Code: None

ACRE: Abstract Causal REasoning Beyond Covariation

Homepage: http://wellyzhang.github.io/project/acre.html
Paper: https://arxiv.org/abs/2103.14232
Code: None

Confluent Vessel Trees with Accurate Bifurcations

Paper: https://arxiv.org/abs/2103.14268
Code: None

Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

Homepage: https://paschalidoud.github.io/neural_parts
Paper: None
Code: https://github.com/paschalidoud/neural_parts

Knowledge Evolution in Neural Networks

Paper(Oral): https://arxiv.org/abs/2103.05152
Code: https://github.com/ahmdtaha/knowledge_evolution

Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning

Paper: https://arxiv.org/abs/2103.02148
Code: https://github.com/guopengf/FLMRCM

SGP: Self-supervised Geometric Perception

Diffusion Probabilistic Models for 3D Point Cloud Generation

Paper: https://arxiv.org/abs/2103.01458
Code: https://github.com/luost26/diffusion-point-cloud

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

Paper: https://arxiv.org/abs/2012.02206
Code: https://github.com/daveredrum/Scan2Cap
Dataset: https://github.com/daveredrum/ScanRefer

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Paper: https://arxiv.org/abs/2103.01353
Code: http://rl.uni-freiburg.de/research/multimodal-distill
Dataset: http://rl.uni-freiburg.de/research/multimodal-distill

TODO

Not Sure (acceptance unconfirmed)

CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models

Paper: None
Code: https://github.com/transcendentsky/Film-Recovery

Toward Explainable Reflection Removal with Distilling and Model Uncertainty

DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation

Paper: None
Code: https://github.com/lhaippp/DeepOIS

Exploring Adversarial Fake Images on Face Manifold

Paper: None
Code: https://github.com/ldz666666/Style-atk

Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task

Temporal Contrastive Graph for Self-supervised Video Representation Learning

Paper: None
Code: https://github.com/YangLiu9208/TCG

Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching

Paper: None
Code: https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr

Fast and Memory-Efficient Compact Bilinear Pooling

Paper: None
Code: https://github.com/cvpr2021kp2/cvpr2021kp2

Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine

Paper: None
Code: https://github.com/gapDetection/cvpr2021

Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation

Paper: None
Code: https://github.com/interactivekeypoint2020/Morph

https://github.com/ShaoQiangShen/CVPR2021

https://github.com/gillesflash/CVPR2021

https://github.com/anonymous-submission1991/BaLeNAS

https://github.com/cvpr2021dcb/cvpr2021dcb

https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578

https://github.com/AldrichZeng/FreqPrune

https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM

https://github.com/ddfss/datadrive-fss
