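A collection of CVPR 2021 papers and open-source projects (papers with code)!
CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt
Note 1: You are welcome to open an issue to share CVPR 2021 papers and open-source projects!
Note 2: For top CV conference papers from previous years, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision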
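A group chat for CVPR 2021 accepted authors has been set up! If your paper has been accepted, add WeChat ID CVer9999 with the note: CVPR2021 accepted + name + school/company. Please follow this format when applying; you will then be added to the group to discuss conference attendance and related matters.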
- Backbone
- NAS
- GAN
- VAE
- Visual Transformer
- Regularization
- SLAM
- Long-Tailed Distribution
- Data Augmentation
- Unsupervised/Self-Supervised Learning
- Semi-Supervised Learning
- Capsule Network
- Image Classification
- 2D Object Detection
- Single/Multi-Object Tracking
- Semantic Segmentation
- Instance Segmentation
- Panoptic Segmentation
- Medical Image Segmentation
- Video Object Segmentation
- Interactive Video Object Segmentation
- Saliency Detection
- Camouflaged Object Detection
- Co-Salient Object Detection
- Image Matting
- Person Re-identification
- Person Search
- Video Understanding/Action Recognition
- Face Recognition
- Face Detection
- Face Anti-Spoofing
- Deepfake Detection
- Face Age Estimation
- Facial Expression Recognition
- Deepfakes
- Human Parsing
- 2D/3D Human Pose Estimation
- Animal Pose Estimation
- Human Volumetric Capture
- Scene Text Recognition
- Image Compression
- Model Compression/Pruning/Quantization
- Knowledge Distillation
- Super-Resolution
- Dehazing
- Image Restoration
- Image Inpainting
- Image Editing
- Image Captioning
- Font Generation
- Image Matching
- Image Blending
- Reflection Removal
- 3D Point Cloud Classification
- 3D Object Detection
- 3D Semantic Segmentation
- 3D Panoptic Segmentation
- 3D Object Tracking
- 3D Point Cloud Registration
- 3D Point Cloud Completion
- 3D Reconstruction
- 6D Pose Estimation
- Camera Pose Estimation
- Depth Estimation
- Stereo Matching
- Optical Flow Estimation
- Lane Detection
- Trajectory Prediction
- Crowd Counting
- Adversarial Examples
- Image Retrieval
- Video Retrieval
- Cross-Modal Retrieval
- Zero-Shot Learning
- Federated Learning
- Video Frame Interpolation
- Visual Reasoning
- Image Synthesis
- View Synthesis
- Style Transfer
- Layout Generation
- Domain Generalization
- Domain Adaptation
- Open-Set
- Adversarial Attack
- Human-Object Interaction (HOI) Detection
- Shadow Removal
- Virtual Try-On
- Label Noise
- Datasets
- Others
- To Be Added (TODO)
- Acceptance Unconfirmed (Not Sure)
BCNet: Searching for Network Width with Bilaterally Coupled Network
- Paper: https://arxiv.org/abs/2105.10533
- Code: None
Decoupled Dynamic Filter Networks
- Homepage: https://thefoxofsky.github.io/project_pages/ddf
- Paper: https://arxiv.org/abs/2104.14107
- Code: https://github.com/thefoxofsky/DDF
Lite-HRNet: A Lightweight High-Resolution Network
CondenseNet V2: Sparse Feature Reactivation for Deep Networks
Diverse Branch Block: Building a Convolution as an Inception-like Unit
Scaling Local Self-Attention For Parameter Efficient Visual Backbones
- Paper(Oral): https://arxiv.org/abs/2103.12731
- Code: None
ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network
Involution: Inverting the Inherence of Convolution for Visual Recognition
Coordinate Attention for Efficient Mobile Network Design
Inception Convolution with Efficient Dilation Search
RepVGG: Making VGG-style ConvNets Great Again
BCNet: Searching for Network Width with Bilaterally Coupled Network
- Paper: https://arxiv.org/abs/2105.10533
- Code: None
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
- Paper: https://arxiv.org/abs/2105.10154
- Code: None
Combined Depth Space based Architecture Search For Person Re-identification
- Paper: https://arxiv.org/abs/2104.04163
- Code: None
DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
- Paper(Oral): https://arxiv.org/abs/2103.15954
- Code: None
HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
- Paper(Oral): None
- Code: https://github.com/dingmyu/HR-NAS
Neural Architecture Search with Random Labels
- Paper: https://arxiv.org/abs/2101.11834
- Code: None
Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search
- Paper: https://arxiv.org/abs/2101.11342
- Code: None
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
- Paper: https://arxiv.org/abs/2105.12971
- Code: None
Prioritized Architecture Sampling with Monto-Carlo Tree Search
Contrastive Neural Architecture Search with Neural Architecture Comparators
AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling
- Paper: https://arxiv.org/abs/2011.09011
- Code: None
ReNAS: Relativistic Evaluation of Neural Architecture Search
- Paper: https://arxiv.org/abs/1910.01523
- Code: None
HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens
- Paper: https://arxiv.org/abs/2005.14446
- Code: None
Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
Inception Convolution with Efficient Dilation Search
- Paper: https://arxiv.org/abs/2012.13587
- Code: None
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
- Paper: https://arxiv.org/abs/2105.09188
- Code: https://github.com/csjliang/LPTN
- Dataset: https://github.com/csjliang/LPTN
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Paper: https://arxiv.org/abs/2104.14754
- Code: https://github.com/naver-ai/StyleMapGAN
- Demo Video: https://youtu.be/qCapNyRA_Ng
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
Regularizing Generative Adversarial Networks under Limited Data
- Homepage: https://hytseng0509.github.io/lecam-gan/
- Paper: https://faculty.ucmerced.edu/mhyang/papers/cvpr2021_gan_limited_data.pdf
- Code: https://github.com/google/lecam-gan
Towards Real-World Blind Face Restoration with Generative Facial Prior
- Paper: https://arxiv.org/abs/2101.04061
- Code: None
TediGAN: Text-Guided Diverse Image Generation and Manipulation
- Homepage: https://xiaweihao.com/projects/tedigan/
Generative Hierarchical Features from Synthesizing Image
- Homepage: https://genforce.github.io/ghfeat/
- Paper(Oral): https://arxiv.org/abs/2007.10379
Teachers Do More Than Teach: Compressing Image-to-Image Models
HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms
pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
- Paper(Oral): https://arxiv.org/abs/2012.00926
- Code: None
DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network
- Paper: https://arxiv.org/abs/2103.07893
- Code: None
Diverse Semantic Image Synthesis via Probability Distribution Modeling
LOHO: Latent Optimization of Hairstyles via Orthogonalization
- Paper: https://arxiv.org/abs/2103.03891
- Code: None
PISE: Person Image Synthesis and Editing with Decoupled GAN
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
Efficient Conditional GAN Transfer with Knowledge Propagation across Classes
- Paper: https://www.researchgate.net/publication/349309756_Efficient_Conditional_GAN_Transfer_with_Knowledge_Propagation_across_Classes
- Code: http://github.com/mshahbazi72/cGANTransfer
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Paper: None
- Code: None
Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs
- Paper: https://arxiv.org/abs/2011.14107
- Code: None
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
- Homepage: https://eladrich.github.io/pixel2style2pixel/
- Paper: https://arxiv.org/abs/2008.00951
- Code: https://github.com/eladrich/pixel2style2pixel
A 3D GAN for Improved Large-pose Facial Recognition
- Paper: https://arxiv.org/abs/2012.10545
- Code: None
HumanGAN: A Generative Model of Humans Images
- Paper: https://arxiv.org/abs/2103.06902
- Code: None
ID-Unet: Iterative Soft and Hard Deformation for View Synthesis
CoMoGAN: continuous model-guided image-to-image translation
- Paper(Oral): https://arxiv.org/abs/2103.06879
- Code: https://github.com/cv-rits/CoMoGAN
Training Generative Adversarial Networks in One Stage
- Paper: https://arxiv.org/abs/2103.00430
- Code: None
Closed-Form Factorization of Latent Semantics in GANs
- Homepage: https://genforce.github.io/sefa/
- Paper(Oral): https://arxiv.org/abs/2007.06600
- Code: https://github.com/genforce/sefa
Anycost GANs for Interactive Image Synthesis and Editing
Image-to-image Translation via Hierarchical Style Disentanglement
Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders
1. End-to-End Human Pose and Mesh Reconstruction with Transformers
2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition
3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
4. HOTR: End-to-End Human-Object Interaction Detection with Transformers
5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
6. Pose Recognition with Cascade Transformers
7. Variational Transformer Networks for Layout Generation
- Paper: https://arxiv.org/abs/2104.02416
- Code: None
8. LoFTR: Detector-Free Local Feature Matching with Transformers
- Homepage: https://zju3dv.github.io/loftr/
- Paper: https://arxiv.org/abs/2104.00680
- Code: https://github.com/zju3dv/LoFTR
9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
- Paper: https://arxiv.org/abs/2103.16553
- Code: None
11. Transformer Tracking
12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
- Paper(Oral): None
- Code: https://github.com/dingmyu/HR-NAS
13. MIST: Multiple Instance Spatial Transformer
- Paper: https://arxiv.org/abs/1811.10725
- Code: None
14. Multimodal Motion Prediction with Stacked Transformers
15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
- Paper(Oral): https://arxiv.org/abs/2103.11681
17. Pre-Trained Image Processing Transformer
- Paper: https://arxiv.org/abs/2012.00364
- Code: None
18. End-to-End Video Instance Segmentation with Transformers
- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR
19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr
20. End-to-End Human Object Interaction Detection with HOI Transformer
21. Transformer Interpretability Beyond Attention Visualization
- Paper: https://arxiv.org/abs/2012.09838
- Code: https://github.com/hila-chefer/Transformer-Explainability
22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer
- Paper: None
- Code: None
23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
- Paper: None
- Code: None
24. Line Segment Detection Using Transformers without Edges
- Paper(Oral): https://arxiv.org/abs/2101.01909
- Code: None
25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html
- Code: None
26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
- Paper(Oral): https://arxiv.org/abs/2101.08833
- Code: https://github.com/dukebw/SSTVOS
27. Facial Action Unit Detection With Transformers
- Paper: None
- Code: None
28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
- Paper: None
- Code: None
29. Lesion-Aware Transformers for Diabetic Retinopathy Grading
- Paper: None
- Code: None
30. Topological Planning With Transformers for Vision-and-Language Navigation
- Paper: https://arxiv.org/abs/2012.05292
- Code: None
31. Adaptive Image Transformer for One-Shot Object Detection
- Paper: None
- Code: None
32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos
- Paper: None
- Code: None
33. Taming Transformers for High-Resolution Image Synthesis
- Homepage: https://compvis.github.io/taming-transformers/
- Paper(Oral): https://arxiv.org/abs/2012.09841
- Code: https://github.com/CompVis/taming-transformers
34. Self-Supervised Video Hashing via Bidirectional Transformers
- Paper: None
- Code: None
35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
- Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
- Code: None
36. Gaussian Context Transformer
- Paper: None
- Code: None
37. General Multi-Label Image Classification With Transformers
- Paper: https://arxiv.org/abs/2011.14027
- Code: None
38. Bottleneck Transformers for Visual Recognition
- Paper: https://arxiv.org/abs/2101.11605
- Code: None
39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
- Paper(Oral): https://arxiv.org/abs/2011.13922
- Code: https://github.com/YicongHong/Recurrent-VLN-BERT
40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
- Paper(Oral): https://arxiv.org/abs/2102.06183
- Code: https://github.com/jayleicn/ClipBERT
41. Self-attention based Text Knowledge Mining for Text Detection
- Paper: None
- Code: https://github.com/CVI-SZU/STKM
42. SSAN: Separable Self-Attention Network for Video Representation Learning
- Paper: None
- Code: None
43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones
- Paper(Oral): https://arxiv.org/abs/2103.12731
- Code: None
Regularizing Neural Networks via Adversarial Model Perturbation
Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation
- Paper: https://arxiv.org/abs/2105.07593
- Code: None
Generalizing to the Open World: Deep Visual Odometry with Online Adaptation
Adversarial Robustness under Long-Tailed Distribution
- Paper(Oral): https://arxiv.org/abs/2104.02703
- Code: https://github.com/wutong16/Adversarial_Long-Tail
Distribution Alignment: A Unified Framework for Long-tail Visual Recognition
Adaptive Class Suppression Loss for Long-Tail Object Detection
Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
- Paper: https://arxiv.org/abs/2103.14267
- Code: None
Scale-aware Automatic Augmentation for Object Detection
Domain-Specific Suppression for Adaptive Object Detection
- Paper: https://arxiv.org/abs/2105.03570
- Code: None
A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
- Paper: https://arxiv.org/abs/2104.12961
- Code: None
Self-supervised Video Representation Learning by Context and Motion Decoupling
- Paper: https://arxiv.org/abs/2104.00862
- Code: None
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE
Spatially Consistent Representation Learning
- Paper: https://arxiv.org/abs/2103.06122
- Code: None
VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples
Exploring Simple Siamese Representation Learning
- Paper(Oral): https://arxiv.org/abs/2011.10566
- Code: None
Dense Contrastive Learning for Self-Supervised Visual Pre-Training
- Paper(Oral): https://arxiv.org/abs/2011.09157
- Code: https://github.com/WXinlong/DenseCL
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
- Paper: https://arxiv.org/abs/2103.11402
- Code: None
Adaptive Consistency Regularization for Semi-Supervised Transfer Learning
- Paper: https://arxiv.org/abs/2103.02193
- Code: https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning
Capsule Network is Not More Robust than Convolutional Network
- Paper: https://arxiv.org/abs/2103.15459
- Code: None
Correlated Input-Dependent Label Noise in Large-Scale Image Classification
- Paper(Oral): https://arxiv.org/abs/2105.10305
- Code: https://github.com/google/uncertainty-baselines/tree/master/baselines/imagenet
Dynamic Head: Unifying Object Detection Heads with Attentions
Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
- Paper: https://arxiv.org/abs/2105.12971
- Code: None
PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
- Paper: https://arxiv.org/abs/2105.12990
- Code: None
Domain-Specific Suppression for Adaptive Object Detection
- Paper: https://arxiv.org/abs/2105.03570
- Code: None
IQDet: Instance-wise Quality Distribution Sampling for Object Detection
- Paper: https://arxiv.org/abs/2104.06936
- Code: None
Multi-Scale Aligned Distillation for Low-Resolution Detection
Adaptive Class Suppression Loss for Long-Tail Object Detection
VarifocalNet: An IoU-aware Dense Object Detector
- Paper(Oral): https://arxiv.org/abs/2008.13367
Scale-aware Automatic Augmentation for Object Detection
OTA: Optimal Transport Assignment for Object Detection
Distilling Object Detectors via Decoupled Features
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
- Homepage: https://rl.uni-freiburg.de/
- Paper: https://arxiv.org/abs/2103.01353
- Code: None
Positive-Unlabeled Data Purification in the Wild for Object Detection
- Paper: None
- Code: None
Instance Localization for Self-supervised Detection Pretraining
MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection
- Paper: https://arxiv.org/abs/2103.04224
- Code: None
End-to-End Object Detection with Fully Convolutional Network
Robust and Accurate Object Detection via Adversarial Learning
- Code: None
I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors
- Paper: https://arxiv.org/abs/2103.13757
- Code: None
Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework
- Paper: https://arxiv.org/abs/2103.11402
- Code: None
OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection
YOLOF: You Only Look One-level Feature
UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
- Paper(Oral): https://arxiv.org/abs/2011.09094
- Code: https://github.com/dddzg/up-detr
General Instance Distillation for Object Detection
- Paper: https://arxiv.org/abs/2103.02340
- Code: None
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
- Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill
Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection
Multiple Instance Active Learning for Object Detection
Towards Open World Object Detection
- Paper(Oral): https://arxiv.org/abs/2103.02603
- Code: https://github.com/JosephKJ/OWOD
Adaptive Image Transformer for One-Shot Object Detection
- Paper: None
- Code: None
Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection
Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection
- Paper: https://arxiv.org/abs/2103.01903
- Code: None
Few-Shot Object Detection via Contrastive Proposal Encoding
Dense Label Encoding for Boundary Discontinuity Free Rotation Detection
- Paper: https://arxiv.org/abs/2011.09670
- Code1: https://github.com/Thinklab-SJTU/DCL_RetinaNet_Tensorflow
- Code2: https://github.com/yangxue0827/RotationDetection
ReDet: A Rotation-equivariant Detector for Aerial Object Detection
LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search
Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark
- Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit
- Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
Graph Attention Tracking
Rotation Equivariant Siamese Networks for Tracking
- Paper: https://arxiv.org/abs/2012.13078
- Code: None
Track to Detect and Segment: An Online Multi-Object Tracker
- Homepage: https://jialianwu.com/projects/TraDeS.html
- Paper: None
- Code: None
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
- Paper(Oral): https://arxiv.org/abs/2103.11681
Transformer Tracking
Multiple Object Tracking with Correlation Learning
- Paper: https://arxiv.org/abs/2104.03541
- Code: None
Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking
- Paper: https://arxiv.org/abs/2012.02337
- Code: None
Learning a Proposal Classifier for Multiple Object Tracking
Track to Detect and Segment: An Online Multi-Object Tracker
- Homepage: https://jialianwu.com/projects/TraDeS.html
- Paper: https://arxiv.org/abs/2103.08808
- Code: https://github.com/JialianW/TraDeS
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
- Paper: https://arxiv.org/abs/2012.05258
- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Rethinking BiSeNet For Real-time Semantic Segmentation
Progressive Semantic Segmentation
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Bidirectional Projection Network for Cross Dimension Scene Understanding
- Paper(Oral): https://arxiv.org/abs/2103.14326
- Code: https://github.com/wbhu/BPNet
Cross-Dataset Collaborative Learning for Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.11351
- Code: None
Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations
- Paper: https://arxiv.org/abs/2103.06342
- Code: None
Capturing Omni-Range Context for Omnidirectional Segmentation
- Paper: https://arxiv.org/abs/2103.05687
- Code: None
Learning Statistical Texture for Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.04133
- Code: None
PLOP: Learning without Forgetting for Continual Semantic Segmentation
- Paper: https://arxiv.org/abs/2011.11390
- Code: None
Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation
- Code: None
Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.14581
- Code: None
BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation
- Paper: https://arxiv.org/abs/2103.08907
- Code: None
Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision
Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.04705
- Code: None
Self-supervised Augmentation Consistency for Adapting Semantic Segmentation
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization
- Paper: https://arxiv.org/abs/2103.13041
- Code: None
MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.05254
- Code: None
Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation
- Paper: https://arxiv.org/abs/2103.04717
- Code: None
Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
- Homepage: https://www.vspwdataset.com/
- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
- GitHub: https://github.com/sssdddwww2/vspw_dataset_download
DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation
Incremental Few-Shot Instance Segmentation
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation
- Paper: https://arxiv.org/abs/2105.03186
- Code: None
RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features
Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation
Multi-Scale Aligned Distillation for Low-Resolution Detection
Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers
Zero-shot instance segmentation (Not Sure)
- Paper: None
- Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395
STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation
End-to-End Video Instance Segmentation with Transformers
- Paper(Oral): https://arxiv.org/abs/2011.14503
- Code: https://github.com/Epiphqny/VisTR
Part-aware Panoptic Segmentation
- Paper: https://arxiv.org/abs/2106.06351
- Code: https://github.com/tue-mps/panoptic_parts
- Dataset: https://github.com/tue-mps/panoptic_parts
Exemplar-Based Open-Set Panoptic Segmentation Network
- Homepage: https://cv.snu.ac.kr/research/EOPSN/
- Paper: https://arxiv.org/abs/2105.08336
- Code: https://github.com/jd730/EOPSN
MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
- Paper: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.html
- Code: None
Panoptic Segmentation Forecasting
Fully Convolutional Networks for Panoptic Segmentation
Cross-View Regularization for Domain Adaptive Panoptic Segmentation
- Paper: https://arxiv.org/abs/2103.02584
- Code: None
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
- Paper(Oral): https://arxiv.org/abs/2103.15954
- Code: None
Learning Position and Target Consistency for Memory-based Video Object Segmentation
- Paper: https://arxiv.org/abs/2104.04329
- Code: None
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
- Paper(Oral): https://arxiv.org/abs/2101.08833
- Code: https://github.com/dukebw/SSTVOS
Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion
- Homepage: https://hkchengrex.github.io/MiVOS/
Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion
- Paper(Oral): https://arxiv.org/abs/2103.11832
- Code: https://github.com/sunpeng1996/DSA2F
Uncertainty-aware Joint Salient Object and Camouflaged Object Detection
Group Collaborative Learning for Co-Salient Object Detection
Semantic Image Matting
- Paper: https://arxiv.org/abs/2104.08201
- Code: https://github.com/nowsyn/SIM
- Dataset: https://github.com/nowsyn/SIM
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
- Paper: https://arxiv.org/abs/2105.09156
- Code: None
Unsupervised Multi-Source Domain Adaptation for Person Re-Identification
- Paper: https://arxiv.org/abs/2104.12961
- Code: None
Combined Depth Space based Architecture Search For Person Re-identification
- Paper: https://arxiv.org/abs/2104.04163
- Code: None
Anchor-Free Person Search
- Paper: https://arxiv.org/abs/2103.11617
- Code: https://github.com/daodaofr/AlignPS
- Interpretation: The first anchor-free person search framework | CVPR 2021
Temporal-Relational CrossTransformers for Few-Shot Action Recognition
FrameExit: Conditional Early Exiting for Efficient Video Recognition
- Paper(Oral): https://arxiv.org/abs/2104.13400
- Code: None
No frame left behind: Full Video Action Recognition
- Paper: https://arxiv.org/abs/2103.15395
- Code: None
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization
- Paper: https://arxiv.org/abs/2103.13137
- Code: None
Temporal Context Aggregation Network for Temporal Action Proposal Refinement
- Paper: https://arxiv.org/abs/2103.13141
- Code: None
- Interpretation: CVPR 2021 | TCANet: the strongest temporal action proposal refinement network
ACTION-Net: Multipath Excitation for Action Recognition
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
- Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
- Paper: https://arxiv.org/abs/2009.05769
- Code: https://github.com/FingerRec/BE
TDN: Temporal Difference Networks for Efficient Action Recognition
A 3D GAN for Improved Large-pose Facial Recognition
- Paper: https://arxiv.org/abs/2012.10545
- Code: None
MagFace: A Universal Representation for Face Recognition and Quality Assessment
- Paper(Oral): https://arxiv.org/abs/2103.06627
- Code: https://github.com/IrvingMeng/MagFace
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition
- Homepage: https://www.face-benchmark.org/
- Paper: https://arxiv.org/abs/2103.04098
- Dataset: https://www.face-benchmark.org/
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace
HLA-Face: Joint High-Low Adaptation for Low Light Face Detection
- Homepage: https://daooshee.github.io/HLA-Face-Website/
- Paper: https://arxiv.org/abs/2104.01984
- Code: https://github.com/daooshee/HLA-Face-Code
CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement
- Paper: https://arxiv.org/abs/2103.07017
- Code: None
Cross Modal Focal Loss for RGBD Face Anti-Spoofing
- Paper: https://arxiv.org/abs/2103.00948
- Code: None
Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain
- Paper: https://arxiv.org/abs/2103.01856
- Code: None
Multi-attentional Deepfake Detection
- Paper: https://arxiv.org/abs/2103.02406
- Code: None
Continuous Face Aging via Self-estimated Residual Age Embedding
- Paper: https://arxiv.org/abs/2105.00020
- Code: None
PML: Progressive Margin Loss for Long-tailed Age Classification
- Paper: https://arxiv.org/abs/2103.02140
- Code: None
Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
- Paper: https://arxiv.org/abs/2103.13372
- Code: None
MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes
- Paper: https://arxiv.org/abs/2103.14211
- Code: None
Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing
ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search
- Paper: https://arxiv.org/abs/2105.10154
- Code: None
When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks
- Paper: https://arxiv.org/abs/2105.06152
- Code: None
Pose Recognition with Cascade Transformers
DCPose: Deep Dual Consecutive Network for Human Pose Estimation
End-to-End Human Pose and Mesh Reconstruction with Transformers
PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation
- Paper(Oral): https://arxiv.org/abs/2105.02465
Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration
Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks
HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation
- Homepage: https://jeffli.site/HybrIK/
- Paper: https://arxiv.org/abs/2011.14672
- Code: https://github.com/Jeff-sjtu/HybrIK
From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation
- Paper: https://arxiv.org/abs/2103.14843
- Code: None
POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture
- Homepage: http://www.liuyebin.com/posefusion/posefusion.html
- Paper(Oral): https://arxiv.org/abs/2103.15331
- Code: None
Fourier Contour Embedding for Arbitrary-Shaped Text Detection
- Paper: https://arxiv.org/abs/2104.10442
- Code: None
Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition
Checkerboard Context Model for Efficient Learned Image Compression
- Paper: https://arxiv.org/abs/2103.15306
- Code: None
Slimmable Compressive Autoencoders for Practical Neural Image Compression
- Paper: https://arxiv.org/abs/2103.15726
- Code: None
Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton
- Paper: https://arxiv.org/abs/2103.15368
- Code: None
Teachers Do More Than Teach: Compressing Image-to-Image Models
Dynamic Slimmable Network
Network Quantization with Element-wise Gradient Scaling
- Paper: https://arxiv.org/abs/2104.00903
- Code: None
Zero-shot Adversarial Quantization
- Paper(Oral): https://arxiv.org/abs/2103.15263
- Code: https://git.io/Jqc0y
Learnable Companding Quantization for Accurate Low-bit Neural Networks
- Paper: https://arxiv.org/abs/2103.07156
- Code: None
Distilling Knowledge via Knowledge Review
Distilling Object Detectors via Decoupled Features
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
- Homepage: http://mepro.bjtu.edu.cn/resource.html
- Paper: https://arxiv.org/abs/2104.06174
- Code: None
ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic
AdderSR: Towards Energy Efficient Image Super-Resolution
- Paper: https://arxiv.org/abs/2009.08891
- Code: None
Contrastive Learning for Compact Single Image Dehazing
Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
- Paper: None
- Code: https://github.com/CS-GangXu/TMNet
Multi-Stage Progressive Image Restoration
PD-GAN: Probabilistic Diverse GAN for Image Inpainting
TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations
- Homepage: https://yzhouas.github.io/projects/TransFill/index.html
- Paper: https://arxiv.org/abs/2103.15982
- Code: None
StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Paper: https://arxiv.org/abs/2104.14754
- Code: https://github.com/naver-ai/StyleMapGAN
- Demo Video: https://youtu.be/qCapNyRA_Ng
High-Fidelity and Arbitrary Face Editing
- Paper: https://arxiv.org/abs/2103.15814
- Code: None
Anycost GANs for Interactive Image Synthesis and Editing
PISE: Person Image Synthesis and Editing with Decoupled GAN
DeFLOCNet: Deep Image Editing via Flexible Low-level Controls
- Paper: http://raywzy.com/
- Code: http://raywzy.com/
Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
- Paper: None
- Code: None
Towards Accurate Text-based Image Captioning with Content Diversity Exploration
- Paper: https://arxiv.org/abs/2105.03236
- Code: None
DG-Font: Deformable Generative Networks for Unsupervised Font Generation
LoFTR: Detector-Free Local Feature Matching with Transformers
- Homepage: https://zju3dv.github.io/loftr/
- Paper: https://arxiv.org/abs/2104.00680
- Code: https://github.com/zju3dv/LoFTR
Convolutional Hough Matching Networks
- Homepage: http://cvlab.postech.ac.kr/research/CHM/
- Paper(Oral): https://arxiv.org/abs/2103.16831
- Code: None
Bridging the Visual Gap: Wide-Range Image Blending
Robust Reflection Removal with Reflection-free Flash-only Cues
- Paper: https://arxiv.org/abs/2103.04273
- Code: https://github.com/ChenyangLEI/flash-reflection-removal
Equivariant Point Network for 3D Point Cloud Analysis
- Paper: https://arxiv.org/abs/2103.14147
- Code: None
PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds
Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds
HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection
LiDAR R-CNN: An Efficient and Universal 3D Object Detector
M3DSSD: Monocular 3D Single Stage Object Detector
SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud
- Paper: None
- Code: https://github.com/Vegeta2020/SE-SSD
Center-based 3D Object Detection and Tracking
Categorical Depth Distribution Network for Monocular 3D Object Detection
- Paper: https://arxiv.org/abs/2103.01100
- Code: None
Bidirectional Projection Network for Cross Dimension Scene Understanding
- Paper(Oral): https://arxiv.org/abs/2103.14326
- Code: https://github.com/wbhu/BPNet
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion
Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
- Homepage: https://github.com/QingyongHu/SensatUrban
- Paper: http://arxiv.org/abs/2009.03137
- Code: https://github.com/QingyongHu/SensatUrban
- Dataset: https://github.com/QingyongHu/SensatUrban
Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation
Center-based 3D Object Detection and Tracking
ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning
- Paper: https://arxiv.org/abs/2103.15231
- Code: None
PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency
PREDATOR: Registration of 3D Point Clouds with Low Overlap
Unsupervised 3D Shape Completion through GAN Inversion
- Homepage: https://junzhezhang.github.io/projects/ShapeInversion/
- Paper: https://arxiv.org/abs/2104.13366
- Code: https://github.com/junzhezhang/shape-inversion
Variational Relational Point Completion Network
- Homepage: https://paul007pl.github.io/projects/VRCNet
- Paper: https://arxiv.org/abs/2104.10154
- Code: https://github.com/paul007pl/VRCNet
Style-based Point Generator with Adversarial Rendering for Point Cloud Completion
- Homepage: https://alphapav.github.io/SpareNet/
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
- Paper: https://arxiv.org/abs/2104.00858
- Code: None
NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video
- Homepage: https://zju3dv.github.io/neuralrecon/
- Paper(Oral): https://arxiv.org/abs/2104.00681
FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism
- Paper(Oral): https://arxiv.org/abs/2103.07054
- Code: https://github.com/DC1991/FS-Net
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation
- Paper: http://arxiv.org/abs/2102.12145
- Code: https://git.io/GDR-Net
FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation
Back to the Feature: Learning Robust Camera Localization from Pixels to Pose
S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation
- Paper(Oral): https://arxiv.org/abs/2104.00877
- Code: None
Beyond Image to Depth: Improving Depth Prediction using Echoes
- Homepage: https://krantiparida.github.io/projects/bimgdepth.html
- Paper: https://arxiv.org/abs/2103.08468
- Code: https://github.com/krantiparida/beyond-image-to-depth
S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation
- Paper: https://arxiv.org/abs/2103.02396
- Code: None
Depth from Camera Motion and Object Detection
- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD
A Decomposition Model for Stereo Matching
- Paper: https://arxiv.org/abs/2104.07516
- Code: None
Self-Supervised Multi-Frame Monocular Scene Flow
RAFT-3D: Scene Flow using Rigid-Motion Embeddings
- Paper: https://arxiv.org/abs/2012.00726v1
- Code: None
Learning Optical Flow From Still Images
- Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/
- Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf
FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds
- Paper: https://arxiv.org/abs/2104.00798
- Code: None
Focus on Local: Detecting Lane Marker from Bottom Up via Key Point
- Paper: https://arxiv.org/abs/2105.13680
- Code: None
Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection
Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction
- Paper(Oral): https://arxiv.org/abs/2104.08277
- Code: None
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Enhancing the Transferability of Adversarial Attacks through Variance Tuning
LiBRe: A Practical Bayesian Approach to Adversarial Detection
- Paper: https://arxiv.org/abs/2103.14835
- Code: None
Natural Adversarial Examples
StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval
- Paper: https://arxiv.org/abs/2103.15706
- Code: None
QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval
- Paper: https://arxiv.org/abs/2103.02927
- Code: None
On Semantic Similarity in Video Retrieval
- Homepage: https://mwray.github.io/SSVR/
Cross-Modal Center Loss for 3D Cross-Modal Retrieval
- Paper: https://arxiv.org/abs/2008.03561
- Code: https://github.com/LongLong-Jing/Cross-Modal-Center-Loss
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
- Paper: https://arxiv.org/abs/2103.16553
- Code: None
Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning
Counterfactual Zero-Shot and Open-Set Visual Recognition
FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space
CDFI: Compression-Driven Network Design for Frame Interpolation
- Paper: None
- Code: https://github.com/tding1/CDFI
FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation
- Homepage: https://tarun005.github.io/FLAVR/
Transformation Driven Visual Reasoning
- Homepage: https://hongxin2019.github.io/TVR/
- Paper: https://arxiv.org/abs/2011.13160
- Code: https://github.com/hughplay/TVR
Taming Transformers for High-Resolution Image Synthesis
- Homepage: https://compvis.github.io/taming-transformers/
- Paper(Oral): https://arxiv.org/abs/2012.09841
- Code: https://github.com/CompVis/taming-transformers
Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes
Self-Supervised Visibility Learning for Novel View Synthesis
- Paper: https://arxiv.org/abs/2103.15407
- Code: None
NeX: Real-time View Synthesis with Neural Basis Expansion
- Homepage: https://nex-mpi.github.io/
- Paper(Oral): https://arxiv.org/abs/2103.05606
Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer
LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
- Paper: None
- Code: None
Variational Transformer Networks for Layout Generation
- Paper: https://arxiv.org/abs/2104.02416
- Code: None
Generalizable Person Re-identification with Relevance-aware Mixture of Experts
- Paper: https://arxiv.org/abs/2105.09156
- Code: None
RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening
Adaptive Methods for Real-World Domain Generalization
- Paper: https://arxiv.org/abs/2103.15796
- Code: None
FSDR: Frequency Space Domain Randomization for Domain Generalization
- Paper: https://arxiv.org/abs/2103.02370
- Code: None
Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation
- Paper: https://arxiv.org/abs/2104.00808
- Code: None
Domain Consensus Clustering for Universal Domain Adaptation
- Paper: http://reler.net/papers/guangrui_cvpr2021.pdf
- Code: https://github.com/Solacex/Domain-Consensus-Clustering
Towards Open World Object Detection
- Paper(Oral): https://arxiv.org/abs/2103.02603
- Code: https://github.com/JosephKJ/OWOD
Exemplar-Based Open-Set Panoptic Segmentation Network
- Homepage: https://cv.snu.ac.kr/research/EOPSN/
- Paper: https://arxiv.org/abs/2105.08336
- Code: https://github.com/jd730/EOPSN
Learning Placeholders for Open-Set Recognition
- Paper(Oral): https://arxiv.org/abs/2103.15086
- Code: None
IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking
HOTR: End-to-End Human-Object Interaction Detection with Transformers
- Paper: https://arxiv.org/abs/2104.13682
- Code: None
Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information
Reformulating HOI Detection as Adaptive Set Prediction
Detecting Human-Object Interaction via Fabricated Compositional Learning
End-to-End Human Object Interaction Detection with HOI Transformer
Auto-Exposure Fusion for Single-Image Shadow Removal
- Paper: https://arxiv.org/abs/2103.01255
- Code: https://github.com/tsingqguo/exposure-fusion-shadow-removal
Parser-Free Virtual Try-on via Distilling Appearance Flows
(Virtual try-on without human parsing, based on appearance flow distillation)
A Second-Order Approach to Learning with Instance-Dependent Label Noise
- Paper(Oral): https://arxiv.org/abs/2012.11854
- Code: https://github.com/UCSC-REAL/CAL
Part-aware Panoptic Segmentation
- Paper: https://arxiv.org/abs/2106.06351
- Code: https://github.com/tue-mps/panoptic_parts
- Dataset: https://github.com/tue-mps/panoptic_parts
Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
- Homepage: https://www.yasamin.page/hdnet_tiktok
- Paper(Oral): https://arxiv.org/abs/2103.03319
- Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v
High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
- Paper: https://arxiv.org/abs/2105.09188
- Code: https://github.com/csjliang/LPTN
- Dataset: https://github.com/csjliang/LPTN
Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
- Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
- Paper(Oral): https://arxiv.org/abs/2104.12690
- Code: https://github.com/fidler-lab/efficient-annotation-cookbook
Paper download link:
ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation
- Paper: https://arxiv.org/abs/2012.05258
- Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
- Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab
Learning To Count Everything
- Paper: https://arxiv.org/abs/2104.08391
- Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
- Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything
Semantic Image Matting
- Paper: https://arxiv.org/abs/2104.08201
- Code: https://github.com/nowsyn/SIM
- Dataset: https://github.com/nowsyn/SIM
Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
- Homepage: http://mepro.bjtu.edu.cn/resource.html
- Paper: https://arxiv.org/abs/2104.06174
- Code: None
Visual Semantic Role Labeling for Video Understanding
- Homepage: https://vidsitu.org/
VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
- Homepage: https://www.vspwdataset.com/
- Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
- GitHub: https://github.com/sssdddwww2/vspw_dataset_download
Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
- Homepage: https://vap.aau.dk/sewer-ml/
- Paper: https://arxiv.org/abs/2103.10619
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
- Paper: https://arxiv.org/abs/2103.03375
- Dataset: None
Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
- Homepage: https://github.com/QingyongHu/SensatUrban
- Paper: http://arxiv.org/abs/2009.03137
- Code: https://github.com/QingyongHu/SensatUrban
- Dataset: https://github.com/QingyongHu/SensatUrban
When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework
- Paper(Oral): https://arxiv.org/abs/2103.01520
- Code: https://github.com/Hzzone/MTLFace
- Dataset: https://github.com/Hzzone/MTLFace
Depth from Camera Motion and Object Detection
- Paper: https://arxiv.org/abs/2103.01468
- Code: https://github.com/griffbr/ODMD
- Dataset: https://github.com/griffbr/ODMD
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
- Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
- Paper: https://arxiv.org/abs/2103.01353
- Code: http://rl.uni-freiburg.de/research/multimodal-distill
- Dataset: http://rl.uni-freiburg.de/research/multimodal-distill
Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos
- Homepage: https://www.yasamin.page/hdnet_tiktok
- Paper(Oral): https://arxiv.org/abs/2103.03319
- Dataset: https://www.yasamin.page/hdnet_tiktok#h.jr9ifesshn7v
Omnimatte: Associating Objects and Their Effects in Video
- Homepage: https://omnimatte.github.io/
- Paper(Oral): https://arxiv.org/abs/2105.06993
Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets
- Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
- Paper(Oral): https://arxiv.org/abs/2104.12690
- Code: https://github.com/fidler-lab/efficient-annotation-cookbook
Motion Representations for Articulated Animation
- Paper: https://arxiv.org/abs/2104.11280
- Code: https://github.com/snap-research/articulated-animation
Deep Lucas-Kanade Homography for Multimodal Image Alignment
- Paper: https://arxiv.org/abs/2104.11693
- Code: https://github.com/placeforyiming/CVPR21-Deep-Lucas-Kanade-Homography
Skip-Convolutions for Efficient Video Processing
- Paper: https://arxiv.org/abs/2104.11487
- Code: None
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
- Paper(Oral): https://arxiv.org/abs/2104.11224
Learning To Count Everything
- Paper: https://arxiv.org/abs/2104.08391
- Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
- Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything
SOLD2: Self-supervised Occlusion-aware Line Description and Detection
- Paper(Oral): https://arxiv.org/abs/2104.03362
- Code: https://github.com/cvg/SOLD2
Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression
- Homepage: https://li-wanhua.github.io/POEs/
- Paper: https://arxiv.org/abs/2103.13629
- Code: https://github.com/Li-Wanhua/POEs
LEAP: Learning Articulated Occupancy of People
- Paper: https://arxiv.org/abs/2104.06849
- Code: None
Visual Semantic Role Labeling for Video Understanding
- Homepage: https://vidsitu.org/
UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
- Paper(Oral): https://arxiv.org/abs/2104.00924
- Code: None
Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction
- Paper: https://arxiv.org/abs/2104.00858
- Code: None
Towards High Fidelity Face Relighting with Realistic Shadows
- Paper: https://arxiv.org/abs/2104.00825
- Code: None
BRepNet: A topological message passing system for solid models
- Paper(Oral): https://arxiv.org/abs/2104.00706
- Code: None
Visually Informed Binaural Audio Generation without Binaural Audios
- Homepage: https://sheldontsui.github.io/projects/PseudoBinaural
- Paper: None
- GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021
Exploring intermediate representation for monocular vehicle pose estimation
- Paper: None
- Code: https://github.com/Nicholasli1995/EgoNet
Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB
- Paper(Oral): https://arxiv.org/abs/2103.14708
- Code: None
Invertible Image Signal Processing
Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling
- Paper: https://arxiv.org/abs/2103.14858
- Code: None
SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences
- Paper: https://arxiv.org/abs/2103.14898
- Code: None
Embedding Transfer with Label Relaxation for Improved Metric Learning
- Paper: https://arxiv.org/abs/2103.14908
- Code: None
Picasso: A CUDA-based Library for Deep Learning over 3D Meshes
Meta-Mining Discriminative Samples for Kinship Verification
- Paper: https://arxiv.org/abs/2103.15108
- Code: None
Cloud2Curve: Generation and Vectorization of Parametric Sketches
- Paper: https://arxiv.org/abs/2103.15536
- Code: None
TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
- Code: None
ACRE: Abstract Causal REasoning Beyond Covariation
- Code: None
Confluent Vessel Trees with Accurate Bifurcations
- Paper: https://arxiv.org/abs/2103.14268
- Code: None
Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling
- Paper: https://arxiv.org/abs/2103.14338
- Code: https://github.com/HuangZhiChao95/FewShotMotionTransfer
Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks
- Homepage: https://paschalidoud.github.io/neural_parts
- Paper: None
- Code: https://github.com/paschalidoud/neural_parts
Knowledge Evolution in Neural Networks
- Paper(Oral): https://arxiv.org/abs/2103.05152
- Code: https://github.com/ahmdtaha/knowledge_evolution
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
SGP: Self-supervised Geometric Perception
- Oral
Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning
Diffusion Probabilistic Models for 3D Point Cloud Generation
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models
- Paper: None
- Code: https://github.com/transcendentsky/Film-Recovery
Toward Explainable Reflection Removal with Distilling and Model Uncertainty
- Paper: None
- Code: https://github.com/ytpeng-aimlab/CVPR-2021-Toward-Explainable-Reflection-Removal-with-Distilling-and-Model-Uncertainty
DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation
- Paper: None
- Code: https://github.com/lhaippp/DeepOIS
Exploring Adversarial Fake Images on Face Manifold
- Paper: None
- Code: https://github.com/ldz666666/Style-atk
Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task
- Paper: None
- Code: https://github.com/yandamengdanai/Uncertainty-Aware-Semi-Supervised-Crowd-Counting-via-Consistency-Regularized-Surrogate-Task
Temporal Contrastive Graph for Self-supervised Video Representation Learning
- Paper: None
- Code: https://github.com/YangLiu9208/TCG
Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching
- Paper: None
- Code: https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr
Fast and Memory-Efficient Compact Bilinear Pooling
- Paper: None
- Code: https://github.com/cvpr2021kp2/cvpr2021kp2
Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine
- Paper: None
- Code: https://github.com/gapDetection/cvpr2021
Estimating A Child's Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation
- Paper: None
- Code: https://github.com/interactivekeypoint2020/Morph
https://github.com/ShaoQiangShen/CVPR2021
https://github.com/gillesflash/CVPR2021
https://github.com/anonymous-submission1991/BaLeNAS
https://github.com/cvpr2021dcb/cvpr2021dcb
https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578
https://github.com/AldrichZeng/FreqPrune