Source: https://github.com/amusi/CVPR2021-Papers-with-Code

CVPR 2021 Papers and Open-Source Projects Collection

CVPR 2021 Papers and Open-Source Projects Collection (Papers with Code)

A collection of CVPR 2021 papers and open-source projects (papers with code)!

CVPR 2021 accepted paper list: http://cvpr2021.thecvf.com/sites/default/files/2021-03/accepted_paper_ids.txt

Note 1: Issues sharing CVPR 2021 papers and open-source projects are welcome!

Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

[CVPR 2021 Open-Source Papers: Table of Contents]

  • Backbone
  • NAS
  • GAN
  • VAE
  • Visual Transformer
  • Regularization
  • SLAM
  • Long-Tailed Distribution
  • Data Augmentation
  • Unsupervised/Self-Supervised
  • Semi-Supervised Learning
  • Capsule Network
  • 2D Object Detection
  • Single/Multi-Object Tracking
  • Semantic Segmentation
  • Instance Segmentation
  • Panoptic Segmentation
  • Medical Image Segmentation
  • Video Object Segmentation
  • Interactive Video Object Segmentation
  • Saliency Detection
  • Camouflaged Object Detection
  • Co-Salient Object Detection
  • Image Matting
  • Person Re-identification
  • Person Search
  • Video Understanding/Action Recognition
  • Face Recognition
  • Face Detection
  • Face Anti-Spoofing
  • Deepfake Detection
  • Facial Age Estimation
  • Facial Expression Recognition
  • Deepfakes
  • Human Parsing
  • 2D/3D Human Pose Estimation
  • Animal Pose Estimation
  • Human Volumetric Capture
  • Scene Text Recognition
  • Image Compression
  • Model Compression/Pruning/Quantization
  • Knowledge Distillation
  • Super-Resolution
  • Dehazing
  • Image Restoration
  • Image Inpainting
  • Image Editing
  • Image Captioning
  • Font Generation
  • Image Matching
  • Image Blending
  • Reflection Removal
  • 3D Point Cloud Classification
  • 3D Object Detection
  • 3D Semantic Segmentation
  • 3D Panoptic Segmentation
  • 3D Object Tracking
  • 3D Point Cloud Registration
  • 3D Point Cloud Completion
  • 3D Reconstruction
  • 6D Pose Estimation
  • Camera Pose Estimation
  • Depth Estimation
  • Stereo Matching
  • Optical Flow Estimation
  • Lane Detection
  • Trajectory Prediction
  • Crowd Counting
  • Adversarial Examples
  • Image Retrieval
  • Video Retrieval
  • Cross-modal Retrieval
  • Zero-Shot Learning
  • Federated Learning
  • Video Frame Interpolation
  • Visual Reasoning
  • Image Synthesis
  • View Synthesis
  • Style Transfer
  • Layout Generation
  • Domain Generalization
  • Domain Adaptation
  • Open-Set
  • Adversarial Attack
  • Human-Object Interaction (HOI) Detection
  • Shadow Removal
  • Virtual Try-On
  • Datasets
  • Others
  • TODO
  • Not Sure (acceptance unconfirmed)

Backbone

Decoupled Dynamic Filter Networks

  • Homepage: https://thefoxofsky.github.io/project_pages/ddf
  • Paper: https://arxiv.org/abs/2104.14107
  • Code: https://github.com/thefoxofsky/DDF

Lite-HRNet: A Lightweight High-Resolution Network

  • Paper: https://arxiv.org/abs/2104.06403
  • Code: https://github.com/HRNet/Lite-HRNet

CondenseNet V2: Sparse Feature Reactivation for Deep Networks

  • Paper: https://arxiv.org/abs/2104.04382

  • Code: https://github.com/jianghaojun/CondenseNetV2

Diverse Branch Block: Building a Convolution as an Inception-like Unit

  • Paper: https://arxiv.org/abs/2103.13425

  • Code: https://github.com/DingXiaoH/DiverseBranchBlock

Scaling Local Self-Attention For Parameter Efficient Visual Backbones

  • Paper(Oral): https://arxiv.org/abs/2103.12731

  • Code: None

ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network

  • Paper: https://arxiv.org/abs/2007.00992
  • Code: https://github.com/clovaai/rexnet

Involution: Inverting the Inherence of Convolution for Visual Recognition

  • Paper: https://arxiv.org/abs/2103.06255
  • Code: https://github.com/d-li14/involution

Coordinate Attention for Efficient Mobile Network Design

  • Paper: https://arxiv.org/abs/2103.02907
  • Code: https://github.com/Andrew-Qibin/CoordAttention

Inception Convolution with Efficient Dilation Search

  • Paper: https://arxiv.org/abs/2012.13587
  • Code: https://github.com/yifan123/IC-Conv

RepVGG: Making VGG-style ConvNets Great Again

  • Paper: https://arxiv.org/abs/2101.03697
  • Code: https://github.com/DingXiaoH/RepVGG

NAS

Combined Depth Space based Architecture Search For Person Re-identification

  • Paper: https://arxiv.org/abs/2104.04163
  • Code: None

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

  • Paper(Oral): https://arxiv.org/abs/2103.15954
  • Code: None

HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

  • Paper(Oral): None
  • Code: https://github.com/dingmyu/HR-NAS

Neural Architecture Search with Random Labels

  • Paper: https://arxiv.org/abs/2101.11834
  • Code: None

Towards Improving the Consistency, Efficiency, and Flexibility of Differentiable Neural Architecture Search

  • Paper: https://arxiv.org/abs/2101.11342
  • Code: None

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

  • Paper: None
  • Code: None

Prioritized Architecture Sampling with Monto-Carlo Tree Search

  • Paper: https://arxiv.org/abs/2103.11922
  • Code: https://github.com/xiusu/NAS-Bench-Macro

Contrastive Neural Architecture Search with Neural Architecture Comparators

  • Paper: https://arxiv.org/abs/2103.05471
  • Code: https://github.com/chenyaofo/CTNAS

AttentiveNAS: Improving Neural Architecture Search via Attentive Sampling

  • Paper: https://arxiv.org/abs/2011.09011
  • Code: None

ReNAS: Relativistic Evaluation of Neural Architecture Search

  • Paper: https://arxiv.org/abs/1910.01523
  • Code: None

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

  • Paper: https://arxiv.org/abs/2005.14446
  • Code: None

Searching by Generating: Flexible and Efficient One-Shot NAS with Architecture Generator

  • Paper: https://arxiv.org/abs/2103.07289
  • Code: https://github.com/eric8607242/SGNAS

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

  • Paper: https://arxiv.org/abs/2103.04507
  • Code: https://github.com/VDIGPKU/OPANAS

Inception Convolution with Efficient Dilation Search

  • Paper: https://arxiv.org/abs/2012.13587
  • Code: https://github.com/yifan123/IC-Conv

GAN

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

  • Paper: https://arxiv.org/abs/2105.09188
  • Code: https://github.com/csjliang/LPTN
  • Dataset: https://github.com/csjliang/LPTN

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

  • Paper: https://arxiv.org/abs/2104.03064

  • Code: https://github.com/ecnuycxie/DG-Font

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

  • Paper: https://arxiv.org/abs/2105.02201
  • Code: https://github.com/KumapowerLIU/PD-GAN

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

  • Paper: https://arxiv.org/abs/2104.14754
  • Code: https://github.com/naver-ai/StyleMapGAN
  • Demo Video: https://youtu.be/qCapNyRA_Ng

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

  • Paper: https://arxiv.org/abs/2104.05376
  • Code: https://github.com/PaddlePaddle/PaddleGAN/

Regularizing Generative Adversarial Networks under Limited Data

  • Homepage: https://hytseng0509.github.io/lecam-gan/
  • Paper: https://faculty.ucmerced.edu/mhyang/papers/cvpr2021_gan_limited_data.pdf
  • Code: https://github.com/google/lecam-gan

Towards Real-World Blind Face Restoration with Generative Facial Prior

  • Paper: https://arxiv.org/abs/2101.04061
  • Code: None

TediGAN: Text-Guided Diverse Image Generation and Manipulation

  • Homepage: https://xiaweihao.com/projects/tedigan/

  • Paper: https://arxiv.org/abs/2012.03308

  • Code: https://github.com/weihaox/TediGAN

Generative Hierarchical Features from Synthesizing Images

  • Homepage: https://genforce.github.io/ghfeat/

  • Paper(Oral): https://arxiv.org/abs/2007.10379

  • Code: https://github.com/genforce/ghfeat

Teachers Do More Than Teach: Compressing Image-to-Image Models

  • Paper: https://arxiv.org/abs/2103.03467
  • Code: https://github.com/snap-research/CAT

HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms

  • Paper: https://arxiv.org/abs/2011.11731
  • Code: https://github.com/mahmoudnafifi/HistoGAN

pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis

  • Homepage: https://marcoamonteiro.github.io/pi-GAN-website/

  • Paper(Oral): https://arxiv.org/abs/2012.00926

  • Code: None

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

  • Paper: https://arxiv.org/abs/2103.07893
  • Code: None

Diverse Semantic Image Synthesis via Probability Distribution Modeling

  • Paper: https://arxiv.org/abs/2103.06878
  • Code: https://github.com/tzt101/INADE.git

LOHO: Latent Optimization of Hairstyles via Orthogonalization

  • Paper: https://arxiv.org/abs/2103.03891
  • Code: None

PISE: Person Image Synthesis and Editing with Decoupled GAN

  • Paper: https://arxiv.org/abs/2103.04023
  • Code: https://github.com/Zhangjinso/PISE

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

  • Paper: http://raywzy.com/
  • Code: http://raywzy.com/

Efficient Conditional GAN Transfer with Knowledge Propagation across Classes

  • Paper: https://www.researchgate.net/publication/349309756_Efficient_Conditional_GAN_Transfer_with_Knowledge_Propagation_across_Classes
  • Code: http://github.com/mshahbazi72/cGANTransfer

Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs

  • Paper: https://arxiv.org/abs/2011.14107
  • Code: None

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation

  • Homepage: https://eladrich.github.io/pixel2style2pixel/
  • Paper: https://arxiv.org/abs/2008.00951
  • Code: https://github.com/eladrich/pixel2style2pixel

A 3D GAN for Improved Large-pose Facial Recognition

  • Paper: https://arxiv.org/abs/2012.10545
  • Code: None

HumanGAN: A Generative Model of Humans Images

  • Paper: https://arxiv.org/abs/2103.06902
  • Code: None

ID-Unet: Iterative Soft and Hard Deformation for View Synthesis

  • Paper: https://arxiv.org/abs/2103.02264
  • Code: https://github.com/MingyuY/Iterative-view-synthesis

CoMoGAN: continuous model-guided image-to-image translation

  • Paper(Oral): https://arxiv.org/abs/2103.06879
  • Code: https://github.com/cv-rits/CoMoGAN

Training Generative Adversarial Networks in One Stage

  • Paper: https://arxiv.org/abs/2103.00430
  • Code: None

Closed-Form Factorization of Latent Semantics in GANs

  • Homepage: https://genforce.github.io/sefa/
  • Paper(Oral): https://arxiv.org/abs/2007.06600
  • Code: https://github.com/genforce/sefa

Anycost GANs for Interactive Image Synthesis and Editing

  • Paper: https://arxiv.org/abs/2103.03243
  • Code: https://github.com/mit-han-lab/anycost-gan

Image-to-image Translation via Hierarchical Style Disentanglement

  • Paper: https://arxiv.org/abs/2103.01456
  • Code: https://github.com/imlixinyang/HiSD

VAE

Soft-IntroVAE: Analyzing and Improving Introspective Variational Autoencoders

  • Homepage: https://taldatech.github.io/soft-intro-vae-web/

  • Paper: https://arxiv.org/abs/2012.13253

  • Code: https://github.com/taldatech/soft-intro-vae-pytorch

Visual Transformer

1. End-to-End Human Pose and Mesh Reconstruction with Transformers

  • Paper: https://arxiv.org/abs/2012.09760
  • Code: None

2. Temporal-Relational CrossTransformers for Few-Shot Action Recognition

  • Paper: https://arxiv.org/abs/2101.06184
  • Code: https://github.com/tobyperrett/trx

3. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

  • Paper: https://arxiv.org/abs/2103.16110
  • Code: https://github.com/mczhuge/Kaleido-BERT

4. HOTR: End-to-End Human-Object Interaction Detection with Transformers

  • Paper: https://arxiv.org/abs/2104.13682
  • Code: None

5. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving

  • Paper: https://arxiv.org/abs/2104.09224
  • Code: https://github.com/autonomousvision/transfuser

6. Pose Recognition with Cascade Transformers

  • Paper: https://arxiv.org/abs/2104.06976

  • Code: https://github.com/mlpc-ucsd/PRTR

7. Variational Transformer Networks for Layout Generation

  • Paper: https://arxiv.org/abs/2104.02416
  • Code: None

8. LoFTR: Detector-Free Local Feature Matching with Transformers

  • Homepage: https://zju3dv.github.io/loftr/
  • Paper: https://arxiv.org/abs/2104.00680
  • Code: https://github.com/zju3dv/LoFTR

9. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

  • Paper: https://arxiv.org/abs/2012.15840
  • Code: https://github.com/fudan-zvg/SETR

10. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

  • Paper: https://arxiv.org/abs/2103.16553
  • Code: None

11. Transformer Tracking

  • Paper: https://arxiv.org/abs/2103.15436
  • Code: https://github.com/chenxin-dlut/TransT

12. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers

  • Paper(Oral): None
  • Code: https://github.com/dingmyu/HR-NAS

13. MIST: Multiple Instance Spatial Transformer

  • Paper: https://arxiv.org/abs/1811.10725
  • Code: None

14. Multimodal Motion Prediction with Stacked Transformers

  • Paper: https://arxiv.org/abs/2103.11624
  • Code: https://decisionforce.github.io/mmTransformer

15. Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

  • Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning

  • Code: https://github.com/amzn/image-to-recipe-transformers

16. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

  • Paper(Oral): https://arxiv.org/abs/2103.11681

  • Code: https://github.com/594422814/TransformerTrack

17. Pre-Trained Image Processing Transformer

  • Paper: https://arxiv.org/abs/2012.00364
  • Code: None

18. End-to-End Video Instance Segmentation with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.14503
  • Code: https://github.com/Epiphqny/VisTR

19. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.09094
  • Code: https://github.com/dddzg/up-detr

20. End-to-End Human Object Interaction Detection with HOI Transformer

  • Paper: https://arxiv.org/abs/2103.04503
  • Code: https://github.com/bbepoch/HoiTransformer

21. Transformer Interpretability Beyond Attention Visualization

  • Paper: https://arxiv.org/abs/2012.09838
  • Code: https://github.com/hila-chefer/Transformer-Explainability

22. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer

  • Paper: None
  • Code: None

23. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

  • Paper: None
  • Code: None

24. Line Segment Detection Using Transformers without Edges

  • Paper(Oral): https://arxiv.org/abs/2101.01909
  • Code: None

25. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

  • Paper: MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
  • Code: None

26. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

  • Paper(Oral): https://arxiv.org/abs/2101.08833
  • Code: https://github.com/dukebw/SSTVOS

27. Facial Action Unit Detection With Transformers

  • Paper: None
  • Code: None

28. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition

  • Paper: None
  • Code: None

29. Lesion-Aware Transformers for Diabetic Retinopathy Grading

  • Paper: None
  • Code: None

30. Topological Planning With Transformers for Vision-and-Language Navigation

  • Paper: https://arxiv.org/abs/2012.05292
  • Code: None

31. Adaptive Image Transformer for One-Shot Object Detection

  • Paper: None
  • Code: None

32. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos

  • Paper: None
  • Code: None

33. Taming Transformers for High-Resolution Image Synthesis

  • Homepage: https://compvis.github.io/taming-transformers/
  • Paper(Oral): https://arxiv.org/abs/2012.09841
  • Code: https://github.com/CompVis/taming-transformers

34. Self-Supervised Video Hashing via Bidirectional Transformers

  • Paper: None
  • Code: None

35. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos

  • Paper(Oral): https://hehefan.github.io/pdfs/p4transformer.pdf
  • Code: None

36. Gaussian Context Transformer

  • Paper: None
  • Code: None

37. General Multi-Label Image Classification With Transformers

  • Paper: https://arxiv.org/abs/2011.14027
  • Code: None

38. Bottleneck Transformers for Visual Recognition

  • Paper: https://arxiv.org/abs/2101.11605
  • Code: None

39. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation

  • Paper(Oral): https://arxiv.org/abs/2011.13922
  • Code: https://github.com/YicongHong/Recurrent-VLN-BERT

40. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

  • Paper(Oral): https://arxiv.org/abs/2102.06183
  • Code: https://github.com/jayleicn/ClipBERT

41. Self-attention based Text Knowledge Mining for Text Detection

  • Paper: None
  • Code: https://github.com/CVI-SZU/STKM

42. SSAN: Separable Self-Attention Network for Video Representation Learning

  • Paper: None
  • Code: None

43. Scaling Local Self-Attention For Parameter Efficient Visual Backbones

  • Paper(Oral): https://arxiv.org/abs/2103.12731

  • Code: None

Regularization

Regularizing Neural Networks via Adversarial Model Perturbation

  • Paper: https://arxiv.org/abs/2010.04925
  • Code: https://github.com/hiyouga/AMP-Regularizer

SLAM

Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation

  • Paper: https://arxiv.org/abs/2105.07593
  • Code: None

Generalizing to the Open World: Deep Visual Odometry with Online Adaptation

  • Paper: https://arxiv.org/abs/2103.15279
  • Code: https://arxiv.org/abs/2103.15279

Long-Tailed Distribution

Adversarial Robustness under Long-Tailed Distribution

  • Paper(Oral): https://arxiv.org/abs/2104.02703
  • Code: https://github.com/wutong16/Adversarial_Long-Tail

Distribution Alignment: A Unified Framework for Long-tail Visual Recognition

  • Paper: https://arxiv.org/abs/2103.16370
  • Code: https://github.com/Megvii-BaseDetection/DisAlign

Adaptive Class Suppression Loss for Long-Tail Object Detection

  • Paper: https://arxiv.org/abs/2104.00885
  • Code: https://github.com/CASIA-IVA-Lab/ACSL

Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification

  • Paper: https://arxiv.org/abs/2103.14267
  • Code: None

Data Augmentation

Scale-aware Automatic Augmentation for Object Detection

  • Paper: https://arxiv.org/abs/2103.17220

  • Code: https://github.com/Jia-Research-Lab/SA-AutoAug

Unsupervised/Self-Supervised

Domain-Specific Suppression for Adaptive Object Detection

  • Paper: https://arxiv.org/abs/2105.03570
  • Code: None

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

  • Paper: https://arxiv.org/abs/2104.14558

  • Code: https://github.com/facebookresearch/SlowFast

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

  • Paper: https://arxiv.org/abs/2104.12961
  • Code: None

Self-supervised Video Representation Learning by Context and Motion Decoupling

  • Paper: https://arxiv.org/abs/2104.00862
  • Code: None

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

  • Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
  • Paper: https://arxiv.org/abs/2009.05769
  • Code: https://github.com/FingerRec/BE

Spatially Consistent Representation Learning

  • Paper: https://arxiv.org/abs/2103.06122
  • Code: None

VideoMoCo: Contrastive Video Representation Learning with Temporally Adversarial Examples

  • Paper: https://arxiv.org/abs/2103.05905
  • Code: https://github.com/tinapan-pt/VideoMoCo

Exploring Simple Siamese Representation Learning

  • Paper(Oral): https://arxiv.org/abs/2011.10566
  • Code: None

Dense Contrastive Learning for Self-Supervised Visual Pre-Training

  • Paper(Oral): https://arxiv.org/abs/2011.09157
  • Code: https://github.com/WXinlong/DenseCL

Semi-Supervised Learning

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

  • Paper: https://arxiv.org/abs/2103.11402
  • Code: None

Adaptive Consistency Regularization for Semi-Supervised Transfer Learning

  • Paper: https://arxiv.org/abs/2103.02193
  • Code: https://github.com/SHI-Labs/Semi-Supervised-Transfer-Learning

Capsule Network

Capsule Network is Not More Robust than Convolutional Network

  • Paper: https://arxiv.org/abs/2103.15459
  • Code: None

2D Object Detection

2D Object Detection

Domain-Specific Suppression for Adaptive Object Detection

  • Paper: https://arxiv.org/abs/2105.03570
  • Code: None

IQDet: Instance-wise Quality Distribution Sampling for Object Detection

  • Paper: https://arxiv.org/abs/2104.06936
  • Code: None

Multi-Scale Aligned Distillation for Low-Resolution Detection

  • Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf

  • Code: https://github.com/Jia-Research-Lab/MSAD

Adaptive Class Suppression Loss for Long-Tail Object Detection

  • Paper: https://arxiv.org/abs/2104.00885
  • Code: https://github.com/CASIA-IVA-Lab/ACSL

VarifocalNet: An IoU-aware Dense Object Detector

  • Paper(Oral): https://arxiv.org/abs/2008.13367

  • Code: https://github.com/hyz-xmaster/VarifocalNet

Scale-aware Automatic Augmentation for Object Detection

  • Paper: https://arxiv.org/abs/2103.17220

  • Code: https://github.com/Jia-Research-Lab/SA-AutoAug

OTA: Optimal Transport Assignment for Object Detection

  • Paper: https://arxiv.org/abs/2103.14259
  • Code: https://github.com/Megvii-BaseDetection/OTA

Distilling Object Detectors via Decoupled Features

  • Paper: https://arxiv.org/abs/2103.14475
  • Code: https://github.com/ggjy/DeFeat.pytorch

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

  • Paper: https://arxiv.org/abs/2011.12450
  • Code: https://github.com/PeizeSun/SparseR-CNN

Positive-Unlabeled Data Purification in the Wild for Object Detection

  • Paper: None
  • Code: None

Instance Localization for Self-supervised Detection Pretraining

  • Paper: https://arxiv.org/abs/2102.08318
  • Code: https://github.com/limbo0000/InstanceLoc

MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

  • Paper: https://arxiv.org/abs/2103.04224
  • Code: None

End-to-End Object Detection with Fully Convolutional Network

  • Paper: https://arxiv.org/abs/2012.03544
  • Code: https://github.com/Megvii-BaseDetection/DeFCN

Robust and Accurate Object Detection via Adversarial Learning

  • Paper: https://arxiv.org/abs/2103.13886

  • Code: None

I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors

  • Paper: https://arxiv.org/abs/2103.13757
  • Code: None

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

  • Paper: https://arxiv.org/abs/2103.11402
  • Code: None

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

  • Paper: https://arxiv.org/abs/2103.04507
  • Code: https://github.com/VDIGPKU/OPANAS

YOLOF: You Only Look One-level Feature

  • Paper: https://arxiv.org/abs/2103.09460
  • Code: https://github.com/megvii-model/YOLOF

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.09094
  • Code: https://github.com/dddzg/up-detr

General Instance Distillation for Object Detection

  • Paper: https://arxiv.org/abs/2103.02340
  • Code: None

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

  • Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
  • Paper: https://arxiv.org/abs/2103.01353
  • Code: http://rl.uni-freiburg.de/research/multimodal-distill

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

  • Paper: https://arxiv.org/abs/2011.12885
  • Code: https://github.com/implus/GFocalV2

Multiple Instance Active Learning for Object Detection

  • Paper: https://github.com/yuantn/MIAL/raw/master/paper.pdf
  • Code: https://github.com/yuantn/MIAL

Towards Open World Object Detection

  • Paper(Oral): https://arxiv.org/abs/2103.02603
  • Code: https://github.com/JosephKJ/OWOD

Few-Shot Object Detection

Adaptive Image Transformer for One-Shot Object Detection

  • Paper: None
  • Code: None

Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection

  • Paper: https://arxiv.org/abs/2103.17115
  • Code: https://github.com/hzhupku/DCNet

Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection

  • Paper: https://arxiv.org/abs/2103.01903
  • Code: None

Few-Shot Object Detection via Contrastive Proposal Encoding

  • Paper: https://arxiv.org/abs/2103.05950
  • Code: https://github.com/MegviiDetection/FSCE

Rotated Object Detection

ReDet: A Rotation-equivariant Detector for Aerial Object Detection

  • Paper: https://arxiv.org/abs/2103.07733

  • Code: https://github.com/csuhan/ReDet

Single/Multi-Object Tracking

Single-Object Tracking

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

  • Paper: https://arxiv.org/abs/2104.14545

  • Code: https://github.com/researchmm/LightTrack

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

  • Homepage: https://sites.google.com/view/langtrackbenchmark/

  • Paper: https://arxiv.org/abs/2103.16746

  • Evaluation Toolkit: https://github.com/wangxiao5791509/TNL2K_evaluation_toolkit

  • Demo Video: https://www.youtube.com/watch?v=7lvVDlkkff0&ab_channel=XiaoWang

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

  • Paper: https://arxiv.org/abs/2103.14938
  • Code: https://github.com/VISION-SJTU/IoUattack

Graph Attention Tracking

  • Paper: https://arxiv.org/abs/2011.11204
  • Code: https://github.com/ohhhyeahhh/SiamGAT

Rotation Equivariant Siamese Networks for Tracking

  • Paper: https://arxiv.org/abs/2012.13078
  • Code: None

Track to Detect and Segment: An Online Multi-Object Tracker

  • Homepage: https://jialianwu.com/projects/TraDeS.html
  • Paper: https://arxiv.org/abs/2103.08808
  • Code: https://github.com/JialianW/TraDeS

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

  • Paper(Oral): https://arxiv.org/abs/2103.11681

  • Code: https://github.com/594422814/TransformerTrack

Transformer Tracking

  • Paper: https://arxiv.org/abs/2103.15436
  • Code: https://github.com/chenxin-dlut/TransT

Multi-Object Tracking

Multiple Object Tracking with Correlation Learning

  • Paper: https://arxiv.org/abs/2104.03541
  • Code: None

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

  • Paper: https://arxiv.org/abs/2012.02337
  • Code: None

Learning a Proposal Classifier for Multiple Object Tracking

  • Paper: https://arxiv.org/abs/2103.07889
  • Code: https://github.com/daip13/LPC_MOT.git

Track to Detect and Segment: An Online Multi-Object Tracker

  • Homepage: https://jialianwu.com/projects/TraDeS.html
  • Paper: https://arxiv.org/abs/2103.08808
  • Code: https://github.com/JialianW/TraDeS

Semantic Segmentation

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2012.05258
  • Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
  • Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

Rethinking BiSeNet For Real-time Semantic Segmentation

  • Paper: https://arxiv.org/abs/2104.13188

  • Code: https://github.com/MichaelFan01/STDC-Seg

Progressive Semantic Segmentation

  • Paper: https://arxiv.org/abs/2104.03778
  • Code: https://github.com/VinAIResearch/MagNet

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

  • Paper: https://arxiv.org/abs/2012.15840
  • Code: https://github.com/fudan-zvg/SETR

Bidirectional Projection Network for Cross Dimension Scene Understanding

  • Paper(Oral): https://arxiv.org/abs/2103.14326
  • Code: https://github.com/wbhu/BPNet

Cross-Dataset Collaborative Learning for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.11351
  • Code: None

Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations

  • Paper: https://arxiv.org/abs/2103.06342
  • Code: None

Capturing Omni-Range Context for Omnidirectional Segmentation

  • Paper: https://arxiv.org/abs/2103.05687
  • Code: None

Learning Statistical Texture for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.04133
  • Code: None

PLOP: Learning without Forgetting for Continual Semantic Segmentation

  • Paper: https://arxiv.org/abs/2011.11390
  • Code: None

Weakly Supervised Semantic Segmentation

Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

  • Homepage: https://cvlab.yonsei.ac.kr/projects/BANA/
  • Paper: https://arxiv.org/abs/2104.00905
  • Code: None

Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.14581
  • Code: None

BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation

  • Paper: https://arxiv.org/abs/2103.08907
  • Code: None

Semi-Supervised Semantic Segmentation

Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.04705

Domain Adaptive Semantic Segmentation

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

  • Paper: https://arxiv.org/abs/2105.00097
  • Code: https://github.com/visinf/da-sac

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

  • Paper: https://arxiv.org/abs/2103.15597
  • Code: https://github.com/shachoi/RobustNet

Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization

  • Paper: https://arxiv.org/abs/2103.13041
  • Code: None

MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.05254
  • Code: None

Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation

  • Paper: https://arxiv.org/abs/2103.04717
  • Code: None

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

  • Paper: https://arxiv.org/abs/2101.10979
  • Code: https://github.com/microsoft/ProDA

Video Semantic Segmentation

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

  • Homepage: https://www.vspwdataset.com/
  • Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
  • GitHub: https://github.com/sssdddwww2/vspw_dataset_download

Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

  • Paper: https://arxiv.org/abs/2011.09876
  • Code: https://github.com/aliyun/DCT-Mask

Incremental Few-Shot Instance Segmentation

  • Paper: https://arxiv.org/abs/2105.05312
  • Code: https://github.com/danganea/iMTFA

A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation

  • Paper: https://arxiv.org/abs/2105.03186
  • Code: None

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

  • Paper: https://arxiv.org/abs/2104.08569
  • Code: https://github.com/zhanggang001/RefineMask/

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

  • Paper: https://arxiv.org/abs/2104.05239
  • Code: https://github.com/tinyalpha/BPR

Multi-Scale Aligned Distillation for Low-Resolution Detection

  • Paper: https://jiaya.me/papers/ms_align_distill_cvpr21.pdf
  • Code: https://github.com/Jia-Research-Lab/MSAD

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

  • Homepage: https://bowenc0221.github.io/boundary-iou/
  • Paper: https://arxiv.org/abs/2103.16562
  • Code: https://github.com/bowenc0221/boundary-iou-api

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

  • Paper: https://arxiv.org/abs/2103.12340
  • Code: https://github.com/lkeab/BCNet

Zero-shot instance segmentation (Not Sure)

  • Paper: None
  • Code: https://github.com/CVPR2021-pape-id-1395/CVPR2021-paper-id-1395

Video Instance Segmentation

STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

  • Paper: http://www4.comp.polyu.edu.hk/~cslzhang/papers.htm
  • Code: https://github.com/MinghanLi/STMask

End-to-End Video Instance Segmentation with Transformers

  • Paper(Oral): https://arxiv.org/abs/2011.14503
  • Code: https://github.com/Epiphqny/VisTR

Panoptic Segmentation

Exemplar-Based Open-Set Panoptic Segmentation Network

  • Homepage: https://cv.snu.ac.kr/research/EOPSN/
  • Paper: https://arxiv.org/abs/2105.08336
  • Code: https://github.com/jd730/EOPSN

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

  • Paper: MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
  • Code: None

Panoptic Segmentation Forecasting

  • Paper: https://arxiv.org/abs/2104.03962
  • Code: https://github.com/nianticlabs/panoptic-forecasting

Fully Convolutional Networks for Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2012.00720
  • Code: https://github.com/yanwei-li/PanopticFCN

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2103.02584
  • Code: None

Medical Image Segmentation

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

  • Paper: https://arxiv.org/abs/2103.06030
  • Code: https://github.com/liuquande/FedDG-ELCFS

3D Medical Image Segmentation

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

  • Paper(Oral): https://arxiv.org/abs/2103.15954
  • Code: None

Video Object Segmentation

Learning Position and Target Consistency for Memory-based Video Object Segmentation

  • Paper: https://arxiv.org/abs/2104.04329
  • Code: None

SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

  • Paper(Oral): https://arxiv.org/abs/2101.08833
  • Code: https://github.com/dukebw/SSTVOS

Interactive Video Object Segmentation

Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion

  • Homepage: https://hkchengrex.github.io/MiVOS/
  • Paper: https://arxiv.org/abs/2103.07941
  • Code: https://github.com/hkchengrex/MiVOS
  • Demo: https://hkchengrex.github.io/MiVOS/video.html#partb

Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild

  • Paper: https://arxiv.org/abs/2103.10391
  • Code: https://github.com/svip-lab/IVOS-W

Saliency Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

  • Paper: https://arxiv.org/abs/2104.02628
  • Code: https://github.com/JingZhang617/Joint_COD_SOD

Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion

  • Paper(Oral): https://arxiv.org/abs/2103.11832
  • Code: https://github.com/sunpeng1996/DSA2F

Camouflaged Object Detection

Uncertainty-aware Joint Salient Object and Camouflaged Object Detection

  • Paper: https://arxiv.org/abs/2104.02628
  • Code: https://github.com/JingZhang617/Joint_COD_SOD

Co-Salient Object Detection

Group Collaborative Learning for Co-Salient Object Detection

  • Paper: https://arxiv.org/abs/2104.01108
  • Code: https://github.com/fanq15/GCoNet

Image Matting

Semantic Image Matting

  • Paper: https://arxiv.org/abs/2104.08201
  • Code: https://github.com/nowsyn/SIM
  • Dataset: https://github.com/nowsyn/SIM

Person Re-identification

Unsupervised Multi-Source Domain Adaptation for Person Re-Identification

  • Paper: https://arxiv.org/abs/2104.12961
  • Code: None

Combined Depth Space based Architecture Search For Person Re-identification

  • Paper: https://arxiv.org/abs/2104.04163
  • Code: None

Person Search

Anchor-Free Person Search

  • Paper: https://arxiv.org/abs/2103.11617
  • Code: https://github.com/daodaofr/AlignPS
  • Interpretation: The first anchor-free person search framework | CVPR 2021

Video Understanding / Action Recognition

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

  • Paper: https://arxiv.org/abs/2101.06184
  • Code: https://github.com/tobyperrett/trx

FrameExit: Conditional Early Exiting for Efficient Video Recognition

  • Paper(Oral): https://arxiv.org/abs/2104.13400
  • Code: None

No frame left behind: Full Video Action Recognition

  • Paper: https://arxiv.org/abs/2103.15395
  • Code: None

Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

  • Paper: https://arxiv.org/abs/2103.13137
  • Code: None

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

  • Paper: https://arxiv.org/abs/2103.13141
  • Code: None
  • Interpretation: CVPR 2021 | TCANet: the strongest temporal action proposal refinement network

ACTION-Net: Multipath Excitation for Action Recognition

  • Paper: https://arxiv.org/abs/2103.07372
  • Code: https://github.com/V-Sense/ACTION-Net

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

  • Homepage: https://fingerrec.github.io/index_files/jinpeng/papers/CVPR2021/project_website.html
  • Paper: https://arxiv.org/abs/2009.05769
  • Code: https://github.com/FingerRec/BE

TDN: Temporal Difference Networks for Efficient Action Recognition

  • Paper: https://arxiv.org/abs/2012.10071
  • Code: https://github.com/MCG-NJU/TDN

Face Recognition

A 3D GAN for Improved Large-pose Facial Recognition

  • Paper: https://arxiv.org/abs/2012.10545
  • Code: None

MagFace: A Universal Representation for Face Recognition and Quality Assessment

  • Paper(Oral): https://arxiv.org/abs/2103.06627
  • Code: https://github.com/IrvingMeng/MagFace

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

  • Homepage: https://www.face-benchmark.org/
  • Paper: https://arxiv.org/abs/2103.04098
  • Dataset: https://www.face-benchmark.org/

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

  • Paper(Oral): https://arxiv.org/abs/2103.01520
  • Code: https://github.com/Hzzone/MTLFace
  • Dataset: https://github.com/Hzzone/MTLFace

Face Detection

HLA-Face: Joint High-Low Adaptation for Low Light Face Detection

  • Homepage: https://daooshee.github.io/HLA-Face-Website/
  • Paper: https://arxiv.org/abs/2104.01984
  • Code: https://github.com/daooshee/HLA-Face-Code

CRFace: Confidence Ranker for Model-Agnostic Face Detection Refinement

  • Paper: https://arxiv.org/abs/2103.07017
  • Code: None

Face Anti-Spoofing

Cross Modal Focal Loss for RGBD Face Anti-Spoofing

  • Paper: https://arxiv.org/abs/2103.00948
  • Code: None

Deepfake Detection

Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

  • Paper: https://arxiv.org/abs/2103.01856
  • Code: None

Multi-attentional Deepfake Detection

  • Paper: https://arxiv.org/abs/2103.02406
  • Code: None

Face Age Estimation

Continuous Face Aging via Self-estimated Residual Age Embedding

  • Paper: https://arxiv.org/abs/2105.00020
  • Code: None

PML: Progressive Margin Loss for Long-tailed Age Classification

  • Paper: https://arxiv.org/abs/2103.02140
  • Code: None

Facial Expression Recognition

Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition

  • Paper: https://arxiv.org/abs/2103.13372
  • Code: None

Deepfakes

MagDR: Mask-guided Detection and Reconstruction for Defending Deepfakes

  • Paper: https://arxiv.org/abs/2103.14211
  • Code: None

Human Parsing

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

  • Paper: https://arxiv.org/abs/2103.04570
  • Code: https://github.com/tfzhou/MG-HumanParsing

2D/3D Human Pose Estimation

2D Human Pose Estimation

When Human Pose Estimation Meets Robustness: Adversarial Algorithms and Benchmarks

  • Paper: https://arxiv.org/abs/2105.06152
  • Code: None

Pose Recognition with Cascade Transformers

  • Paper: https://arxiv.org/abs/2104.06976
  • Code: https://github.com/mlpc-ucsd/PRTR

DCPose: Deep Dual Consecutive Network for Human Pose Estimation

  • Paper: https://arxiv.org/abs/2103.07254
  • Code: https://github.com/Pose-Group/DCPose

3D Human Pose Estimation

PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

  • Paper(Oral): https://arxiv.org/abs/2105.02465
  • Code: https://github.com/jfzhang95/PoseAug

Camera-Space Hand Mesh Recovery via Semantic Aggregation and Adaptive 2D-1D Registration

  • Paper: https://arxiv.org/abs/2103.02845
  • Code: https://github.com/SeanChenxy/HandMesh

Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

  • Paper: https://arxiv.org/abs/2104.01797
  • Code: https://github.com/3dpose/3D-Multi-Person-Pose

HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation

  • Homepage: https://jeffli.site/HybrIK/
  • Paper: https://arxiv.org/abs/2011.14672
  • Code: https://github.com/Jeff-sjtu/HybrIK

Animal Pose Estimation

From Synthetic to Real: Unsupervised Domain Adaptation for Animal Pose Estimation

  • Paper: https://arxiv.org/abs/2103.14843
  • Code: None

Human Volumetric Capture

POSEFusion: Pose-guided Selective Fusion for Single-view Human Volumetric Capture

  • Homepage: http://www.liuyebin.com/posefusion/posefusion.html
  • Paper(Oral): https://arxiv.org/abs/2103.15331
  • Code: None

Scene Text Detection

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

  • Paper: https://arxiv.org/abs/2104.10442
  • Code: None

Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

  • Paper: https://arxiv.org/abs/2103.06495
  • Code: https://github.com/FangShancheng/ABINet

Image Compression

Checkerboard Context Model for Efficient Learned Image Compression

  • Paper: https://arxiv.org/abs/2103.15306
  • Code: None

Slimmable Compressive Autoencoders for Practical Neural Image Compression

  • Paper: https://arxiv.org/abs/2103.15726
  • Code: None

Attention-guided Image Compression by Deep Reconstruction of Compressive Sensed Saliency Skeleton

  • Paper: https://arxiv.org/abs/2103.15368
  • Code: None

Model Compression / Pruning / Quantization

Teachers Do More Than Teach: Compressing Image-to-Image Models

  • Paper: https://arxiv.org/abs/2103.03467
  • Code: https://github.com/snap-research/CAT

Model Pruning

Dynamic Slimmable Network

  • Paper: https://arxiv.org/abs/2103.13258
  • Code: https://github.com/changlin31/DS-Net

Model Quantization

Network Quantization with Element-wise Gradient Scaling

  • Paper: https://arxiv.org/abs/2104.00903
  • Code: None

Zero-shot Adversarial Quantization

  • Paper(Oral): https://arxiv.org/abs/2103.15263
  • Code: https://git.io/Jqc0y

Learnable Companding Quantization for Accurate Low-bit Neural Networks

  • Paper: https://arxiv.org/abs/2103.07156
  • Code: None

Knowledge Distillation

Distilling Knowledge via Knowledge Review

  • Paper: https://arxiv.org/abs/2104.09044
  • Code: https://github.com/Jia-Research-Lab/ReviewKD

Distilling Object Detectors via Decoupled Features

  • Paper: https://arxiv.org/abs/2103.14475
  • Code: https://github.com/ggjy/DeFeat.pytorch

Super-Resolution

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

  • Homepage: http://mepro.bjtu.edu.cn/resource.html
  • Paper: https://arxiv.org/abs/2104.06174
  • Code: None

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

  • Paper: https://arxiv.org/abs/2103.04039
  • Code: https://github.com/Xiangtaokong/ClassSR

AdderSR: Towards Energy Efficient Image Super-Resolution

  • Paper: https://arxiv.org/abs/2009.08891
  • Code: None

Dehazing

Contrastive Learning for Compact Single Image Dehazing

  • Paper: https://arxiv.org/abs/2104.09367
  • Code: https://github.com/GlassyWu/AECR-Net

Video Super-Resolution

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

  • Paper: None
  • Code: https://github.com/CS-GangXu/TMNet

Image Restoration

Multi-Stage Progressive Image Restoration

  • Paper: https://arxiv.org/abs/2102.02808
  • Code: https://github.com/swz30/MPRNet

Image Inpainting

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

  • Paper: https://arxiv.org/abs/2105.02201
  • Code: https://github.com/KumapowerLIU/PD-GAN

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

  • Homepage: https://yzhouas.github.io/projects/TransFill/index.html
  • Paper: https://arxiv.org/abs/2103.15982
  • Code: None

Image Editing

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

  • Paper: https://arxiv.org/abs/2104.14754
  • Code: https://github.com/naver-ai/StyleMapGAN
  • Demo Video: https://youtu.be/qCapNyRA_Ng

High-Fidelity and Arbitrary Face Editing

  • Paper: https://arxiv.org/abs/2103.15814
  • Code: None

Anycost GANs for Interactive Image Synthesis and Editing

  • Paper: https://arxiv.org/abs/2103.03243
  • Code: https://github.com/mit-han-lab/anycost-gan

PISE: Person Image Synthesis and Editing with Decoupled GAN

  • Paper: https://arxiv.org/abs/2103.04023
  • Code: https://github.com/Zhangjinso/PISE

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

  • Paper: http://raywzy.com/
  • Code: http://raywzy.com/

Image Captioning

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

  • Paper: https://arxiv.org/abs/2105.03236
  • Code: None

Font Generation

DG-Font: Deformable Generative Networks for Unsupervised Font Generation

  • Paper: https://arxiv.org/abs/2104.03064
  • Code: https://github.com/ecnuycxie/DG-Font

Image Matching

LoFTR: Detector-Free Local Feature Matching with Transformers

  • Homepage: https://zju3dv.github.io/loftr/
  • Paper: https://arxiv.org/abs/2104.00680
  • Code: https://github.com/zju3dv/LoFTR

Convolutional Hough Matching Networks

  • Homepage: http://cvlab.postech.ac.kr/research/CHM/
  • Paper(Oral): https://arxiv.org/abs/2103.16831
  • Code: None

Image Blending

Bridging the Visual Gap: Wide-Range Image Blending

  • Paper: https://arxiv.org/abs/2103.15149
  • Code: https://github.com/julia0607/Wide-Range-Image-Blending

Reflection Removal

Robust Reflection Removal with Reflection-free Flash-only Cues

  • Paper: https://arxiv.org/abs/2103.04273
  • Code: https://github.com/ChenyangLEI/flash-reflection-removal

3D Point Cloud Classification

Equivariant Point Network for 3D Point Cloud Analysis

  • Paper: https://arxiv.org/abs/2103.14147
  • Code: None

PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds

  • Paper: https://arxiv.org/abs/2103.14635
  • Code: https://github.com/CVMI-Lab/PAConv

3D Object Detection

Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

  • Paper: https://arxiv.org/abs/2104.06114
  • Code: https://github.com/cheng052/BRNet

HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection

  • Homepage: https://cvlab.yonsei.ac.kr/projects/HVPR/
  • Paper: https://arxiv.org/abs/2104.00902
  • Code: https://github.com/cvlab-yonsei/HVPR

LiDAR R-CNN: An Efficient and Universal 3D Object Detector

  • Paper: https://arxiv.org/abs/2103.15297
  • Code: https://github.com/tusimple/LiDAR_RCNN

M3DSSD: Monocular 3D Single Stage Object Detector

  • Paper: https://arxiv.org/abs/2103.13164
  • Code: https://github.com/mumianyuxin/M3DSSD

SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud

  • Paper: None
  • Code: https://github.com/Vegeta2020/SE-SSD

Center-based 3D Object Detection and Tracking

  • Paper: https://arxiv.org/abs/2006.11275
  • Code: https://github.com/tianweiy/CenterPoint

Categorical Depth Distribution Network for Monocular 3D Object Detection

  • Paper: https://arxiv.org/abs/2103.01100
  • Code: None

3D Semantic Segmentation

Bidirectional Projection Network for Cross Dimension Scene Understanding

  • Paper(Oral): https://arxiv.org/abs/2103.14326
  • Code: https://github.com/wbhu/BPNet

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion

  • Paper: https://arxiv.org/abs/2103.07074
  • Code: https://github.com/ShiQiu0419/BAAF-Net

Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation

  • Paper: https://arxiv.org/abs/2011.10033
  • Code: https://github.com/xinge008/Cylinder3D

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

  • Homepage: https://github.com/QingyongHu/SensatUrban
  • Paper: http://arxiv.org/abs/2009.03137
  • Code: https://github.com/QingyongHu/SensatUrban
  • Dataset: https://github.com/QingyongHu/SensatUrban

3D Panoptic Segmentation

Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2103.14962
  • Code: https://github.com/edwardzhou130/Panoptic-PolarNet

3D Object Tracking

Center-based 3D Object Detection and Tracking

  • Paper: https://arxiv.org/abs/2006.11275
  • Code: https://github.com/tianweiy/CenterPoint

3D Point Cloud Registration

ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning

  • Paper: https://arxiv.org/abs/2103.15231
  • Code: None

PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency

  • Paper: https://arxiv.org/abs/2103.05465
  • Code: https://github.com/XuyangBai/PointDSC

PREDATOR: Registration of 3D Point Clouds with Low Overlap

  • Paper: https://arxiv.org/abs/2011.13005
  • Code: https://github.com/ShengyuH/OverlapPredator

3D Point Cloud Completion

Unsupervised 3D Shape Completion through GAN Inversion

  • Homepage: https://junzhezhang.github.io/projects/ShapeInversion/
  • Paper: https://arxiv.org/abs/2104.13366
  • Code: https://github.com/junzhezhang/shape-inversion

Variational Relational Point Completion Network

  • Homepage: https://paul007pl.github.io/projects/VRCNet
  • Paper: https://arxiv.org/abs/2104.10154
  • Code: https://github.com/paul007pl/VRCNet

Style-based Point Generator with Adversarial Rendering for Point Cloud Completion

  • Homepage: https://alphapav.github.io/SpareNet/
  • Paper: https://arxiv.org/abs/2103.02535
  • Code: https://github.com/microsoft/SpareNet

3D Reconstruction

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

  • Paper: https://arxiv.org/abs/2104.00858
  • Code: None

NeuralRecon: Real-Time Coherent 3D Reconstruction from Monocular Video

  • Homepage: https://zju3dv.github.io/neuralrecon/
  • Paper(Oral): https://arxiv.org/abs/2104.00681
  • Code: https://github.com/zju3dv/NeuralRecon

6D Pose Estimation

FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism

  • Paper(Oral): https://arxiv.org/abs/2103.07054
  • Code: https://github.com/DC1991/FS-Net

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

  • Paper: http://arxiv.org/abs/2102.12145
  • Code: https://git.io/GDR-Net

FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation

  • Paper: https://arxiv.org/abs/2103.02242
  • Code: https://github.com/ethnhe/FFB6D

Camera Pose Estimation

Back to the Feature: Learning Robust Camera Localization from Pixels to Pose

  • Paper: https://arxiv.org/abs/2103.09213
  • Code: https://github.com/cvg/pixloc

Depth Estimation

S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

  • Paper(Oral): https://arxiv.org/abs/2104.00877
  • Code: None

Beyond Image to Depth: Improving Depth Prediction using Echoes

  • Homepage: https://krantiparida.github.io/projects/bimgdepth.html
  • Paper: https://arxiv.org/abs/2103.08468
  • Code: https://github.com/krantiparida/beyond-image-to-depth

S3: Learnable Sparse Signal Superdensity for Guided Depth Estimation

  • Paper: https://arxiv.org/abs/2103.02396
  • Code: None

Depth from Camera Motion and Object Detection

  • Paper: https://arxiv.org/abs/2103.01468
  • Code: https://github.com/griffbr/ODMD
  • Dataset: https://github.com/griffbr/ODMD

Stereo Matching

A Decomposition Model for Stereo Matching

  • Paper: https://arxiv.org/abs/2104.07516
  • Code: None

Flow Estimation

Self-Supervised Multi-Frame Monocular Scene Flow

  • Paper: https://arxiv.org/abs/2105.02216
  • Code: https://github.com/visinf/multi-mono-sf

RAFT-3D: Scene Flow using Rigid-Motion Embeddings

  • Paper: https://arxiv.org/abs/2012.00726v1
  • Code: None

Learning Optical Flow From Still Images

  • Homepage: https://mattpoggi.github.io/projects/cvpr2021aleotti/
  • Paper: https://mattpoggi.github.io/assets/papers/aleotti2021cvpr.pdf
  • Code: https://github.com/mattpoggi/depthstillation

FESTA: Flow Estimation via Spatial-Temporal Attention for Scene Point Clouds

  • Paper: https://arxiv.org/abs/2104.00798
  • Code: None

Lane Detection

Keep your Eyes on the Lane: Real-time Attention-guided Lane Detection

  • Paper: https://arxiv.org/abs/2010.12035
  • Code: https://github.com/lucastabelini/LaneATT

Trajectory Prediction

Divide-and-Conquer for Lane-Aware Diverse Trajectory Prediction

  • Paper(Oral): https://arxiv.org/abs/2104.08277
  • Code: None

Crowd Counting

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

  • Paper: https://arxiv.org/abs/2105.02440
  • Code: https://github.com/VisDrone/DroneCrowd
  • Dataset: https://github.com/VisDrone/DroneCrowd

Adversarial Examples

Enhancing the Transferability of Adversarial Attacks through Variance Tuning

  • Paper: https://arxiv.org/abs/2103.15571
  • Code: https://github.com/JHL-HUST/VT

LiBRe: A Practical Bayesian Approach to Adversarial Detection

  • Paper: https://arxiv.org/abs/2103.14835
  • Code: None

Natural Adversarial Examples

  • Paper: https://arxiv.org/abs/1907.07174
  • Code: https://github.com/hendrycks/natural-adv-examples

Image Retrieval

StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval

  • Paper: https://arxiv.org/abs/2103.15706
  • Code: None

QAIR: Practical Query-efficient Black-Box Attacks for Image Retrieval

  • Paper: https://arxiv.org/abs/2103.02927
  • Code: None

Video Retrieval

On Semantic Similarity in Video Retrieval

  • Paper: https://arxiv.org/abs/2103.10095
  • Homepage: https://mwray.github.io/SSVR/
  • Code: https://github.com/mwray/Semantic-Video-Retrieval

Cross-modal Retrieval

Cross-Modal Center Loss for 3D Cross-Modal Retrieval

  • Paper: https://arxiv.org/abs/2008.03561
  • Code: https://github.com/LongLong-Jing/Cross-Modal-Center-Loss

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

  • Paper: https://arxiv.org/abs/2103.16553
  • Code: None

Revamping cross-modal recipe retrieval with hierarchical Transformers and self-supervised learning

  • Paper: https://www.amazon.science/publications/revamping-cross-modal-recipe-retrieval-with-hierarchical-transformers-and-self-supervised-learning
  • Code: https://github.com/amzn/image-to-recipe-transformers

Zero-Shot Learning

Counterfactual Zero-Shot and Open-Set Visual Recognition

  • Paper: https://arxiv.org/abs/2103.00887
  • Code: https://github.com/yue-zhongqi/gcm-cf

Federated Learning

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

  • Paper: https://arxiv.org/abs/2103.06030
  • Code: https://github.com/liuquande/FedDG-ELCFS

Video Frame Interpolation

CDFI: Compression-Driven Network Design for Frame Interpolation

  • Paper: None
  • Code: https://github.com/tding1/CDFI

FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

  • Homepage: https://tarun005.github.io/FLAVR/
  • Paper: https://arxiv.org/abs/2012.08512
  • Code: https://github.com/tarun005/FLAVR

Visual Reasoning

Transformation Driven Visual Reasoning

  • Homepage: https://hongxin2019.github.io/TVR/
  • Paper: https://arxiv.org/abs/2011.13160
  • Code: https://github.com/hughplay/TVR

Image Synthesis

Taming Transformers for High-Resolution Image Synthesis

  • Homepage: https://compvis.github.io/taming-transformers/
  • Paper(Oral): https://arxiv.org/abs/2012.09841
  • Code: https://github.com/CompVis/taming-transformers

View Synthesis

Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes

  • Homepage: https://virtualhumans.mpi-inf.mpg.de/srf/
  • Paper: https://arxiv.org/abs/2104.06935

Self-Supervised Visibility Learning for Novel View Synthesis

  • Paper: https://arxiv.org/abs/2103.15407
  • Code: None

NeX: Real-time View Synthesis with Neural Basis Expansion

  • Homepage: https://nex-mpi.github.io/
  • Paper(Oral): https://arxiv.org/abs/2103.05606

Style Transfer

Drafting and Revision: Laplacian Pyramid Network for Fast High-Quality Artistic Style Transfer

  • Paper: https://arxiv.org/abs/2104.05376
  • Code: https://github.com/PaddlePaddle/PaddleGAN/

Layout Generation

LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity

  • Paper: None
  • Code: None

Variational Transformer Networks for Layout Generation

  • Paper: https://arxiv.org/abs/2104.02416
  • Code: None

Domain Generalization

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

  • Paper: https://arxiv.org/abs/2103.15597
  • Code: https://github.com/shachoi/RobustNet

Adaptive Methods for Real-World Domain Generalization

  • Paper: https://arxiv.org/abs/2103.15796
  • Code: None

FSDR: Frequency Space Domain Randomization for Domain Generalization

  • Paper: https://arxiv.org/abs/2103.02370
  • Code: None

Domain Adaptation

Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation

  • Paper: https://arxiv.org/abs/2104.00808
  • Code: None

Domain Consensus Clustering for Universal Domain Adaptation

  • Paper: http://reler.net/papers/guangrui_cvpr2021.pdf
  • Code: https://github.com/Solacex/Domain-Consensus-Clustering

Open-Set

Towards Open World Object Detection

  • Paper(Oral): https://arxiv.org/abs/2103.02603
  • Code: https://github.com/JosephKJ/OWOD

Exemplar-Based Open-Set Panoptic Segmentation Network

  • Homepage: https://cv.snu.ac.kr/research/EOPSN/
  • Paper: https://arxiv.org/abs/2105.08336
  • Code: https://github.com/jd730/EOPSN

Learning Placeholders for Open-Set Recognition

  • Paper(Oral): https://arxiv.org/abs/2103.15086
  • Code: None

Adversarial Attack

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

  • Paper: https://arxiv.org/abs/2103.14938
  • Code: https://github.com/VISION-SJTU/IoUattack

Human-Object Interaction (HOI) Detection

HOTR: End-to-End Human-Object Interaction Detection with Transformers

  • Paper: https://arxiv.org/abs/2104.13682
  • Code: None

Query-Based Pairwise Human-Object Interaction Detection with Image-Wide Contextual Information

  • Paper: https://arxiv.org/abs/2103.05399
  • Code: https://github.com/hitachi-rd-cv/qpic

Reformulating HOI Detection as Adaptive Set Prediction

  • Paper: https://arxiv.org/abs/2103.05983
  • Code: https://github.com/yoyomimi/AS-Net

Detecting Human-Object Interaction via Fabricated Compositional Learning

  • Paper: https://arxiv.org/abs/2103.08214
  • Code: https://github.com/zhihou7/FCL

End-to-End Human Object Interaction Detection with HOI Transformer

  • Paper: https://arxiv.org/abs/2103.04503
  • Code: https://github.com/bbepoch/HoiTransformer

Shadow Removal

Auto-Exposure Fusion for Single-Image Shadow Removal

  • Paper: https://arxiv.org/abs/2103.01255
  • Code: https://github.com/tsingqguo/exposure-fusion-shadow-removal

Virtual Try-On

Parser-Free Virtual Try-on via Distilling Appearance Flows

  • Paper: https://arxiv.org/abs/2103.04559
  • Code: https://github.com/geyuying/PF-AFN

Datasets

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

  • Paper: https://arxiv.org/abs/2105.09188
  • Code: https://github.com/csjliang/LPTN
  • Dataset: https://github.com/csjliang/LPTN

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

  • Paper: https://arxiv.org/abs/2105.02440
  • Code: https://github.com/VisDrone/DroneCrowd
  • Dataset: https://github.com/VisDrone/DroneCrowd

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

  • Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
  • Paper(Oral): https://arxiv.org/abs/2104.12690
  • Code: https://github.com/fidler-lab/efficient-annotation-cookbook

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

  • Paper: https://arxiv.org/abs/2012.05258
  • Code: https://github.com/joe-siyuan-qiao/ViP-DeepLab
  • Dataset: https://github.com/joe-siyuan-qiao/ViP-DeepLab

Learning To Count Everything

  • Paper: https://arxiv.org/abs/2104.08391
  • Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
  • Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything

Semantic Image Matting

  • Paper: https://arxiv.org/abs/2104.08201
  • Code: https://github.com/nowsyn/SIM
  • Dataset: https://github.com/nowsyn/SIM

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

  • Homepage: http://mepro.bjtu.edu.cn/resource.html
  • Paper: https://arxiv.org/abs/2104.06174
  • Code: None

Visual Semantic Role Labeling for Video Understanding

  • Homepage: https://vidsitu.org/
  • Paper: https://arxiv.org/abs/2104.00990
  • Code: https://github.com/TheShadow29/VidSitu
  • Dataset: https://github.com/TheShadow29/VidSitu

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

  • Homepage: https://www.vspwdataset.com/
  • Paper: https://www.vspwdataset.com/CVPR2021__miao.pdf
  • GitHub: https://github.com/sssdddwww2/vspw_dataset_download

Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

  • Homepage: https://vap.aau.dk/sewer-ml/
  • Paper: https://arxiv.org/abs/2103.10895

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

  • Paper: https://arxiv.org/abs/2103.03375
  • Dataset: None

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

  • Homepage: https://github.com/QingyongHu/SensatUrban
  • Paper: http://arxiv.org/abs/2009.03137
  • Code: https://github.com/QingyongHu/SensatUrban
  • Dataset: https://github.com/QingyongHu/SensatUrban

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

  • Paper(Oral): https://arxiv.org/abs/2103.01520
  • Code: https://github.com/Hzzone/MTLFace
  • Dataset: https://github.com/Hzzone/MTLFace

Depth from Camera Motion and Object Detection

  • Paper: https://arxiv.org/abs/2103.01468
  • Code: https://github.com/griffbr/ODMD
  • Dataset: https://github.com/griffbr/ODMD

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

  • Homepage: http://rl.uni-freiburg.de/research/multimodal-distill
  • Paper: https://arxiv.org/abs/2103.01353
  • Code: http://rl.uni-freiburg.de/research/multimodal-distill
  • Dataset: http://rl.uni-freiburg.de/research/multimodal-distill

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

  • Paper: https://arxiv.org/abs/2012.02206
  • Code: https://github.com/daveredrum/Scan2Cap
  • Dataset: https://github.com/daveredrum/ScanRefer

Others

Omnimatte: Associating Objects and Their Effects in Video

  • Homepage: https://omnimatte.github.io/
  • Paper(Oral): https://arxiv.org/abs/2105.06993
  • Code: https://omnimatte.github.io/#code

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

  • Homepage: https://fidler-lab.github.io/efficient-annotation-cookbook/
  • Paper(Oral): https://arxiv.org/abs/2104.12690
  • Code: https://github.com/fidler-lab/efficient-annotation-cookbook

Motion Representations for Articulated Animation

  • Paper: https://arxiv.org/abs/2104.11280
  • Code: https://github.com/snap-research/articulated-animation

Deep Lucas-Kanade Homography for Multimodal Image Alignment

  • Paper: https://arxiv.org/abs/2104.11693
  • Code: https://github.com/placeforyiming/CVPR21-Deep-Lucas-Kanade-Homography

Skip-Convolutions for Efficient Video Processing

  • Paper: https://arxiv.org/abs/2104.11487
  • Code: None

KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control

  • Homepage: http://tomasjakab.github.io/KeypointDeformer
  • Paper(Oral): https://arxiv.org/abs/2104.11224
  • Code: https://github.com/tomasjakab/keypoint_deformer/

Learning To Count Everything

  • Paper: https://arxiv.org/abs/2104.08391
  • Code: https://github.com/cvlab-stonybrook/LearningToCountEverything
  • Dataset: https://github.com/cvlab-stonybrook/LearningToCountEverything

SOLD2: Self-supervised Occlusion-aware Line Description and Detection

  • Paper(Oral): https://arxiv.org/abs/2104.03362
  • Code: https://github.com/cvg/SOLD2

Learning Probabilistic Ordinal Embeddings for Uncertainty-Aware Regression

  • Homepage: https://li-wanhua.github.io/POEs/
  • Paper: https://arxiv.org/abs/2103.13629
  • Code: https://github.com/Li-Wanhua/POEs

LEAP: Learning Articulated Occupancy of People

  • Paper: https://arxiv.org/abs/2104.06849
  • Code: None

Visual Semantic Role Labeling for Video Understanding

  • Homepage: https://vidsitu.org/
  • Paper: https://arxiv.org/abs/2104.00990
  • Code: https://github.com/TheShadow29/VidSitu
  • Dataset: https://github.com/TheShadow29/VidSitu

UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

  • Paper: https://arxiv.org/abs/2104.00946
  • Code: https://github.com/SUTDCV/UAV-Human

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

  • Paper(Oral): https://arxiv.org/abs/2104.00924
  • Code: None

Fully Understanding Generic Objects: Modeling, Segmentation, and Reconstruction

  • Paper: https://arxiv.org/abs/2104.00858
  • Code: None

Towards High Fidelity Face Relighting with Realistic Shadows

  • Paper: https://arxiv.org/abs/2104.00825
  • Code: None

BRepNet: A topological message passing system for solid models

  • Paper(Oral): https://arxiv.org/abs/2104.00706
  • Code: None

Visually Informed Binaural Audio Generation without Binaural Audios

  • Homepage: https://sheldontsui.github.io/projects/PseudoBinaural
  • Paper: None
  • GitHub: https://github.com/SheldonTsui/PseudoBinaural_CVPR2021
  • Demo: https://www.youtube.com/watch?v=r-uC2MyAWQc

Exploring intermediate representation for monocular vehicle pose estimation

  • Paper: None
  • Code: https://github.com/Nicholasli1995/EgoNet

Tuning IR-cut Filter for Illumination-aware Spectral Reconstruction from RGB

  • Paper(Oral): https://arxiv.org/abs/2103.14708
  • Code: None

Invertible Image Signal Processing

  • Paper: https://arxiv.org/abs/2103.15061
  • Code: https://github.com/yzxing87/Invertible-ISP

Video Rescaling Networks with Joint Optimization Strategies for Downscaling and Upscaling

  • Paper: https://arxiv.org/abs/2103.14858
  • Code: None

SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences

  • Paper: https://arxiv.org/abs/2103.14898
  • Code: None

Embedding Transfer with Label Relaxation for Improved Metric Learning

  • Paper: https://arxiv.org/abs/2103.14908
  • Code: None

Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

  • Paper: https://arxiv.org/abs/2103.15076
  • Code: https://github.com/hlei-ziyan/Picasso

Meta-Mining Discriminative Samples for Kinship Verification

  • Paper: https://arxiv.org/abs/2103.15108
  • Code: None

Cloud2Curve: Generation and Vectorization of Parametric Sketches

  • Paper: https://arxiv.org/abs/2103.15536
  • Code: None

TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

  • Paper: https://arxiv.org/abs/2103.15538
  • Code: https://github.com/SUTDCV/SUTD-TrafficQA

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

  • Homepage: http://wellyzhang.github.io/project/prae.html
  • Paper: https://arxiv.org/abs/2103.14230
  • Code: None

ACRE: Abstract Causal REasoning Beyond Covariation

  • Homepage: http://wellyzhang.github.io/project/acre.html
  • Paper: https://arxiv.org/abs/2103.14232
  • Code: None

Confluent Vessel Trees with Accurate Bifurcations

  • Paper: https://arxiv.org/abs/2103.14268
  • Code: None

Few-Shot Human Motion Transfer by Personalized Geometry and Texture Modeling

  • Paper: https://arxiv.org/abs/2103.14338
  • Code: https://github.com/HuangZhiChao95/FewShotMotionTransfer

Neural Parts: Learning Expressive 3D Shape Abstractions with Invertible Neural Networks

  • Homepage: https://paschalidoud.github.io/neural_parts
  • Paper: None
  • Code: https://github.com/paschalidoud/neural_parts

Knowledge Evolution in Neural Networks

  • Paper(Oral): https://arxiv.org/abs/2103.05152
  • Code: https://github.com/ahmdtaha/knowledge_evolution

Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning

  • Paper: https://arxiv.org/abs/2103.02148
  • Code: https://github.com/guopengf/FLMRCM

SGP: Self-supervised Geometric Perception

  • Paper(Oral): https://arxiv.org/abs/2103.03114
  • Code: https://github.com/theNded/SGP

Diffusion Probabilistic Models for 3D Point Cloud Generation

  • Paper: https://arxiv.org/abs/2103.01458
  • Code: https://github.com/luost26/diffusion-point-cloud

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans

  • Paper: https://arxiv.org/abs/2012.02206
  • Code: https://github.com/daveredrum/Scan2Cap
  • Dataset: https://github.com/daveredrum/ScanRefer

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

  • Paper: https://arxiv.org/abs/2103.01353
  • Code: http://rl.uni-freiburg.de/research/multimodal-distill
  • Dataset: http://rl.uni-freiburg.de/research/multimodal-distill

To Be Added (TODO)

  • Big news! 20 papers from Tencent YouTu accepted to CVPR 2021
  • Three papers from the MePro team accepted to CVPR 2021

Acceptance Not Confirmed (Not Sure)

CT Film Recovery via Disentangling Geometric Deformation and Photometric Degradation: Simulated Datasets and Deep Models

  • Paper: none
  • Code: https://github.com/transcendentsky/Film-Recovery

Toward Explainable Reflection Removal with Distilling and Model Uncertainty

  • Paper: none
  • Code: https://github.com/ytpeng-aimlab/CVPR-2021-Toward-Explainable-Reflection-Removal-with-Distilling-and-Model-Uncertainty

DeepOIS: Gyroscope-Guided Deep Optical Image Stabilizer Compensation

  • Paper: none
  • Code: https://github.com/lhaippp/DeepOIS

Exploring Adversarial Fake Images on Face Manifold

  • Paper: none
  • Code: https://github.com/ldz666666/Style-atk

Uncertainty-Aware Semi-Supervised Crowd Counting via Consistency-Regularized Surrogate Task

  • Paper: none
  • Code: https://github.com/yandamengdanai/Uncertainty-Aware-Semi-Supervised-Crowd-Counting-via-Consistency-Regularized-Surrogate-Task

Temporal Contrastive Graph for Self-supervised Video Representation Learning

  • Paper: none
  • Code: https://github.com/YangLiu9208/TCG

Boosting Monocular Depth Estimation Models to High-Resolution via Context-Aware Patching

  • Paper: none
  • Code: https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr

Fast and Memory-Efficient Compact Bilinear Pooling

  • Paper: none
  • Code: https://github.com/cvpr2021kp2/cvpr2021kp2

Identification of Empty Shelves in Supermarkets using Domain-inspired Features with Structural Support Vector Machine

  • Paper: none
  • Code: https://github.com/gapDetection/cvpr2021

Estimating A Child’s Growth Potential From Cephalometric X-Ray Image via Morphology-Aware Interactive Keypoint Estimation

  • Paper: none
  • Code: https://github.com/interactivekeypoint2020/Morph

https://github.com/ShaoQiangShen/CVPR2021

https://github.com/gillesflash/CVPR2021

https://github.com/anonymous-submission1991/BaLeNAS

https://github.com/cvpr2021dcb/cvpr2021dcb

https://github.com/anonymousauthorCV/CVPR2021_PaperID_8578

https://github.com/AldrichZeng/FreqPrune

https://github.com/Anonymous-AdvCAM/Anonymous-AdvCAM

https://github.com/ddfss/datadrive-fss
