CVPR 2021 Papers and Open-Source Projects Collection (Papers with Code)

AI, Big Data and Deep Learning (WeChat official account: datayx)

Visual Transformer
  1. End-to-End Human Pose and Mesh Reconstruction with Transformers
  1. Temporal-Relational CrossTransformers for Few-Shot Action Recognition
  1. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
  1. HOTR: End-to-End Human-Object Interaction Detection with Transformers
  1. Multi-Modal Fusion Transformer for End-to-End Autonomous Driving
  1. Pose Recognition with Cascade Transformers
  1. Variational Transformer Networks for Layout Generation
  1. LoFTR: Detector-Free Local Feature Matching with Transformers
  1. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
  1. Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
  1. Transformer Tracking
  1. HR-NAS: Searching Efficient High-Resolution Neural Architectures with Transformers
  1. MIST: Multiple Instance Spatial Transformer
  1. Multimodal Motion Prediction with Stacked Transformers
  1. Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-Supervised Learning
  1. Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
  1. Pre-Trained Image Processing Transformer
  1. End-to-End Video Instance Segmentation with Transformers
  1. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
  1. End-to-End Human Object Interaction Detection with HOI Transformer
  1. Transformer Interpretability Beyond Attention Visualization
  1. Diverse Part Discovery: Occluded Person Re-Identification With Part-Aware Transformer
  • Paper: None
  • Code: None
  1. LayoutTransformer: Scene Layout Generation With Conceptual and Spatial Diversity
  • Paper: None
  • Code: None
  1. Line Segment Detection Using Transformers without Edges
  1. MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers
  • Paper: MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
  • Code: None
  1. SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
  1. Facial Action Unit Detection With Transformers
  • Paper: None
  • Code: None
  1. Clusformer: A Transformer Based Clustering Approach to Unsupervised Large-Scale Face and Visual Landmark Recognition
  • Paper: None
  • Code: None
  1. Lesion-Aware Transformers for Diabetic Retinopathy Grading
  • Paper: None
  • Code: None
  1. Topological Planning With Transformers for Vision-and-Language Navigation
  1. Adaptive Image Transformer for One-Shot Object Detection
  • Paper: None
  • Code: None
  1. Multi-Stage Aggregated Transformer Network for Temporal Language Localization in Videos
  • Paper: None
  • Code: None
  1. Taming Transformers for High-Resolution Image Synthesis
  1. Self-Supervised Video Hashing via Bidirectional Transformers
  • Paper: None
  • Code: None
  1. Point 4D Transformer Networks for Spatio-Temporal Modeling in Point Cloud Videos
  1. Gaussian Context Transformer
  • Paper: None
  • Code: None
  1. General Multi-Label Image Classification With Transformers
  1. Bottleneck Transformers for Visual Recognition
  1. VLN BERT: A Recurrent Vision-and-Language BERT for Navigation
  1. Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling
  1. Self-attention based Text Knowledge Mining for Text Detection
  1. SSAN: Separable Self-Attention Network for Video Representation Learning
  • Paper: None
  • Code: None
  1. Scaling Local Self-Attention For Parameter Efficient Visual Backbones
Image Classification

Correlated Input-Dependent Label Noise in Large-Scale Image Classification

Object Detection

2D Object Detection

Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation

PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery

Domain-Specific Suppression for Adaptive Object Detection

IQDet: Instance-wise Quality Distribution Sampling for Object Detection

Multi-Scale Aligned Distillation for Low-Resolution Detection

Adaptive Class Suppression Loss for Long-Tail Object Detection

VarifocalNet: An IoU-aware Dense Object Detector

Scale-aware Automatic Augmentation for Object Detection

OTA: Optimal Transport Assignment for Object Detection

Distilling Object Detectors via Decoupled Features

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Positive-Unlabeled Data Purification in the Wild for Object Detection

  • Paper: None
  • Code: None

Instance Localization for Self-supervised Detection Pretraining

MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection

End-to-End Object Detection with Fully Convolutional Network

Robust and Accurate Object Detection via Adversarial Learning

I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors

Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework

OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection

YOLOF: You Only Look One-level Feature

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

General Instance Distillation for Object Detection

Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Multiple Instance Active Learning for Object Detection

Towards Open World Object Detection

Few-Shot Object Detection

Adaptive Image Transformer for One-Shot Object Detection

  • Paper: None
  • Code: None

Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection

Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection

Few-Shot Object Detection via Contrastive Proposal Encoding

Rotated Object Detection

ReDet: A Rotation-equivariant Detector for Aerial Object Detection

Single/Multiple Object Tracking

Single Object Tracking

LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking

Graph Attention Tracking

Rotation Equivariant Siamese Networks for Tracking

Track to Detect and Segment: An Online Multi-Object Tracker

Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Transformer Tracking

Multiple Object Tracking

Multiple Object Tracking with Correlation Learning

Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking

Learning a Proposal Classifier for Multiple Object Tracking

Track to Detect and Segment: An Online Multi-Object Tracker

Semantic Segmentation

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Rethinking BiSeNet For Real-time Semantic Segmentation

Progressive Semantic Segmentation

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Bidirectional Projection Network for Cross Dimension Scene Understanding

Cross-Dataset Collaborative Learning for Semantic Segmentation

Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations

Capturing Omni-Range Context for Omnidirectional Segmentation

Learning Statistical Texture for Semantic Segmentation

PLOP: Learning without Forgetting for Continual Semantic Segmentation

Weakly Supervised Semantic Segmentation

Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation

Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation

BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation

Semi-Supervised Semantic Segmentation

Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation

Domain Adaptive Semantic Segmentation

Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening

Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization

MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation

Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation

Prototypical Pseudo Label Denoising and Target Structure Learning for Domain Adaptive Semantic Segmentation

Video Semantic Segmentation

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Instance Segmentation

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Incremental Few-Shot Instance Segmentation

A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features

Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation

Multi-Scale Aligned Distillation for Low-Resolution Detection

Boundary IoU: Improving Object-Centric Image Segmentation Evaluation

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

Zero-shot instance segmentation (Not Sure)

Video Instance Segmentation

STMask: Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

End-to-End Video Instance Segmentation with Transformers

Panoptic Segmentation

Exemplar-Based Open-Set Panoptic Segmentation Network

MaX-DeepLab: End-to-End Panoptic Segmentation With Mask Transformers

  • Paper: MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
  • Code: None

Panoptic Segmentation Forecasting

Fully Convolutional Networks for Panoptic Segmentation

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

Medical Image Segmentation

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

3D Medical Image Segmentation

DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation

Scene Text Detection

Fourier Contour Embedding for Arbitrary-Shaped Text Detection

Scene Text Recognition

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Super-Resolution

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

ClassSR: A General Framework to Accelerate Super-Resolution Networks by Data Characteristic

AdderSR: Towards Energy Efficient Image Super-Resolution

Dehazing

Contrastive Learning for Compact Single Image Dehazing

Video Super-Resolution

Temporal Modulation Network for Controllable Space-Time Video Super-Resolution

Image Restoration

Multi-Stage Progressive Image Restoration

Image Inpainting

PD-GAN: Probabilistic Diverse GAN for Image Inpainting

TransFill: Reference-guided Image Inpainting by Merging Multiple Color and Spatial Transformations

Image Editing

StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

High-Fidelity and Arbitrary Face Editing

Anycost GANs for Interactive Image Synthesis and Editing

PISE: Person Image Synthesis and Editing with Decoupled GAN

DeFLOCNet: Deep Image Editing via Flexible Low-level Controls

Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing

  • Paper: None
  • Code: None

Image Captioning

Towards Accurate Text-based Image Captioning with Content Diversity Exploration

Image Matching

LoFTR: Detector-Free Local Feature Matching with Transformers

Convolutional Hough Matching Networks

Image Blending

Bridging the Visual Gap: Wide-Range Image Blending

Datasets

High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets

Paper download links:

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

Learning To Count Everything

Semantic Image Matting

Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline

Visual Semantic Role Labeling for Video Understanding

VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild

Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark

Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food

Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges

When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework

Depth from Camera Motion and Object Detection

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge

Scan2Cap: Context-aware Dense Captioning in RGB-D Scans
