CVPR 2023 Papers and Open-Source Projects Collection

[CVPR 2023 Open-Source Papers: Table of Contents]
  • Backbone

  • CLIP

  • MAE

  • GAN

  • GNN

  • MLP

  • NAS

  • OCR

  • NeRF

  • DETR

  • Diffusion Models

  • Avatars

  • ReID (Re-Identification)

  • Long-Tail

  • Vision Transformer

  • Vision-Language

  • Self-supervised Learning

  • Data Augmentation

  • Object Detection

  • Visual Tracking

  • Semantic Segmentation

  • Instance Segmentation

  • Panoptic Segmentation

  • Medical Image Segmentation

  • Video Object Segmentation

  • Referring Image Segmentation

  • Image Matting

  • Image Editing

  • Low-level Vision

  • Super-Resolution

  • Deblur

  • 3D Point Cloud

  • 3D Object Detection

  • 3D Semantic Segmentation

  • 3D Object Tracking

  • 3D Human Pose Estimation

  • 3D Semantic Scene Completion

  • Medical Image

  • Image Generation

  • Video Generation

  • Video Understanding

  • Action Detection

  • Text Detection

  • Knowledge Distillation

  • Model Pruning

  • Image Compression

  • Anomaly Detection

  • 3D Reconstruction

  • Depth Estimation

  • Trajectory Prediction

  • Image Captioning

  • Visual Question Answering

  • Sign Language Recognition

  • Video Prediction

  • Novel View Synthesis

  • Zero-Shot Learning

  • Stereo Matching

  • Scene Graph Generation

  • Datasets

  • New Tasks

  • Others

Backbone

Integrally Pre-Trained Transformer Pyramid Networks

Stitchable Neural Networks

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

BiFormer: Vision Transformer with Bi-Level Routing Attention

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Vision Transformer with Super Token Sampling

Hard Patches Mining for Masked Image Modeling

  • Paper: None

  • Code: None

CLIP

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

MAE

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

Generic-to-Specific Distillation of Masked Autoencoders

GAN

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

NeRF

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis

Panoptic Lifting for 3D Scene Understanding with Neural Fields

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer

DETR

DETRs with Hybrid Matching

NAS

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Avatars

Structured 3D Features for Reconstructing Relightable and Animatable Avatars

ReID (Re-Identification)

Clothing-Change Feature Augmentation for Person Re-Identification

  • Paper: None
  • Code: None

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

Diffusion Models

Video Probabilistic Diffusion Models in Projected Latent Space

Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

Imagic: Text-Based Real Image Editing with Diffusion Models

Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

DiffRF: Rendering-guided 3D Radiance Field Diffusion

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration

Vision Transformer

Integrally Pre-Trained Transformer Pyramid Networks

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

Learning Trajectory-Aware Transformer for Video Super-Resolution

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

BiFormer: Vision Transformer with Bi-Level Routing Attention

Vision Transformer with Super Token Sampling

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation

  • Paper: None

  • Code: None

Vision-Language

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

Teaching Structured Vision&Language Concepts to Vision&Language Models

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

All in One: Exploring Unified Video-Language Pre-training

Position-guided Text Prompt for Vision Language Pre-training

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Object Detection

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

DETRs with Hybrid Matching

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Object Tracking

Simple Cues Lead to a Strong Multi-Object Tracker

Semantic Segmentation

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

Medical Image Segmentation

Label-Free Liver Tumor Segmentation

Video Object Segmentation

Two-shot Video Object Segmentation

Referring Image Segmentation

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

3D Point Cloud

Physical-World Optical Adversarial Attacks on 3D Face Recognition

3D Object Detection

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

3D Video Object Detection with Learnable Object-Centric Global Optimization

  • Paper: None

  • Code: None

3D Semantic Segmentation

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

3D Semantic Scene Completion

Low-level Vision

Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

Super-Resolution

Super-Resolution Neural Operator

Video Super-Resolution

Learning Trajectory-Aware Transformer for Video Super-Resolution

Image Generation

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Video Generation

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Video Understanding

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

Action Detection

TriDet: Temporal Action Detection with Relative Boundary Modeling

Text Detection

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Knowledge Distillation

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

Generic-to-Specific Distillation of Masked Autoencoders

Model Pruning

DepGraph: Towards Any Structural Pruning

Image Compression

Context-Based Trit-Plane Coding for Progressive Image Compression

Anomaly Detection

Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images

3D Reconstruction

OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields

SparsePose: Sparse-View Camera Pose Regression and Refinement

NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

3D Cinemagraphy from a Single Image

Revisiting Rotation Averaging: Uncertainties and Robust Losses

FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

Depth Estimation

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

Trajectory Prediction

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

Image Captioning

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

Visual Question Answering

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

Sign Language Recognition

Continuous Sign Language Recognition with Correlation Network

Paper: https://arxiv.org/abs/2303.03202

Code: https://github.com/hulianyuyy/CorrNet

Video Prediction

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Novel View Synthesis

3D Video Loops from Asynchronous Input

Zero-Shot Learning

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

Semantic Prompt for Few-Shot Learning

  • Paper: None

  • Code: None

Stereo Matching

Iterative Geometry Encoding Volume for Stereo Matching

Scene Graph Generation

Prototype-based Embedding Network for Scene Graph Generation

Datasets

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Others

Interactive Segmentation as Gaussian Process Classification

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

SCOTCH and SODA: A Transformer Video Shadow Detection Framework

DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization

RelightableHands: Efficient Neural Relighting of Articulated Hand Models

Token Turing Machines

Single Image Backdoor Inversion via Robust Smoothed Classifiers

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation

Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression

UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness

Learning Neural Parametric Head Models

A Meta-Learning Approach to Predicting Performance and Data Requirements

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision

Masked Images Are Counterfactual Samples for Robust Fine-tuning

HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling

Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

UniHCP: A Unified Model for Human-Centric Perceptions

CUDA: Convolution-based Unlearnable Datasets

AdaptiveMix: Robust Feature Representation via Shrinking Feature Space

Physical-World Optical Adversarial Attacks on 3D Face Recognition

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models

  • Paper: None
  • Code: None

Sharpness-Aware Gradient Matching for Domain Generalization

Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization

  • Paper: None
  • Code: None

Blind Video Deflickering by Neural Filtering with a Flawed Atlas

RiDDLE: Reversible and Diversified De-identification with Latent Encryptor

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

Upcycling Models under Domain and Category Shift

Modality-Agnostic Debiasing for Single Domain Generalization

Progressive Open Space Expansion for Open-Set Model Attribution

Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies

GFPose: Learning 3D Human Pose Prior with Gradient Fields

PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment

Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

Boundary Unlearning
