Paper Collection | CVPR 2023 Papers and Open-Source Projects

Published: 2023-05-31 17:48 | Author: 沃恩智慧

Backbone

Integrally Pre-Trained Transformer Pyramid Networks

Paper: https://arxiv.org/abs/2211.12735
Code: https://github.com/sunsmarterjie/iTPN

Stitchable Neural Networks

Paper: https://arxiv.org/abs/2302.06586
Code: https://github.com/ziplab/SN-Net

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

Paper: https://arxiv.org/abs/2303.03667
Code: https://github.com/JierunChen/FasterNet

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Vision Transformer with Super Token Sampling

Paper: https://arxiv.org/abs/2211.11167
Code: https://github.com/hhb072/SViT

CLIP

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

Paper: https://arxiv.org/abs/2301.12959
Code: https://github.com/tobran/GALIP

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

Paper: https://arxiv.org/abs/2303.06285
Code: https://github.com/Yueming6568/DeltaEdit

MAE

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

Paper: https://arxiv.org/abs/2212.06785
Code: https://github.com/ZrrSkywalker/I2P-MAE

Generic-to-Specific Distillation of Masked Autoencoders

Paper: https://arxiv.org/abs/2302.14771
Code: https://github.com/pengzhiliang/G2SD

GAN

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

Paper: https://arxiv.org/abs/2303.06285
Code: https://github.com/Yueming6568/DeltaEdit

NeRF

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Paper: https://arxiv.org/abs/2212.07388

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

Paper: https://arxiv.org/abs/2211.07600
Code: https://github.com/eladrich/latent-nerf

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis

Paper: https://arxiv.org/abs/2301.08556

Panoptic Lifting for 3D Scene Understanding with Neural Fields

Paper: https://arxiv.org/abs/2212.09802

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer

Paper: https://arxiv.org/abs/2303.06919

DETR

DETRs with Hybrid Matching

Paper: https://arxiv.org/abs/2207.13080
Code: https://github.com/HDETR

NAS

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Paper: https://arxiv.org/abs/2302.14772
Code: https://github.com/ShunLu91/PA-DA

Avatars

Structured 3D Features for Reconstructing Relightable and Animatable Avatars

Paper: https://arxiv.org/abs/2212.06820

ReID (Re-Identification)

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

Paper: https://arxiv.org/abs/2303.07065
Code: https://github.com/vimar-gu/MSINet

Diffusion Models

Video Probabilistic Diffusion Models in Projected Latent Space

Paper: https://arxiv.org/abs/2302.07685
Code: https://github.com/sihyun-yu/PVDM

Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

Paper: https://arxiv.org/abs/2211.10655

Imagic: Text-Based Real Image Editing with Diffusion Models

Paper: https://arxiv.org/abs/2210.09276

Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

Paper: https://arxiv.org/abs/2211.10656

DiffRF: Rendering-guided 3D Radiance Field Diffusion

Paper: https://arxiv.org/abs/2212.01206

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Paper: https://arxiv.org/abs/2212.09478
Code: https://github.com/researchmm/MM-Diffusion

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

Paper: https://arxiv.org/abs/2211.13287
Code: https://github.com/aminshabani/house_diffusion

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

Paper: https://arxiv.org/abs/2303.05762
Code: https://github.com/chenweixin107/TrojDiff

Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption

Paper: https://arxiv.org/abs/2207.03442
Code: https://github.com/shiyegao/DDA

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration

Paper: https://arxiv.org/abs/2303.06885

Vision Transformer

Integrally Pre-Trained Transformer Pyramid Networks

Paper: https://arxiv.org/abs/2211.12735
Code: https://github.com/sunsmarterjie/iTPN

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

Paper: https://arxiv.org/abs/2302.14746

Learning Trajectory-Aware Transformer for Video Super-Resolution

Paper: https://arxiv.org/abs/2204.04216
Code: https://github.com/researchmm/TTVSR

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

Paper: https://arxiv.org/abs/2303.04249

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Paper: https://arxiv.org/abs/2301.06051
Code: https://github.com/Haiyang-W/DSVT

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Paper: https://arxiv.org/abs/2211.10772
Code: https://github.com/ViTAE-Transformer/DeepSolo

BiFormer: Vision Transformer with Bi-Level Routing Attention

Paper: https://arxiv.org/abs/2303.08810
Code: https://github.com/rayleizhu/BiFormer

Vision Transformer with Super Token Sampling

Paper: https://arxiv.org/abs/2211.11167
Code: https://github.com/hhb072/SViT

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Paper: https://arxiv.org/abs/2211.10439

Vision-Language

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

Paper: https://arxiv.org/abs/2301.01893

Teaching Structured Vision&Language Concepts to Vision&Language Models

Paper: https://arxiv.org/abs/2211.11733

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

Paper: https://arxiv.org/abs/2303.00040

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

Paper: https://arxiv.org/abs/2303.02489

Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

Paper: https://arxiv.org/abs/2303.04077

All in One: Exploring Unified Video-Language Pre-training

Paper: https://arxiv.org/abs/2203.07303
Code: https://github.com/showlab/all-in-one

Position-guided Text Prompt for Vision Language Pre-training

Paper: https://arxiv.org/abs/2212.09737
Code: https://github.com/sail-sg/ptp

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Paper: https://arxiv.org/abs/2209.14941
Code: https://github.com/yanmin-wu/EDA

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

Paper: https://arxiv.org/abs/2303.02483
Code: https://github.com/BrandonHanx/FAME-ViL

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Paper: https://arxiv.org/abs/2303.07284
Code: https://github.com/boheumd/A2Summ

Object Detection

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Paper: https://arxiv.org/abs/2207.02696
Code: https://github.com/WongKinYiu/yolov7

DETRs with Hybrid Matching

Paper: https://arxiv.org/abs/2207.13080
Code: https://github.com/HDETR

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

Paper: https://arxiv.org/abs/2212.07593
Code: https://github.com/Fangyi-Chen/SQR

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Paper: https://arxiv.org/abs/2303.05892
Code: https://github.com/LutingWang/OADP

Object Tracking

Simple Cues Lead to a Strong Multi-Object Tracker

Paper: https://arxiv.org/abs/2206.04656

Semantic Segmentation

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

Paper: https://arxiv.org/abs/2303.07224
Code: https://github.com/THU-LYJ-Lab/AR-Seg

Medical Image Segmentation

Label-Free Liver Tumor Segmentation

Paper: https://arxiv.org/abs/2210.14845
Code: https://github.com/MrGiovanni/SyntheticTumors

Video Object Segmentation

Two-shot Video Object Segmentation

Referring Image Segmentation

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

Paper: https://arxiv.org/abs/2302.07387

3D Point Cloud

Physical-World Optical Adversarial Attacks on 3D Face Recognition

Paper: https://arxiv.org/abs/2205.13412
Code: https://github.com/PolyLiYJ/SLAttack.git

3D Object Detection

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

Paper: https://arxiv.org/abs/2301.06051
Code: https://github.com/Haiyang-W/DSVT

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

Paper: https://arxiv.org/abs/2301.04467

3D Semantic Segmentation

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

Paper: https://arxiv.org/abs/2303.11203
Code: https://github.com/l1997i/lim3d

3D Semantic Scene Completion

VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion

Paper: https://arxiv.org/abs/2302.12251
Code: https://github.com/NVlabs/VoxFormer

Low-level Vision

Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

Paper: https://arxiv.org/abs/2303.06859
Code: https://github.com/lixinustc/Casual-IR-DIL

Super-Resolution

Super-Resolution Neural Operator

Video Super-Resolution

Learning Trajectory-Aware Transformer for Video Super-Resolution

Paper: https://arxiv.org/abs/2204.04216
Code: https://github.com/researchmm/TTVSR

Image Generation

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

Paper: https://arxiv.org/abs/2301.12959
Code: https://github.com/tobran/GALIP

MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Paper: https://arxiv.org/abs/2211.09117
Code: https://github.com/LTH14/mage

Video Generation

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Paper: https://arxiv.org/abs/2212.09478
Code: https://github.com/researchmm/MM-Diffusion

Video Understanding

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

Paper: https://arxiv.org/abs/2209.15280
Code: https://github.com/TencentARC/TVTS

Action Detection

TriDet: Temporal Action Detection with Relative Boundary Modeling

Paper: https://arxiv.org/abs/2303.07347
Code: https://github.com/dingfengshi/TriDet

Text Detection

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Paper: https://arxiv.org/abs/2211.10772
Code: https://github.com/ViTAE-Transformer/DeepSolo

Knowledge Distillation

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

Paper: https://arxiv.org/abs/2302.14290

Generic-to-Specific Distillation of Masked Autoencoders

Paper: https://arxiv.org/abs/2302.14771
Code: https://github.com/pengzhiliang/G2SD

Model Pruning

DepGraph: Towards Any Structural Pruning

Paper: https://arxiv.org/abs/2301.12900
Code: https://github.com/VainF/Torch-Pruning

Image Compression

Context-Based Trit-Plane Coding for Progressive Image Compression

Paper: https://arxiv.org/abs/2303.05715
Code: https://github.com/seungminjeon-github/CTC

Anomaly Detection

Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images

Paper: https://arxiv.org/abs/2111.13495
Code: https://github.com/tiangexiang/SQUID

3D Reconstruction

OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields

Paper: https://arxiv.org/abs/2211.12886

SparsePose: Sparse-View Camera Pose Regression and Refinement

Paper: https://arxiv.org/abs/2211.16991

NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction

Paper: https://arxiv.org/abs/2303.02375

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

Paper: https://arxiv.org/abs/2302.11566
Code: https://github.com/MoyGcc/vid2avatar

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

Paper: https://arxiv.org/abs/2303.05937

3D Cinemagraphy from a Single Image

Paper: https://arxiv.org/abs/2303.05724
Code: https://github.com/xingyi-li/3d-cinemagraphy

Revisiting Rotation Averaging: Uncertainties and Robust Losses

Paper: https://arxiv.org/abs/2303.05195
Code: https://github.com/zhangganlin/GlobalSfMpy

FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

Paper: https://arxiv.org/abs/2211.13874
Code: https://github.com/csbhr/FFHQ-UV

Depth Estimation

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

Paper: https://arxiv.org/abs/2211.13202
Code: https://github.com/noahzn/Lite-Mono

Trajectory Prediction

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

Paper: https://arxiv.org/abs/2303.00575

Image Captioning

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

Paper: https://arxiv.org/abs/2303.02437

Visual Question Answering

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

Paper: https://arxiv.org/abs/2303.01239
Code: https://github.com/jingjing12110/MixPHM

Sign Language Recognition

Continuous Sign Language Recognition with Correlation Network

Paper: https://arxiv.org/abs/2303.03202
Code: https://github.com/hulianyuyy/CorrNet

Video Prediction

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Paper: https://arxiv.org/abs/2303.03684
Code: https://github.com/anonymous202203/MOSO

Novel View Synthesis

3D Video Loops from Asynchronous Input

Paper: https://arxiv.org/abs/2303.05312
Code: https://github.com/limacv/VideoLoop3D

Zero-Shot Learning

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

Paper: https://arxiv.org/abs/2303.08698
Code: https://github.com/Zhicaiwww/Bi-VAEGAN

Stereo Matching

Iterative Geometry Encoding Volume for Stereo Matching

Paper: https://arxiv.org/abs/2303.06615
Code: https://github.com/gangweiX/IGEV

Scene Graph Generation

Prototype-based Embedding Network for Scene Graph Generation

Paper: https://arxiv.org/abs/2303.07096

Datasets

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

Paper: https://arxiv.org/abs/2303.02760

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Paper: https://arxiv.org/abs/2303.07284
Code: https://github.com/boheumd/A2Summ

Others

Interactive Segmentation as Gaussian Process Classification

Paper: https://arxiv.org/abs/2302.14578

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

Paper: https://arxiv.org/abs/2302.14677

SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

Paper: https://arxiv.org/abs/2302.12828

SCOTCH and SODA: A Transformer Video Shadow Detection Framework

Paper: https://arxiv.org/abs/2211.06885

DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization

Paper: https://arxiv.org/abs/2212.06331
Code: https://github.com/ai4ce/DeepMapping2

RelightableHands: Efficient Neural Relighting of Articulated Hand Models

Paper: https://arxiv.org/abs/2302.04866

Token Turing Machines

Paper: https://arxiv.org/abs/2211.09119

Single Image Backdoor Inversion via Robust Smoothed Classifiers

Paper: https://arxiv.org/abs/2303.00215
Code: https://github.com/locuslab/smoothinv

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

Paper: https://arxiv.org/abs/2212.07242
Code: https://github.com/dolorousrtur/hood

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation

Paper: https://arxiv.org/abs/2303.00914

Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression

Paper: https://arxiv.org/abs/2303.01052

UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

Paper: https://arxiv.org/abs/2303.00938

Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness

Paper: https://arxiv.org/abs/2303.00971
Code: https://github.com/zhijieshen-bjtu/DOPNet

Learning Neural Parametric Head Models

Paper: https://arxiv.org/abs/2212.02761

A Meta-Learning Approach to Predicting Performance and Data Requirements

Paper: https://arxiv.org/abs/2303.01598

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision

Paper: https://arxiv.org/abs/2303.03315

Masked Images Are Counterfactual Samples for Robust Fine-tuning

Paper: https://arxiv.org/abs/2303.03052

HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling

Paper: https://arxiv.org/abs/2303.02700

Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization

Paper: https://arxiv.org/abs/2303.02328

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

Paper: https://arxiv.org/abs/2303.03108

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

Paper: https://arxiv.org/abs/2303.04249

UniHCP: A Unified Model for Human-Centric Perceptions

Paper: https://arxiv.org/abs/2303.02936
Code: https://github.com/OpenGVLab/UniHCP

CUDA: Convolution-based Unlearnable Datasets

AdaptiveMix: Robust Feature Representation via Shrinking Feature Space

Paper: https://arxiv.org/abs/2303.01559
Code: https://github.com/WentianZhang-ML/AdaptiveMix

Physical-World Optical Adversarial Attacks on 3D Face Recognition

Paper: https://arxiv.org/abs/2205.13412
Code: https://github.com/PolyLiYJ/SLAttack.git

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

Paper: https://arxiv.org/abs/2301.06281
Code: https://carlyx.github.io/DPE/

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Paper: https://arxiv.org/abs/2211.12194
Code: https://github.com/Winfredy/SadTalker

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

Paper: https://arxiv.org/abs/2303.07337

Upcycling Models under Domain and Category Shift

Paper: https://arxiv.org/abs/2303.07110
Code: https://github.com/ispc-lab/GLC

Modality-Agnostic Debiasing for Single Domain Generalization

Paper: https://arxiv.org/abs/2303.07123

Progressive Open Space Expansion for Open-Set Model Attribution

Paper: https://arxiv.org/abs/2303.06877

Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies

Paper: https://arxiv.org/abs/2303.06856

GFPose: Learning 3D Human Pose Prior with Gradient Fields

Paper: https://arxiv.org/abs/2212.08641
Code: https://github.com/Embracing/GFPose

PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment

Paper: https://arxiv.org/abs/2303.11526
Code: https://github.com/Zhang-VISLab

Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

Paper: https://arxiv.org/abs/2303.11502

Boundary Unlearning

Paper: https://arxiv.org/abs/2303.11570
