Paper Collection || CVPR 2023 Papers and Open-Source Projects

Published: 2023-05-31 17:48 | Author: 沃恩智慧

Backbone

Integrally Pre-Trained Transformer Pyramid Networks

Stitchable Neural Networks

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Vision Transformer with Super Token Sampling

CLIP

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation


MAE

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

Generic-to-Specific Distillation of Masked Autoencoders


GAN

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation


NeRF

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis

Panoptic Lifting for 3D Scene Understanding with Neural Fields

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer

DETR

DETRs with Hybrid Matching

NAS

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Avatars

Structured 3D Features for Reconstructing Relightable and Animatable Avatars

ReID (Re-identification)

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

Diffusion Models

Video Probabilistic Diffusion Models in Projected Latent Space

Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

Imagic: Text-Based Real Image Editing with Diffusion Models

Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

DiffRF: Rendering-guided 3D Radiance Field Diffusion

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration

Vision Transformer

Integrally Pre-Trained Transformer Pyramid Networks

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

Learning Trajectory-Aware Transformer for Video Super-Resolution

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

BiFormer: Vision Transformer with Bi-Level Routing Attention

Vision Transformer with Super Token Sampling

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Vision-Language

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

Teaching Structured Vision&Language Concepts to Vision&Language Models

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

All in One: Exploring Unified Video-Language Pre-training

Position-guided Text Prompt for Vision Language Pre-training

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Object Detection

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

DETRs with Hybrid Matching

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Object Tracking

Simple Cues Lead to a Strong Multi-Object Tracker

Semantic Segmentation

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

Medical Image Segmentation

Label-Free Liver Tumor Segmentation

Video Object Segmentation

Two-shot Video Object Segmentation

Referring Image Segmentation

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

3D Point Cloud

Physical-World Optical Adversarial Attacks on 3D Face Recognition

3D Object Detection

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

3D Semantic Segmentation

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

3D Semantic Scene Completion

Low-level Vision

Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

Super-Resolution

Super-Resolution Neural Operator

Video Super-Resolution

Learning Trajectory-Aware Transformer for Video Super-Resolution

Image Generation

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Video Generation

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Video Understanding

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

Action Detection

TriDet: Temporal Action Detection with Relative Boundary Modeling

Text Detection

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Knowledge Distillation

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

Generic-to-Specific Distillation of Masked Autoencoders

Model Pruning

DepGraph: Towards Any Structural Pruning

Image Compression

Context-Based Trit-Plane Coding for Progressive Image Compression

Anomaly Detection

Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images

3D Reconstruction

OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields

SparsePose: Sparse-View Camera Pose Regression and Refinement

NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

3D Cinemagraphy from a Single Image

Revisiting Rotation Averaging: Uncertainties and Robust Losses

FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

Depth Estimation

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

Trajectory Prediction

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

Image Captioning

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

Visual Question Answering

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

Sign Language Recognition

Continuous Sign Language Recognition with Correlation Network

Paper: https://arxiv.org/abs/2303.03202

Code: https://github.com/hulianyuyy/CorrNet

Video Prediction

MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Novel View Synthesis

3D Video Loops from Asynchronous Input

Zero-Shot Learning

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

Stereo Matching

Iterative Geometry Encoding Volume for Stereo Matching

Scene Graph Generation

Prototype-based Embedding Network for Scene Graph Generation

Datasets

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Others

Interactive Segmentation as Gaussian Process Classification

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

SCOTCH and SODA: A Transformer Video Shadow Detection Framework

DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization

RelightableHands: Efficient Neural Relighting of Articulated Hand Models

Token Turing Machines

Single Image Backdoor Inversion via Robust Smoothed Classifiers

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation

Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression

UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy

Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness

Learning Neural Parametric Head Models

A Meta-Learning Approach to Predicting Performance and Data Requirements

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision

Masked Images Are Counterfactual Samples for Robust Fine-tuning

HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling

Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

UniHCP: A Unified Model for Human-Centric Perceptions

CUDA: Convolution-based Unlearnable Datasets

AdaptiveMix: Robust Feature Representation via Shrinking Feature Space

Physical-World Optical Adversarial Attacks on 3D Face Recognition

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

Upcycling Models under Domain and Category Shift

Modality-Agnostic Debiasing for Single Domain Generalization

Progressive Open Space Expansion for Open-Set Model Attribution

Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies

GFPose: Learning 3D Human Pose Prior with Gradient Fields

PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment

Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

Boundary Unlearning
