Paper Collection | CVPR 2023 Papers and Open-Source Projects
Published: 2023-05-31 17:48
Author: 沃恩智慧

Backbone
- Integrally Pre-Trained Transformer Pyramid Networks
- Stitchable Neural Networks
- Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
- DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
- Vision Transformer with Super Token Sampling

CLIP
- GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
- DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

MAE
- Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
- Generic-to-Specific Distillation of Masked Autoencoders

GAN
- DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

NeRF
- NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
- NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis
- Panoptic Lifting for 3D Scene Understanding with Neural Fields
- NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer

DETR
- DETRs with Hybrid Matching

NAS
- PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

Avatars
- Structured 3D Features for Reconstructing Relightable and Animatable Avatars

ReID (Re-Identification)
- MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

Diffusion Models
- Video Probabilistic Diffusion Models in Projected Latent Space
- Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
- Imagic: Text-Based Real Image Editing with Diffusion Models
- Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
- DiffRF: Rendering-guided 3D Radiance Field Diffusion
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
- HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
- TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
- Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration

Vision Transformer
- Integrally Pre-Trained Transformer Pyramid Networks
- Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- Learning Trajectory-Aware Transformer for Video Super-Resolution
- Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
- DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
- DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
- BiFormer: Vision Transformer with Bi-Level Routing Attention
- Vision Transformer with Super Token Sampling
- BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Vision-Language
- GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
- Teaching Structured Vision&Language Concepts to Vision&Language Models
- Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
- Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
- CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
- FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
- Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
- All in One: Exploring Unified Video-Language Pre-training
- Position-guided Text Prompt for Vision Language Pre-training
- EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Object Detection
- YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
- DETRs with Hybrid Matching
- Enhanced Training of Query-Based Object Detection via Selective Query Recollection
- Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Object Tracking
- Simple Cues Lead to a Strong Multi-Object Tracker

Semantic Segmentation
- Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

Medical Image Segmentation
- Label-Free Liver Tumor Segmentation

Video Object Segmentation
- Two-shot Video Object Segmentation

Referring Image Segmentation
- PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

3D Point Cloud
- Physical-World Optical Adversarial Attacks on 3D Face Recognition

3D Object Detection
- DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
- FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

3D Semantic Segmentation
- Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

3D Semantic Scene Completion

Low-level Vision
- Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

Super-Resolution
- Super-Resolution Neural Operator

Video Super-Resolution
- Learning Trajectory-Aware Transformer for Video Super-Resolution

Image Generation
- GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
- MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis

Video Generation
- MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Video Understanding
- Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

Action Detection
- TriDet: Temporal Action Detection with Relative Boundary Modeling

Text Detection
- DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

Knowledge Distillation
- Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
- Generic-to-Specific Distillation of Masked Autoencoders

Model Pruning
- DepGraph: Towards Any Structural Pruning

Image Compression
- Context-Based Trit-Plane Coding for Progressive Image Compression

Anomaly Detection
- Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images

3D Reconstruction
- OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
- SparsePose: Sparse-View Camera Pose Regression and Refinement
- NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
- Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
- To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
- Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
- 3D Cinemagraphy from a Single Image
- Revisiting Rotation Averaging: Uncertainties and Robust Losses
- FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction

Depth Estimation
- Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

Trajectory Prediction
- IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

Image Captioning
- ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

Visual Question Answering
- MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

Sign Language Recognition
- Continuous Sign Language Recognition with Correlation Network
  Paper: https://arxiv.org/abs/2303.03202
  Code: https://github.com/hulianyuyy/CorrNet

Video Prediction
- MOSO: Decomposing MOtion, Scene and Object for Video Prediction

Novel View Synthesis
- 3D Video Loops from Asynchronous Input

Zero-Shot Learning
- Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

Stereo Matching
- Iterative Geometry Encoding Volume for Stereo Matching

Scene Graph Generation
- Prototype-based Embedding Network for Scene Graph Generation

Datasets
- Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Others
- Interactive Segmentation as Gaussian Process Classification
- Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
- SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
- SCOTCH and SODA: A Transformer Video Shadow Detection Framework
- DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
- RelightableHands: Efficient Neural Relighting of Articulated Hand Models
- Token Turing Machines
- Single Image Backdoor Inversion via Robust Smoothed Classifiers
- To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
- HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
- A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
- Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
- Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
- UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
- Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
- Learning Neural Parametric Head Models
- A Meta-Learning Approach to Predicting Performance and Data Requirements
- MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
- Masked Images Are Counterfactual Samples for Robust Fine-tuning
- HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
- Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization
- Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
- Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
- Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
- UniHCP: A Unified Model for Human-Centric Perceptions
- CUDA: Convolution-based Unlearnable Datasets
- AdaptiveMix: Robust Feature Representation via Shrinking Feature Space
- Physical-World Optical Adversarial Attacks on 3D Face Recognition
- DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
- SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
- PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
- Upcycling Models under Domain and Category Shift
- Modality-Agnostic Debiasing for Single Domain Generalization
- Progressive Open Space Expansion for Open-Set Model Attribution
- Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies
- GFPose: Learning 3D Human Pose Prior with Gradient Fields
- PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
- Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
- Boundary Unlearning