Reposted: CVPR 2022 paper list
CVPR2022 Papers (Papers/Codes/Demos)
https://github.com/gbstack/cvpr-2022-papers
Categories:
1. Detection
2. Segmentation
3. Image Processing
4. Estimation
5. Image&Video Retrieval/Video Understanding
6. Face
7. 3D Vision
8. Object Tracking
9. Medical Imaging
10. Text Detection/Recognition
11. Remote Sensing Image
12. GAN/Generative/Adversarial
13. Image Generation/Image Synthesis
14. Scene Graph
15. Visual Localization
16. Visual Reasoning/VQA
17. Image Classification
18. Neural Network Structure Design
19. Model Compression
20. Model Training/Generalization
21. Model Evaluation
22. Data Processing
23. Active Learning
24. Few-shot/Zero-shot Learning
25. Continual Learning/Life-long Learning
26. Transfer Learning/Domain Adaptation
27. Metric Learning
28. Contrastive Learning
29. Incremental Learning
30. Reinforcement Learning
31. Meta Learning
32. Multi-Modal Learning
33. Vision-based Prediction
34. Dataset
35. Robotics
36. Self-supervised Learning/Semi-supervised Learning
Detection
2D Object Detection
Oriented RepPoints for Aerial Object Detection (small object detection)
[paper](https://arxiv.org/abs/2105.11111)
| [code](https://github.com/LiWentomng/OrientedRepPoints)
Confidence Propagation Cluster: Unleash Full Potential of Object Detectors
[paper](https://arxiv.org/abs/2112.00342)
Semantic-aligned Fusion Transformer for One-shot Object Detection
[paper](https://arxiv.org/abs/2203.09093)
A Dual Weighting Label Assignment Scheme for Object Detection
[paper](https://arxiv.org/abs/2203.09730)
| [code](https://github.com/strongwolf/DW)
MUM: Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
[paper](https://arxiv.org/abs/2111.10958)
| [code](https://github.com/JongMokKim/mix-unmix)
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection
[paper](https://arxiv.org/abs/2203.06398)
| [code](https://github.com/CityU-AIM-Group/SIGMA)
Accelerating DETR Convergence via Semantic-Aligned Matching
[paper](https://arxiv.org/abs/2203.06883)
| [code](https://github.com/ZhangGongjie/SAM-DETR)
Focal and Global Knowledge Distillation for Detectors
keywords: Object Detection, Knowledge Distillation
[paper](https://arxiv.org/abs/2111.11837)
| [code](https://github.com/yzd-v/FGD)
Unknown-Aware Object Detection: Learning What You Don't Know from Videos in the Wild
[paper](https://arxiv.org/abs/2203.03800)
| [code](https://github.com/deeplearning-wisc/stud)
Localization Distillation for Dense Object Detection
keywords: Bounding Box Regression, Localization Quality Estimation, Knowledge Distillation
[paper](https://arxiv.org/abs/2102.12252)
| [code](https://github.com/HikariTJU/LD)
Video Object Detection
Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering
[paper](https://arxiv.org/abs/2105.13353)
3D Object Detection
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers
[paper](https://arxiv.org/abs/2203.11496)
| [code](https://github.com/XuyangBai/TransFusion)
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds
[paper](https://arxiv.org/abs/2203.11139)
| [code](https://github.com/yifanzhang713/IA-SSD)
Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion
[paper](https://arxiv.org/abs/2203.09780)
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
[paper](https://arxiv.org/abs/2203.10981)
| [code](https://github.com/kuanchihhuang/MonoDTR)
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds
[paper](https://arxiv.org/abs/2203.10314)
| [code](https://github.com/skyhehe123/VoxSeT)
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention
[paper](https://arxiv.org/abs/2203.09704)
| [code](https://github.com/Gorilla-Lab-SCUT/VISTA)
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection
[paper](https://arxiv.org/abs/2203.08563)
| [code](https://github.com/lianqing11/MonoJSG)
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
[paper](https://arxiv.org/abs/2203.08195)
| [code](https://github.com/tensorflow/lingvo/tree/master/lingvo/)
Point Density-Aware Voxels for LiDAR 3D Object Detection
[paper](https://arxiv.org/abs/2203.05662)
| [code](https://github.com/TRAILab/PDV)
Back to Reality: Weakly-supervised 3D Object Detection with Shape-guided Label Enhancement
[paper](https://arxiv.org/abs/2203.05238)
| [code](https://github.com/xuxw98/BackToReality)
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
[paper](https://arxiv.org/abs/2011.12001)
| [code](https://github.com/qq456cvb/CanonicalVoting)
A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation
keywords: 3D Object Detection with Point-based Methods, 3D Object Detection with Grid-based Methods, Cluster-free 3D Panoptic Segmentation, CenterPoint 3D Object Detection
[paper](https://arxiv.org/abs/2203.02133)
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving
keywords: Autonomous Driving, Monocular 3D Object Detection
[paper](https://arxiv.org/abs/2203.02112)
| [code](https://github.com/revisitq/Pseudo-Stereo-3D)
Camouflaged Object Detection
Implicit Motion Handling for Video Camouflaged Object Detection
[paper](https://arxiv.org/abs/2203.07363)
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
[paper](https://arxiv.org/abs/2203.02688)
| [code](https://github.com/lartpang/ZoomNet)
Saliency Object Detection
Bi-directional Object-context Prioritization Learning for Saliency Ranking
[paper](https://arxiv.org/abs/2203.09416)
| [code](https://github.com/GrassBro/OCOR)
Democracy Does Matter: Comprehensive Feature Mining for Co-Salient Object Detection
[paper](https://arxiv.org/abs/2203.05787)
Keypoint Detection
UKPGAN: A General Self-Supervised Keypoint Detector
[paper](https://arxiv.org/abs/2011.11974)
| [code](https://github.com/qq456cvb/UKPGAN)
Lane Detection
CLRNet: Cross Layer Refinement Network for Lane Detection
[paper](https://arxiv.org/abs/2203.10350)
Rethinking Efficient Lane Detection via Curve Modeling
keywords: Segmentation-based Lane Detection, Point Detection-based Lane Detection, Curve-based Lane Detection, autonomous driving
[paper](https://arxiv.org/abs/2203.02431)
| [code](https://github.com/voldemortX/pytorch-auto-drive)
Edge Detection
EDTER: Edge Detection with Transformer
[paper](https://arxiv.org/abs/2203.08566)
| [code](https://github.com/MengyangPu/EDTER)
Vanishing Point Detection
Deep vanishing point detection: Geometric priors make dataset variations vanish
[paper](https://arxiv.org/abs/2203.08586)
| [code](https://github.com/yanconglin/VanishingPoint_HoughTransform_GaussianSphere)
Segmentation
Image Segmentation
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
[paper](https://arxiv.org/abs/2203.07615)
| [code](http://github.com/chunbolang/BAM)
CRIS: CLIP-Driven Referring Image Segmentation
[paper](https://arxiv.org/abs/2111.15174)
Hyperbolic Image Segmentation
[paper](https://arxiv.org/abs/2203.05898)
Panoptic Segmentation
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
[paper](https://arxiv.org/abs/2109.03814)
| [code](https://github.com/zhiqi-li/Panoptic-SegFormer)
Bending Reality: Distortion-aware Transformers for Adapting to Panoramic Semantic Segmentation
keywords: Semantic and panoramic segmentation, Unsupervised domain adaptation, Transformer
[paper](https://arxiv.org/abs/2203.01452)
| [code](https://github.com/jamycheung/Trans4PASS)
Semantic Segmentation
Class-Balanced Pixel-Level Self-Labeling for Domain Adaptive Semantic Segmentation
[paper](https://arxiv.org/abs/2203.09744)
| [code](https://github.com/lslrh/CPSL)
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
[paper](https://arxiv.org/abs/2203.09653)
| [code](https://github.com/maeve07/RCA.git)
Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation
[paper](https://arxiv.org/abs/2203.10739)
| [code](https://github.com/megviiresearch/TEL)
Scribble-Supervised LiDAR Semantic Segmentation
[paper](https://arxiv.org/abs/2203.08537)
| [code](http://github.com/ouenal/scribblekitti)
ADAS: A Direct Adaptation Strategy for Multi-Target Domain Adaptive Semantic Segmentation
[paper](https://arxiv.org/abs/2203.06811)
Weakly Supervised Semantic Segmentation by Pixel-to-Prototype Contrast
[paper](https://arxiv.org/abs/2110.07110)
Representation Compensation Networks for Continual Semantic Segmentation
[paper](https://arxiv.org/abs/2203.05402)
| [code](https://github.com/zhangchbin/RCIL)
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
[paper](https://arxiv.org/abs/2203.03884)
| [code](https://github.com/Haochen-Wang409/U2PL/)
Weakly Supervised Semantic Segmentation using Out-of-Distribution Data
[paper](https://arxiv.org/abs/2203.03860)
| [code](https://github.com/naver-ai/w-ood)
Self-supervised Image-specific Prototype Exploration for Weakly Supervised Semantic Segmentation
[paper](https://arxiv.org/abs/2203.02909)
| [code](https://github.com/chenqi1126/SIPE)
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
[paper](https://arxiv.org/abs/2203.02891)
| [code](https://github.com/xulianuwa/MCTformer)
Cross Language Image Matching for Weakly Supervised Semantic Segmentation
[paper](https://arxiv.org/abs/2203.02668)
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
[paper](https://arxiv.org/abs/2203.02664)
| [code](https://github.com/rulixiang/afa)
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
keywords: Semi-supervised learning, Semantic segmentation, Uncertainty estimation
[paper](https://arxiv.org/abs/2106.05095)
| [code](https://github.com/LiheYoung/ST-PlusPlus)
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
[paper](https://arxiv.org/pdf/2203.00962.pdf)
| [code](https://github.com/zhaozhengChen/ReCAM)
Instance Segmentation
ContrastMask: Contrastive Learning to Segment Every Thing
[paper](https://arxiv.org/abs/2203.09775)
Discovering Objects that Can Move
[paper](https://arxiv.org/abs/2203.10159)
| [code](https://github.com/zpbao/Discovery_Obj_Move/)
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
[paper](https://arxiv.org/abs/2203.04074)
| [code](https://github.com/zhang-tao-whu/e2ec)
Efficient Video Instance Segmentation via Tracklet Query and Proposal
[paper](https://arxiv.org/abs/2203.01853)
SoftGroup for 3D Instance Segmentation on Point Clouds
keywords: 3D Vision, Point Clouds, Instance Segmentation
[paper](https://arxiv.org/abs/2203.01509)
| [code](https://github.com/thangvubk/SoftGroup.git)
Video Object Segmentation
Language as Queries for Referring Video Object Segmentation
[paper](https://arxiv.org/abs/2201.00487)
| [code](https://github.com/wjn922/ReferFormer)
Dense Prediction
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
[paper](https://arxiv.org/abs/2112.01518)
| [code](https://github.com/raoyongming/DenseCLIP)
Video Processing
Video Processing
Neural Compression-Based Feature Learning for Video Restoration
[paper](https://arxiv.org/abs/2203.09208)
Video Editing
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformers
[paper](https://arxiv.org/abs/2104.01122)
Video Generation/Video Synthesis
Depth-Aware Generative Adversarial Network for Talking Head Video Generation
[paper](https://arxiv.org/abs/2203.06605)
| [code](https://github.com/harlanhong/CVPR2022-DaGAN)
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
[paper](https://arxiv.org/abs/2203.02573)
| [code](https://github.com/snap-research/MMVID)
Estimation
Optical Flow/Motion Estimation
Global Matching with Overlapping Attention for Optical Flow Estimation
[paper](https://arxiv.org/abs/2203.11335)
| [code](https://github.com/xiaofeng94/GMFlowNet)
CamLiFlow: Bidirectional Camera-LiDAR Fusion for Joint Optical Flow and Scene Flow Estimation
[paper](https://arxiv.org/abs/2111.10502)
Depth Estimation
Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation
[paper](https://arxiv.org/abs/2203.11483)
Depth Estimation by Combining Binocular Stereo and Monocular Structured-Light
[paper](https://arxiv.org/abs/2203.10493)
| [code](https://github.com/YuhuaXu/MonoStereoFusion)
RGB-Depth Fusion GAN for Indoor Depth Completion
[paper](https://arxiv.org/abs/2203.10856)
Revisiting Domain Generalized Stereo Matching Networks from a Feature Consistency Perspective
[paper](https://arxiv.org/abs/2203.10887)
Deep Depth from Focus with Differential Focus Volume
[paper](https://arxiv.org/abs/2112.01712)
ChiTransformer: Towards Reliable Stereo from Cues
[paper](https://arxiv.org/abs/2203.04554)
Rethinking Depth Estimation for Multi-View Stereo: A Unified Representation and Focal Loss
[paper](https://arxiv.org/abs/2201.01501)
| [code](https://github.com/prstrive/UniMVSNet)
ITSA: An Information-Theoretic Approach to Automatic Shortcut Avoidance and Domain Generalization in Stereo Matching Networks
keywords: Learning-based Stereo Matching Networks, Single Domain Generalization, Shortcut Learning
[paper](https://arxiv.org/pdf/2201.02263.pdf)
Attention Concatenation Volume for Accurate and Efficient Stereo Matching
keywords: Stereo Matching, cost volume construction, cost aggregation
[paper](https://arxiv.org/pdf/2203.02146.pdf)
| [code](https://github.com/gangweiX/ACVNet)
Occlusion-Aware Cost Constructor for Light Field Depth Estimation
[paper](https://arxiv.org/pdf/2203.01576.pdf)
| [code](https://github.com/YingqianWang/OACC-Net)
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
keywords: Neural CRFs for Monocular Depth
[paper](https://arxiv.org/pdf/2203.01502.pdf)
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
keywords: monocular depth estimation, transformer
[paper](https://arxiv.org/abs/2203.00838)
Human Parsing/Human Pose Estimation
Ray3D: ray-based 3D human pose estimation for monocular absolute 3D localization
[paper](https://arxiv.org/abs/2203.11471)
| [code](https://github.com/YxZhxn/Ray3D)
Capturing Humans in Motion: Temporal-Attentive 3D Human Pose and Shape Estimation from Monocular Video
[paper](https://arxiv.org/abs/2203.08534)
Physical Inertial Poser (PIP): Physics-aware Real-time Human Motion Tracking from Sparse Inertial Sensors
[paper](https://arxiv.org/abs/2203.08528)
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
[paper](https://arxiv.org/abs/2203.07697)
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
[paper](https://arxiv.org/abs/2111.12707)
| [code](https://github.com/Vegetebird/MHFormer)
CDGNet: Class Distribution Guided Network for Human Parsing
[paper](https://arxiv.org/abs/2111.14173)
Forecasting Characteristic 3D Poses of Human Actions
[paper](https://arxiv.org/abs/2011.15079)
Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
keywords: Top-Down Pose Estimation, Limb-based Grouping, Direct Regression
[paper](https://arxiv.org/pdf/2109.03622.pdf)
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
[paper](https://arxiv.org/pdf/2203.00859.pdf)
Image Processing
Super Resolution
Local Texture Estimator for Implicit Representation Function
[paper](https://arxiv.org/abs/2111.08918)
A Text Attention Network for Spatial Deformation Robust Scene Text Image Super-resolution
[paper](https://arxiv.org/abs/2203.09388)
| [code](https://github.com/mjq11302010044/TATT)
Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution
[paper](https://arxiv.org/abs/2203.09195)
| [code](https://github.com/csjliang/LDL)
Blind Image Super-resolution with Elaborate Degradation Modeling on Noise and Kernel
[paper](https://arxiv.org/abs/2107.00986)
| [code](https://github.com/zsyOAOA/BSRDM)
Reflash Dropout in Image Super-Resolution
[paper](https://arxiv.org/abs/2112.12089)
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
[paper](https://arxiv.org/abs/2203.00911)
HyperTransformer: A Textural and Spectral Feature Fusion Transformer for Pansharpening
[paper](https://arxiv.org/abs/2203.02503)
| [code](https://github.com/wgcban/HyperTransformer)
HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging
keywords: HSI Reconstruction, Self-Attention Mechanism, Image Frequency Spectrum Analysis
[paper](https://arxiv.org/pdf/2203.02149.pdf)
Image Restoration/Image Enhancement/Image Reconstruction
Exploring and Evaluating Image Restoration Potential in Dynamic Scenes
[paper](https://arxiv.org/abs/2203.11754)
Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction
[paper](https://arxiv.org/abs/2112.05146)
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
[paper](https://arxiv.org/abs/2111.07910)
| [code](https://github.com/caiyuanhao1998/MST/)
Restormer: Efficient Transformer for High-Resolution Image Restoration
[paper](https://arxiv.org/abs/2111.09881)
| [code](https://github.com/swz30/Restormer)
Event-based Video Reconstruction via Potential-assisted Spiking Neural Network
[paper](https://arxiv.org/pdf/2201.10943.pdf)
Image Denoising/Deblurring/Deraining/Dehazing
AP-BSN: Self-Supervised Denoising for Real-World Images via Asymmetric PD and Blind-Spot Network
[paper](https://arxiv.org/abs/2203.11799)
| [code](https://github.com/wooseoklee4/AP-BSN)
IDR: Self-Supervised Image Denoising via Iterative Data Refinement
[paper](https://arxiv.org/abs/2111.14358)
| [code](https://github.com/zhangyi-3/IDR)
Blind2Unblind: Self-Supervised Image Denoising with Visible Blind Spots
[paper](https://arxiv.org/abs/2203.06967)
| [code](https://github.com/demonsjin/Blind2Unblind)
E-CIR: Event-Enhanced Continuous Intensity Recovery
keywords: Event-Enhanced Deblurring, Video Representation
[paper](https://arxiv.org/abs/2203.01935)
| [code](https://github.com/chensong1995/E-CIR)
Image Editing/Inpainting
High-Fidelity GAN Inversion for Image Attribute Editing
[paper](https://arxiv.org/abs/2109.06590)
| [code](https://github.com/Tengfei-Wang/HFGI)
Style Transformer for Image Inversion and Editing
[paper](https://arxiv.org/abs/2203.07932)
| [code](https://github.com/sapphire497/style-transformer)
MISF: Multi-level Interactive Siamese Filtering for High-Fidelity Image Inpainting
[paper](https://arxiv.org/abs/2203.06304)
| [code](https://github.com/tsingqguo/misf)
HairCLIP: Design Your Hair by Text and Reference Image
keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks
[paper](https://arxiv.org/abs/2112.05142)
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
keywords: Image Inpainting, Transformer, Image Generation
[paper](https://arxiv.org/abs/2203.00867)
| [code](https://github.com/DQiaole/ZITS_inpainting)
Image Translation
Globetrotter: Connecting Languages by Connecting Images
[paper](https://arxiv.org/abs/2012.04631)
QS-Attn: Query-Selected Attention for Contrastive Learning in I2I Translation
[paper](https://arxiv.org/abs/2203.08483)
| [code](https://github.com/sapphire497/query-selected-attention)
FlexIT: Towards Flexible Semantic Image Translation
[paper](https://arxiv.org/abs/2203.04705)
Exploring Patch-wise Semantic Relation for Contrastive Learning in Image-to-Image Translation Tasks
keywords: image translation, knowledge transfer, Contrastive learning
[paper](https://arxiv.org/pdf/2203.01532.pdf)
Style Transfer
Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization
[paper](https://arxiv.org/abs/2203.07740)
| [code](https://github.com/YBZh/EFDM)
Style-ERD: Responsive and Coherent Online Motion Style Transfer
[paper](https://arxiv.org/abs/2203.02574)
CLIPstyler: Image Style Transfer with a Single Text Condition
keywords: Style Transfer, Text-guided synthesis, Language-Image Pre-Training (CLIP)
[paper](https://arxiv.org/abs/2112.00374)
Face
Face
Cross-Modal Perceptionist: Can Face Geometry be Gleaned from Voices?
[paper](https://arxiv.org/abs/2203.09824)
Portrait Eyeglasses and Shadow Removal by Leveraging 3D Synthetic Data
[paper](https://arxiv.org/abs/2203.10474)
| [code](https://github.com/StoryMY/take-off-eyeglasses)
HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network
[paper](https://arxiv.org/abs/2203.10699)
FaceFormer: Speech-Driven 3D Facial Animation with Transformers
[paper](https://arxiv.org/abs/2112.05329)
| [code](https://evelynfan.github.io/audio2face/)
Sparse Local Patch Transformer for Robust Face Alignment and Landmarks Inherent Relation Learning
[paper](https://arxiv.org/abs/2203.06541)
| [code](https://github.com/Jiahao-UTS/SLPT-master)
Facial Recognition/Detection
Privacy-preserving Online AutoML for Domain-Specific Face Detection
[paper](https://arxiv.org/abs/2203.08399)
An Efficient Training Approach for Very Large Scale Face Recognition
[paper](https://arxiv.org/pdf/2105.10375.pdf)
| [code](https://github.com/tiandunx/FFC)
Face Generation/Face Synthesis/Face Reconstruction/Face Editing
FENeRF: Face Editing in Neural Radiance Fields
[paper](https://arxiv.org/abs/2111.15490)
GCFSR: a Generative and Controllable Face Super Resolution Method Without Facial and GAN Priors
[paper](https://arxiv.org/abs/2203.07319)
Sparse to Dense Dynamic 3D Facial Expression Generation
keywords: Facial expression generation, 4D face generation, 3D face modeling
[paper](https://arxiv.org/pdf/2105.07463.pdf)
Face Forgery/Face Anti-Spoofing
Domain Generalization via Shuffled Style Assembly for Face Anti-Spoofing
[paper](https://arxiv.org/abs/2203.05340)
| [code](https://github.com/wangzhuo2019/SSAN)
Voice-Face Homogeneity Tells Deepfake
[paper](https://arxiv.org/abs/2203.02195)
| [code](https://github.com/xaCheng1996/VFD)
Protecting Celebrities with Identity Consistency Transformer
[paper](https://arxiv.org/abs/2203.01318)
Object Tracking
Object Tracking
Transforming Model Prediction for Tracking
[paper](https://arxiv.org/abs/2203.11192)
| [code](https://github.com/visionml/pytracking)
MixFormer: End-to-End Tracking with Iterative Mixed Attention
[paper](https://arxiv.org/abs/2203.11082)
| [code](https://github.com/MCG-NJU/MixFormer)
Unsupervised Domain Adaptation for Nighttime Aerial Tracking
[paper](https://arxiv.org/abs/2203.10541)
| [code](https://github.com/vision4robotics/UDAT)
Iterative Corresponding Geometry: Fusing Region and Depth for Highly Efficient 3D Tracking of Textureless Objects
[paper](https://arxiv.org/abs/2203.05334)
| [code](https://github.com/DLR-RM/3DObjectTracking)
TCTrack: Temporal Contexts for Aerial Tracking
[paper](https://arxiv.org/abs/2203.01885)
| [code](https://github.com/vision4robotics/TCTrack)
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
keywords: Single Object Tracking, 3D Multi-object Tracking / Detection, Spatial-temporal Learning on Point Clouds
[paper](https://arxiv.org/abs/2203.01730)
Correlation-Aware Deep Tracking
[paper](https://arxiv.org/abs/2203.01666)
Image&Video Retrieval/Video Understanding
Image&Video Retrieval/Video Understanding
Bridging Video-text Retrieval with Multiple Choice Questions
[paper](https://arxiv.org/abs/2201.04850)
| [code](https://github.com/TencentARC/MCQ)
BEVT: BERT Pretraining of Video Transformers
keywords: Video understanding, Vision transformers, Self-supervised representation learning, BERT pretraining
[paper](https://arxiv.org/abs/2112.01529)
| [code](https://github.com/xyzforever/BEVT)
Action/Activity Recognition, Detection, Segmentation, and Localization
E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
[paper](https://arxiv.org/abs/2112.03596)
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
[paper](https://arxiv.org/abs/2203.11637)
| [code](https://github.com/zju-vipa/MEAT-TIL)
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
[paper](https://arxiv.org/abs/2203.10233)
Self-supervised Video Transformer
[paper](https://arxiv.org/abs/2112.01514)
| [code](https://git.io/J1juJ)
Spatio-temporal Relation Modeling for Few-shot Action Recognition
[paper](https://arxiv.org/abs/2112.05132)
| [code](https://github.com/Anirudh257/strm)
RCL: Recurrent Continuous Localization for Temporal Action Detection
[paper](https://arxiv.org/abs/2203.07112)
OpenTAL: Towards Open Set Temporal Action Localization
[paper](https://arxiv.org/abs/2203.05114)
| [code](https://www.rit.edu/actionlab/opental)
End-to-End Semi-Supervised Learning for Video Action Detection
[paper](https://arxiv.org/abs/2203.04251)
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos
[paper](https://arxiv.org/abs/2203.03014)
Weakly Supervised Temporal Action Localization via Representative Snippet Knowledge Propagation
[paper](https://arxiv.org/abs/2203.02925)
| [code](https://github.com/LeonHLJ/RSKP)
Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
keywords: Online action detection
[paper](https://arxiv.org/pdf/2203.01057.pdf)
Person Re-Identification/Detection
Cascade Transformers for End-to-End Person Search
[paper](https://arxiv.org/abs/2203.09642)
| [code](https://github.com/Kitware/COAT)
Image/Video Captioning
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
[paper](https://arxiv.org/abs/2112.00061)
| [code](https://s-abdelnabi.github.io/OoC-multi-modal-fc/)
Hierarchical Modular Network for Video Captioning
[paper](https://arxiv.org/abs/2111.12476)
| [code](https://github.com/MarcusNerva/HMN)
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
[paper](https://arxiv.org/pdf/2203.00843.pdf)
Medical Imaging
Medical Imaging
ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification
[paper](https://arxiv.org/abs/2111.12918)
Vox2Cortex: Fast Explicit Reconstruction of Cortical Surfaces from 3D MRI Scans with Geometric Deep Neural Networks
[paper](https://arxiv.org/abs/2203.09446)
| [code](https://github.com/ai-med/Vox2Cortex)
Generalizable Cross-modality Medical Image Segmentation via Style Augmentation and Dual Normalization
[paper](https://arxiv.org/abs/2112.11177)
| [code](https://github.com/zzzqzhou/Dual-Normalization)
Adaptive Early-Learning Correction for Segmentation from Noisy Annotations
keywords: medical-imaging segmentation, Noisy Annotations
[paper](https://arxiv.org/abs/2110.03740)
| [code](https://github.com/Kangningthu/ADELE)
Temporal Context Matters: Enhancing Single Image Prediction with Disease Progression Representations
keywords: Self-supervised Transformer, Temporal modeling of disease progression
[paper](https://arxiv.org/abs/2203.01933)
Text Detection/Recognition/Understanding
Text Detection/Recognition/Understanding
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
[paper](https://arxiv.org/abs/2203.10209)
| [code](https://github.com/mxin262/SwinTextSpotter)
Fourier Document Restoration for Robust Document Dewarping and Recognition
[paper](https://arxiv.org/abs/2203.09910)
| [code](https://sg-vilab.github.io/event/warpdoc/)
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding
[paper](https://arxiv.org/abs/2203.06947)
GAN/Generative/Adversarial
GAN/Generative/Adversarial
Subspace Adversarial Training
[paper](https://arxiv.org/abs/2111.12229)
| [code](https://github.com/nblt/Sub-AT)
DTA: Physical Camouflage Attacks using Differentiable Transformation Network
[paper](https://arxiv.org/abs/2203.09831)
| [code](https://islab-ai.github.io/dta-cvpr2022/)
Improving the Transferability of Targeted Adversarial Examples through Object-Based Diverse Input
[paper](https://arxiv.org/abs/2203.09123)
| [code](https://github.com/dreamflake/ODI)
Towards Practical Certifiable Patch Defense with Vision Transformer
[paper](https://arxiv.org/abs/2203.08519)
Few Shot Generative Model Adaption via Relaxed Spatial Structural Alignment
[paper](https://arxiv.org/abs/2203.04121)
Enhancing Adversarial Training with Second-Order Statistics of Weights
[paper](https://arxiv.org/abs/2203.06020)
| [code](https://github.com/Alexkael/S2O)
Practical Evaluation of Adversarial Robustness via Adaptive Auto Attack
[paper](https://arxiv.org/abs/2203.05154)
Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity
[paper](https://arxiv.org/abs/2203.05151)
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon
[paper](https://arxiv.org/abs/2203.03818)
Protecting Facial Privacy: Generating Adversarial Identity Masks via Style-robust Makeup Transfer
[paper](https://arxiv.org/pdf/2203.03121.pdf)
Adversarial Texture for Fooling Person Detectors in the Physical World
[paper](https://arxiv.org/abs/2203.03373)
Label-Only Model Inversion Attacks via Boundary Repulsion
[paper](https://arxiv.org/pdf/2203.01925.pdf)
Image Generation/Image Synthesis
Modulated Contrast for Versatile Image Synthesis
[paper](https://arxiv.org/abs/2203.09333)
| [code](https://github.com/fnzhan/MoNCE)
Attribute Group Editing for Reliable Few-shot Image Generation
[paper](https://arxiv.org/abs/2203.08422)
| [code](https://github.com/UniBester/AGE)
Text to Image Generation with Semantic-Spatial Aware GAN
[paper](https://arxiv.org/abs/2104.00567)
| [code](https://github.com/wtliao/text2image)
Playable Environments: Video Manipulation in Space and Time
[paper](https://arxiv.org/abs/2203.01914)
| [code](https://willi-menapace.github.io/playable-environments-website)
FLAG: Flow-based 3D Avatar Generation from Sparse Observations
[paper](https://arxiv.org/abs/2203.05789)
Dynamic Dual-Output Diffusion Models
[paper](https://arxiv.org/abs/2203.04304)
Exploring Dual-task Correlation for Pose Guided Person Image Generation
[paper](https://arxiv.org/abs/2203.02910)
| [code](https://github.com/PangzeCheung/Dual-task-Pose-Transformer-Network)
3D Shape Variational Autoencoder Latent Disentanglement via Mini-Batch Feature Swapping for Bodies and Faces
[paper](https://arxiv.org/pdf/2111.12448.pdf)
| [code](https://github.com/simofoti/3DVAE-SwapDisentangled)
Interactive Image Synthesis with Panoptic Layout Generation
[paper](https://arxiv.org/abs/2203.02104)
Polarity Sampling: Quality and Diversity Control of Pre-Trained Generative Networks via Singular Values
[paper](https://arxiv.org/abs/2203.01993)
Autoregressive Image Generation using Residual Quantization
[paper](https://arxiv.org/abs/2203.01941)
| [code](https://github.com/kakaobrain/rq-vae-transformer)
3D Vision
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
[paper](https://arxiv.org/abs/2104.13450)
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
[paper](https://arxiv.org/pdf/2203.00843.pdf)
Point Cloud
IDEA-Net: Dynamic 3D Point Cloud Interpolation via Deep Embedding Alignment
[paper](https://arxiv.org/abs/2203.11590)
| [code](https://github.com/ZENGYIMING-EAMON/IDEA-Net.git)
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces
[paper](https://arxiv.org/abs/2203.11113)
| [code](https://github.com/jx-zhong-for-academic-purpose/Kinet)
AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation
[paper](https://arxiv.org/abs/2203.06558)
Geometric Transformer for Fast and Robust Point Cloud Registration
[paper](https://arxiv.org/abs/2202.06688)
| [code](https://github.com/qinzheng93/GeoTransformer)
Contrastive Boundary Learning for Point Cloud Segmentation
[paper](https://arxiv.org/abs/2203.05272)
| [code](https://github.com/LiyaoTang/contrastBoundary)
Shape-invariant 3D Adversarial Point Clouds
[paper](https://arxiv.org/abs/2203.04041)
| [code](https://github.com/shikiw/SI-Adv)
ART-Point: Improving Rotation Robustness of Point Cloud Classifiers via Adversarial Rotation
[paper](https://arxiv.org/abs/2203.03888)
Lepard: Learning partial point cloud matching in rigid and deformable scenes
[paper](https://arxiv.org/abs/2111.12591)
| [code](https://github.com/rabbityl/lepard)
A Unified Query-based Paradigm for Point Cloud Understanding
[paper](https://arxiv.org/pdf/2203.01252.pdf)
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
keywords: Self-Supervised Learning, Contrastive Learning, 3D Point Cloud, Representation Learning, Cross-Modal Learning
[paper](https://arxiv.org/abs/2203.00680)
| [code](http://github.com/MohamedAfham/CrossPoint)
3D Reconstruction
ϕ-SfT: Shape-from-Template with a Physics-Based Deformation Model
[paper](https://arxiv.org/abs/2203.11938)
| [code](https://4dqv.mpi-inf.mpg.de/phi-SfT/)
Input-level Inductive Biases for 3D Reconstruction
[paper](https://arxiv.org/abs/2112.03243)
AutoSDF: Shape Priors for 3D Completion, Reconstruction and Generation
[paper](https://arxiv.org/abs/2203.09516)
Interacting Attention Graph for Single Image Two-Hand Reconstruction
[paper](https://arxiv.org/abs/2203.09364)
| [code](https://github.com/Dw1010/IntagHand)
OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction
[paper](https://arxiv.org/abs/2203.07977)
Neural RGB-D Surface Reconstruction
[paper](https://arxiv.org/abs/2104.04532)
Neural Face Identification in a 2D Wireframe Projection of a Manifold Object
[paper](https://arxiv.org/abs/2203.04229)
| [code](https://manycore-research.github.io/faceformer)
Generating 3D Bio-Printable Patches Using Wound Segmentation and Reconstruction to Treat Diabetic Foot Ulcers
keywords: semantic segmentation, 3D reconstruction, 3D bio-printers
[paper](https://arxiv.org/pdf/2203.03814.pdf)
H4D: Human 4D Modeling by Learning Neural Compositional Representation
keywords: 4D Representation, Human Body Estimation, Fine-grained Human Reconstruction
[paper](https://arxiv.org/pdf/2203.01247.pdf)
Scene Reconstruction/View Synthesis/Novel View Synthesis
NeRFusion: Fusing Radiance Fields for Large-Scale Scene Reconstruction
[paper](https://arxiv.org/abs/2203.11283)
GeoNeRF: Generalizing NeRF with Geometry Priors
[paper](https://arxiv.org/abs/2111.13539)
| [code](https://www.idiap.ch/paper/geonerf)
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
[paper](https://arxiv.org/abs/2112.01530)
| [code](https://github.com/lukasHoel/stylemesh)
Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image
[paper](https://arxiv.org/abs/2203.09457)
| [code](https://github.com/xrenaa/Look-Outside-Room)
Point-NeRF: Point-based Neural Radiance Fields
[paper](https://arxiv.org/abs/2201.08845)
| [code](https://github.com/Xharlie/pointnerf)
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)
[paper](https://arxiv.org/abs/2112.05139)
| [code](https://cassiepython.github.io/clipnerf/)
Model Compression
Knowledge Distillation
Decoupled Knowledge Distillation
[paper](https://arxiv.org/abs/2203.08679)
| [code](https://github.com/megvii-research/mdistiller)
Wavelet Knowledge Distillation: Towards Efficient Image-to-Image Translation
[paper](https://arxiv.org/abs/2203.06321)
Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability
[paper](https://arxiv.org/abs/2203.05180)
| [code](https://github.com/CVMI-Lab/KDEP)
Focal and Global Knowledge Distillation for Detectors
keywords: Object Detection, Knowledge Distillation
[paper](https://arxiv.org/abs/2111.11837)
| [code](https://github.com/yzd-v/FGD)
Pruning
Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs
[paper](https://arxiv.org/abs/2203.07808)
Quantization
Implicit Feature Decoupling with Depthwise Quantization
[paper](https://arxiv.org/abs/2203.08080)
IntraQ: Learning Synthetic Images with Intra-Class Heterogeneity for Zero-Shot Network Quantization
[paper](https://arxiv.org/abs/2111.09136)
| [code](https://github.com/zysxmu/IntraQ)
Neural Network Structure Design
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning
keywords: sample relationship, data scarcity learning, Contrastive Self-Supervised Learning, long-tailed recognition, zero-shot learning, domain generalization, self-supervised learning
[paper](https://arxiv.org/abs/2203.01522)
| [code](https://github.com/zhihou7/BatchFormer)
CNN
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing (dynamic convolution)
[paper](https://arxiv.org/abs/2203.10489)
| [code](https://github.com/JierunChen/TVConv)
On the Integration of Self-Attention and Convolution
[paper](https://arxiv.org/abs/2111.14556)
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
[paper](https://arxiv.org/abs/2203.06717)
| [code](https://github.com/megvii-research/RepLKNet)
DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos
keywords: sparse convolutional neural network, video inference accelerating
[paper](https://arxiv.org/abs/2203.03996)
A ConvNet for the 2020s
[paper](https://arxiv.org/abs/2201.03545)
| [code](https://github.com/facebookresearch/ConvNeXt)
Transformer
Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning
[paper](https://arxiv.org/abs/2203.09064)
| [code](https://github.com/StomachCold/HCTransformers)
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition
[paper](https://arxiv.org/abs/2111.12994)
| [code](https://github.com/TencentYoutuResearch/VisualRecognition-NomMer)
Delving Deep into the Generalization of Vision Transformers under Distribution Shifts
keywords: out-of-distribution (OOD) generalization, Vision Transformers
[paper](https://arxiv.org/abs/2106.07617)
| [code](https://github.com/Phoenix1153/ViT_OOD_generalization)
Mobile-Former: Bridging MobileNet and Transformer
keywords: Light-weight convolutional neural networks, Combination of CNN and ViT
[paper](https://arxiv.org/abs/2108.05895)
Neural Architecture Search (NAS)
Global Convergence of MAML and Theory-Inspired Neural Architecture Search for Few-Shot Learning
[paper](https://arxiv.org/abs/2203.09137)
| [code](https://github.com/YiteWang/MetaNTK-NAS)
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
[paper](https://arxiv.org/abs/2203.01665)
MLP
Dynamic MLP for Fine-Grained Image Classification by Leveraging Geographical and Temporal Information
[paper](https://arxiv.org/abs/2203.03253)
| [code](https://github.com/ylingfeng/DynamicMLP.git)
Revisiting the Transferability of Supervised Pretraining: an MLP Perspective
[paper](https://arxiv.org/abs/2112.00496)
An Image Patch is a Wave: Quantum Inspired Vision MLP
[paper](https://arxiv.org/abs/2111.12294)
| [code](https://github.com/huawei-noah/CV-Backbones/tree/master/wavemlp_pytorch)
Data Processing
Dataset Distillation by Matching Training Trajectories
[paper](https://arxiv.org/abs/2203.11932)
| [code](https://github.com/GeorgeCazenavette/mtt-distillation)
Data Augmentation
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
[paper](https://arxiv.org/abs/2202.12513)
| [code](https://github.com/DensoITLab/TeachAugment)
3D Common Corruptions and Data Augmentation
keywords: Data Augmentation, Image restoration, Photorealistic image synthesis
[paper](https://arxiv.org/a