CV & AIGC Top Conference Roundup [2024-11-21]

晓飞的算法工程笔记 (Xiaofei's Algorithm Engineering Notes) • 4 days ago

16 papers in today's update:

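  • Computer vision conferences: 12 papers
  • Natural language processing conferences: 4 papers
Note that papers on large models are mostly published at natural language processing conferences. Because multimodal research is developing rapidly, some computer vision papers also appear at top natural language processing venues.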

Computer vision conferences: 12 papers


[0] Towards motion from video diffusion models[cs.CV]
Authors: Paul Janson, Tiberiu Popa, Eugene Belilovsky
Link: http://arxiv.org/abs/2411.12831
Comments: Accepted at ECCV 2024 Workshop: Foundation Models for 3D Humans

[1] Data-to-Model Distillation: Data-Efficient Learning Framework[cs.CV]
Authors: Ahmad Sajedi, Samir Khaki, Lucy Z. Liu, Ehsan Amjadian, Yuri A. Lawryshyn, Konstantinos N. Plataniotis
Link: http://arxiv.org/abs/2411.12841
Comments: Accepted at the 18th European Conference on Computer Vision (ECCV 2024), Milan, Italy, September 29 to October 4, 2024

[2] From Text to Pose to Image: Improving Diffusion Model Control and Quality[cs.CV]
Authors: Clément Bonnett, Ariel N. Lee, Franck Wertel, Antoine Tamano, Tanguy Cizain, Pablo Ducru
Link: http://arxiv.org/abs/2411.12872
Code: https://github.com/clement-bonnet/text-to-pose
Comments: Published at the NeurIPS 2024 Workshop on Compositional Learning: Perspectives, Methods, and Paths Forward

[3] Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity[cs.CV]
Authors: Wassim El Ahmar, Dhanvin Kolhatkar, Farzan Nowruzi, Robert Laganiere
Link: http://arxiv.org/abs/2411.12943
Code: https://github.com/wassimea/thermalMOT
Comments: Workshop on Towards a Complete Analysis of People, part of the European Conference on Computer Vision (ECCV) 2024

[4] ORID: Organ-Regional Information Driven Framework for Radiology Report Generation[cs.CV]
Authors: Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai
Link: http://arxiv.org/abs/2411.13025
Comments: 13 pages, 11 figures, WACV 2025

[5] Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization[cs.CV]
Authors: Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon
Link: http://arxiv.org/abs/2411.13036
Code: https://github.com/songsang7/AltO
Comments: Accepted to the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)

[6] Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles[cs.CV]
Authors: Shuman Peng, Arash Khoeini, Sharan Vaswani, Martin Ester
Link: http://arxiv.org/abs/2411.13073
Comments: Accepted at the Self-Supervised Learning Workshop and the Unifying Representations in Neural Models Workshop at NeurIPS 2024

[7] RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation[cs.CV]
Authors: Christoph Reinders, Radu Berdan, Beril Besbinar, Junji Otsuka, Daisuke Iso
Link: http://arxiv.org/abs/2411.13150
Comments: Accepted at WACV 2025

[8] VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation[cs.CV]
Authors: Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki
Link: http://arxiv.org/abs/2411.13186
Comments: Accepted by WACV 2025

[9] XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation[cs.CV]
Authors: Ziyi Wang, Yanbo Wang, Xumin Yu, Jie Zhou, Jiwen Lu
Link: http://arxiv.org/abs/2411.13243
Code: https://github.com/wangzy22/XMask3D
Comments: Accepted to NeurIPS 2024

[10] BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation[cs.CV]
Authors: Umamaheswaran Raman Kumar, Abdur Razzaq Fayjie, Jurgen Hannaert, Patrick Vandewalle
Link: http://arxiv.org/abs/2411.13251
Comments: 20 pages, 6 figures, 3 tables, accepted at ECCV 2024 Workshops

[11] Entropy Bootstrapping for Weakly Supervised Nuclei Detection[cs.CV]
Authors: James Willoughby, Irina Voiculescu
Link: http://arxiv.org/abs/2411.13528
Comments: Submitted to CVPR 2025

Natural language processing conferences: 4 papers


[0] Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction[cs.CL]
Authors: Sonny George, Chris Sypherd, Dylan Cashman
Link: http://arxiv.org/abs/2411.12828
Code: https://github.com/sonnygeorge/OEDD
Journal ref: Findings Assoc. Comput. Linguistics: EMNLP 2024 15447-15459 (2024)

[1] SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus[cs.CL]
Authors: Stephanie M. Lukin, Claire Bonial, Matthew Marge, Taylor Hudson, Cory J. Hayes, Kimberly A. Pollard, Anthony Baker, Ashley N. Foots, Ron Artstein, Felix Gervits, Mitchell Abrams, Cassidy Henry, Lucia Donatelli, Anton Leuski, Susan G. Hill, David Traum, Clare R. Voss
Link: http://arxiv.org/abs/2411.12844
Journal ref: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) https://aclanthology.org/2024.lrec-main.1259/
Comments: 14 pages, 7 figures

[2] MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers[cs.CL]
Authors: Ning Ding, Yehui Tang, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Heng Liao, Yunhe Wang
Link: http://arxiv.org/abs/2411.12992
Comments: NeurIPS 2024

[3] AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations[cs.CL]
Authors: Gaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso
Link: http://arxiv.org/abs/2411.13451
Comments: 18 pages, 3 figures, an abridged version to appear in NeurIPS 2024 AFM Workshop

Thanks to arxiv.org

