[0] Towards motion from video diffusion models [cs.CV] Authors: Paul Janson, Tiberiu Popa, Eugene Belilovsky Link: http://arxiv.org/abs/2411.12831 Comments: Accepted at ECCV 2024 Workshop: Foundation Models for 3D Humans
[1] Data-to-Model Distillation: Data-Efficient Learning Framework [cs.CV] Authors: Ahmad Sajedi, Samir Khaki, Lucy Z. Liu, Ehsan Amjadian, Yuri A. Lawryshyn, Konstantinos N. Plataniotis Link: http://arxiv.org/abs/2411.12841 Comments: Accepted at the 18th European Conference on Computer Vision (ECCV 2024), Milan, Italy, September 29 to October 4, 2024
[2] From Text to Pose to Image: Improving Diffusion Model Control and Quality [cs.CV] Authors: Clément Bonnet, Ariel N. Lee, Franck Wertel, Antoine Tamano, Tanguy Cizain, Pablo Ducru Link: http://arxiv.org/abs/2411.12872 Code: https://github.com/clement-bonnet/text-to-pose Comments: Published at the NeurIPS 2024 Workshop on Compositional Learning: Perspectives, Methods, and Paths Forward
[3] Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity [cs.CV] Authors: Wassim El Ahmar, Dhanvin Kolhatkar, Farzan Nowruzi, Robert Laganiere Link: http://arxiv.org/abs/2411.12943 Code: https://github.com/wassimea/thermalMOT Comments: Workshop on Towards a Complete Analysis of People, part of the European Conference on Computer Vision (ECCV) 2024
[4] ORID: Organ-Regional Information Driven Framework for Radiology Report Generation [cs.CV] Authors: Tiancheng Gu, Kaicheng Yang, Xiang An, Ziyong Feng, Dongnan Liu, Weidong Cai Link: http://arxiv.org/abs/2411.13025 Comments: 13 pages, 11 figures, WACV 2025
[5] Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization [cs.CV] Authors: Sanghyeob Song, Jaihyun Lew, Hyemi Jang, Sungroh Yoon Link: http://arxiv.org/abs/2411.13036 Code: https://github.com/songsang7/AltO Comments: Accepted to the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
[6] Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles [cs.CV] Authors: Shuman Peng, Arash Khoeini, Sharan Vaswani, Martin Ester Link: http://arxiv.org/abs/2411.13073 Comments: Accepted at the Self-Supervised Learning Workshop and the Unifying Representations in Neural Models Workshop at NeurIPS 2024
[7] RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation [cs.CV] Authors: Christoph Reinders, Radu Berdan, Beril Besbinar, Junji Otsuka, Daisuke Iso Link: http://arxiv.org/abs/2411.13150 Comments: Accepted at WACV 2025
[8] VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation [cs.CV] Authors: Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki Link: http://arxiv.org/abs/2411.13186 Comments: Accepted by WACV 2025
[9] XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation [cs.CV] Authors: Ziyi Wang, Yanbo Wang, Xumin Yu, Jie Zhou, Jiwen Lu Link: http://arxiv.org/abs/2411.13243 Code: https://github.com/wangzy22/XMask3D Comments: Accepted to NeurIPS 2024
[10] BelHouse3D: A Benchmark Dataset for Assessing Occlusion Robustness in 3D Point Cloud Semantic Segmentation [cs.CV] Authors: Umamaheswaran Raman Kumar, Abdur Razzaq Fayjie, Jurgen Hannaert, Patrick Vandewalle Link: http://arxiv.org/abs/2411.13251 Comments: 20 pages, 6 figures, 3 tables, accepted at ECCV 2024 Workshops
[11] Entropy Bootstrapping for Weakly Supervised Nuclei Detection [cs.CV] Authors: James Willoughby, Irina Voiculescu Link: http://arxiv.org/abs/2411.13528 Comments: Submitted to CVPR 2025
Natural language processing conferences: 4 papers
[0] Probing the Capacity of Language Model Agents to Operationalize Disparate Experiential Context Despite Distraction [cs.CL] Authors: Sonny George, Chris Sypherd, Dylan Cashman Link: http://arxiv.org/abs/2411.12828 Code: https://github.com/sonnygeorge/OEDD Journal ref: Findings Assoc. Comput. Linguistics: EMNLP 2024 15447-15459 (2024)
[1] SCOUT: A Situated and Multi-Modal Human-Robot Dialogue Corpus [cs.CL] Authors: Stephanie M. Lukin, Claire Bonial, Matthew Marge, Taylor Hudson, Cory J. Hayes, Kimberly A. Pollard, Anthony Baker, Ashley N. Foots, Ron Artstein, Felix Gervits, Mitchell Abrams, Cassidy Henry, Lucia Donatelli, Anton Leuski, Susan G. Hill, David Traum, Clare R. Voss Link: http://arxiv.org/abs/2411.12844 Journal ref: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) https://aclanthology.org/2024.lrec-main.1259/ Comments: 14 pages, 7 figures
[2] MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers [cs.CL] Authors: Ning Ding, Yehui Tang, Haochen Qin, Zhenli Zhou, Chao Xu, Lin Li, Kai Han, Heng Liao, Yunhe Wang Link: http://arxiv.org/abs/2411.12992 Comments: NeurIPS 2024
[3] AdaptAgent: Adapting Multimodal Web Agents with Few-Shot Learning from Human Demonstrations [cs.CL] Authors: Gaurav Verma, Rachneet Kaur, Nishan Srishankar, Zhen Zeng, Tucker Balch, Manuela Veloso Link: http://arxiv.org/abs/2411.13451 Comments: 18 pages, 3 figures, an abridged version to appear in NeurIPS 2024 AFM Workshop