[0] Don't Look Twice: Faster Video Transformers with Run-Length Tokenization [cs.CV] Authors: Rohan Choudhury, Guanglei Zhu, Sihan Liu, Koichiro Niinuma, Kris M. Kitani, László Jeni Link: http://arxiv.org/abs/2411.05222 Comments: 16 pages, 6 figures. Accepted to NeurIPS 2024 (spotlight)
[1] Generalizable Single-Source Cross-modality Medical Image Segmentation via Invariant Causal Mechanisms [cs.CV] Authors: Boqi Chen, Yuanzhi Zhu, Yunke Ao, Sebastiano Caprara, Reto Sutter, Gunnar Rätsch, Ender Konukoglu, Anna Susmelj Link: http://arxiv.org/abs/2411.05223 Code: https://github.com/ratschlab/ICMSeg Comments: WACV 2025
[2] Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding [cs.CV] Authors: Jaeyoo Park, Jin Young Choi, Jeonghyung Park, Bohyung Han Link: http://arxiv.org/abs/2411.05254 Comments: NeurIPS 2024
[3] ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving [cs.CV] Authors: Tao Ma, Hongbin Zhou, Qiusheng Huang, Xuemeng Yang, Jianfei Guo, Bo Zhang, Min Dou, Yu Qiao, Botian Shi, Hongsheng Li Link: http://arxiv.org/abs/2411.05311 Comments: Accepted by NeurIPS 2024
[4] Rate-aware Compression for NeRF-based Volumetric Video [cs.CV] Authors: Zhiyu Zhang, Guo Lu, Huanxiong Liang, Zhengxue Cheng, Anni Tang, Li Song Link: http://arxiv.org/abs/2411.05322 Comments: Accepted by ACM MM 2024 (Oral)
[5] Enhancing Visual Classification using Comparative Descriptors [cs.CV] Authors: Hankyeol Lee, Gawon Seo, Wonseok Choi, Geunyoung Jung, Kyungwoo Song, Jiyoung Jung Link: http://arxiv.org/abs/2411.05357 Comments: Accepted to WACV 2025. Main paper with 8 pages
[6] From Transparent to Opaque: Rethinking Neural Implicit Surfaces with α-NeuS [cs.CV] Authors: Haoran Zhang, Junkai Deng, Xuhui Chen, Fei Hou, Wencheng Wang, Hong Qin, Chen Qian, Ying He Link: http://arxiv.org/abs/2411.05362 Code: https://github.com/728388808/alpha-NeuS Journal: NeurIPS 2024
[7] VISTA: Visual Integrated System for Tailored Automation in Math Problem Generation Using LLM [cs.CV] Authors: Jeongwoo Lee, Kwangsuk Park, Jihyeon Park Link: http://arxiv.org/abs/2411.05423 Comments: Accepted at NeurIPS 2024 Workshop on Large Foundation Models for Educational Assessment (FM-Assess)
[8] Do Histopathological Foundation Models Eliminate Batch Effects? A Comparative Study [cs.CV] Authors: Jonah Kömen, Hannah Marienwald, Jonas Dippel, Julius Hense Link: http://arxiv.org/abs/2411.05489 Comments: Accepted to AIM-FM Workshop @ NeurIPS'24
[9] Open-set object detection: towards unified problem formulation and benchmarking [cs.CV] Authors: Hejer Ammar, Nikita Kiselov, Guillaume Lapouge, Romaric Audigier Link: http://arxiv.org/abs/2411.05564 Comments: Accepted at ECCV 2024 Workshop: "The 3rd Workshop for Out-of-Distribution Generalization in Computer Vision Foundation Models"
[10] SynDroneVision: A Synthetic Dataset for Image-Based Drone Detection [cs.CV] Authors: Tamara R. Lenhard, Andreas Weinmann, Kai Franke, Tobias Koch Link: http://arxiv.org/abs/2411.05633 Comments: Accepted at the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
[12] Tell What You Hear From What You See -- Video to Audio Generation Through Text [cs.CV] Authors: Xiulong Liu, Kun Su, Eli Shlizerman Link: http://arxiv.org/abs/2411.05679 Comments: NeurIPS 2024
[13] Autoregressive Adaptive Hypergraph Transformer for Skeleton-based Activity Recognition [cs.CV] Authors: Abhisek Ray, Ayush Raj, Maheshkumar H. Kolekar Link: http://arxiv.org/abs/2411.05692 Comments: Accepted to WACV 2025
[14] GazeSearch: Radiology Findings Search Benchmark [cs.CV] Authors: Trong Thang Pham, Tien-Phat Nguyen, Yuki Ikebe, Akash Awasthi, Zhigang Deng, Carol C. Wu, Hien Nguyen, Ngan Le Link: http://arxiv.org/abs/2411.05780 Comments: Accepted to WACV 2025
Natural language processing conferences: 15 papers
[0] Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale [cs.CL] Authors: Flavio Di Palo, Prateek Singhi, Bilal Fadlallah Link: http://arxiv.org/abs/2411.05045 Comments: Published in EMNLP 2024
[1] FMEA Builder: Expert Guided Text Generation for Equipment Maintenance [cs.CL] Authors: Karol Lynch, Fabio Lorenzi, John Sheehan, Duygu Kabakci-Zorlu, Bradley Eck Link: http://arxiv.org/abs/2411.05054 Comments: 4 pages, 2 figures. AI for Critical Infrastructure Workshop @ IJCAI 2024
[2] Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model [cs.CV] Authors: Sheng Cheng, Maitreya Patel, Yezhou Yang Link: http://arxiv.org/abs/2411.05079 Code: https://github.com/shengcheng/Captions4T2I Comments: EMNLP 2024 Findings
[3] Beyond the Numbers: Transparency in Relation Extraction Benchmark Creation and Leaderboards [cs.CL] Authors: Varvara Arzt, Allan Hanbury Link: http://arxiv.org/abs/2411.05224 Comments: Accepted at the GenBench workshop at EMNLP 2024
[4] CHATTER: A Character Attribution Dataset for Narrative Understanding [cs.CL] Authors: Sabyasachee Baruah, Shrikanth Narayanan Link: http://arxiv.org/abs/2411.05227 Comments: Submitted to NAACL 2025
[5] Revisiting the Robustness of Watermarking to Paraphrasing Attacks [cs.CL] Authors: Saksham Rastogi, Danish Pruthi Link: http://arxiv.org/abs/2411.05277 Comments: EMNLP 2024
[7] SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers [cs.CL] Authors: Shruti Singh, Nandan Sarkar, Arman Cohan Link: http://arxiv.org/abs/2411.05338 Comments: 18 pages, Accepted to EMNLP 2024
[8] Towards Low-Resource Harmful Meme Detection with LMM Agents [cs.CL] Authors: Jianzhao Huang, Hongzhan Lin, Ziyan Liu, Ziyang Luo, Guang Chen, Jing Ma Link: http://arxiv.org/abs/2411.05383 Comments: EMNLP 2024
[9] VISTA: Visual Integrated System for Tailored Automation in Math Problem Generation Using LLM [cs.CV] Authors: Jeongwoo Lee, Kwangsuk Park, Jihyeon Park Link: http://arxiv.org/abs/2411.05423 Comments: Accepted at NeurIPS 2024 Workshop on Large Foundation Models for Educational Assessment (FM-Assess)
[10] Multi-hop Evidence Pursuit Meets the Web: Team Papelo at FEVER 2024 [cs.CL] Authors: Christopher Malon Link: http://arxiv.org/abs/2411.05762 Comments: To appear in the Seventh FEVER Workshop at EMNLP 2024
[12] Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths? [cs.CL] Authors: Veronica Chatrath, Marcelo Lotif, Shaina Raza Link: http://arxiv.org/abs/2411.05775 Comments: Accepted at Socially Responsible Language Modelling Research (SoLaR) Workshop at NeurIPS 2024
[13] Using Language Models to Disambiguate Lexical Choices in Translation [cs.CL] Authors: Josh Barua, Sanjay Subramanian, Kayo Yin, Alane Suhr Link: http://arxiv.org/abs/2411.05781 Comments: Accepted to EMNLP 2024
[14] ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles [cs.CV] Authors: Kayo Yin, Chinmay Singh, Fyodor O. Minakov, Vanessa Milan, Hal Daumé III, Cyril Zhang, Alex X. Lu, Danielle Bragg Link: http://arxiv.org/abs/2411.05783 Comments: Accepted to EMNLP 2024