[0] INQUIRE: A Natural World Text-to-Image Retrieval Benchmark [cs.CV] Authors: Edward Vendrow, Omiros Pantazis, Alexander Shepard, Gabriel Brostow, Kate E. Jones, Oisin Mac Aodha, Sara Beery, Grant Van Horn Link: http://arxiv.org/abs/2411.02537 Code: https://inquire-benchmark.github.io Note: Published in NeurIPS 2024, Datasets and Benchmarks Track
[1] TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives [cs.CV] Authors: Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Tejas Gokhale, Chitta Baral, Yezhou Yang Link: http://arxiv.org/abs/2411.02545 Code: https://tripletclip.github.io Note: Accepted at NeurIPS 2024
[2] ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy [cs.CV] Authors: Kian Kenyon-Dean, Zitong Jerry Wang, John Urbanik, Konstantin Donhauser, Jason Hartford, Saber Saberian, Nil Sahin, Ihab Bendidi, Safiye Celik, Marta Fay, Juan Sebastian Rodriguez Vera, Imran S Haque, Oren Kraus Link: http://arxiv.org/abs/2411.02572 Note: NeurIPS 2024 Foundation Models for Science Workshop (38th Conference on Neural Information Processing Systems). 18 pages, 7 figures
[3] Divergent Domains, Convergent Grading: Enhancing Generalization in Diabetic Retinopathy Grading [cs.CV] Authors: Sharon Chokuwa, Muhammad Haris Khan Link: http://arxiv.org/abs/2411.02614 Code: https://github.com/sharonchokuwa/dg-adr Note: Accepted at WACV 2025
[5] Test-Time Dynamic Image Fusion [cs.CV] Authors: Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, Qinghua Hu Link: http://arxiv.org/abs/2411.02840 Code: https://github.com/Yinan-Xia/TTD Note: Accepted by NeurIPS 2024
[6] OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing [cs.CV] Authors: Pranav Gupta, Rishubh Singh, Pradeep Shenoy, Ravikiran Sarvadevabhatla Link: http://arxiv.org/abs/2411.02858 Code: http://olafseg.github.io Note: Accepted in The European Conference on Computer Vision (ECCV) 2024
[8] Membership Inference Attacks against Large Vision-Language Models [cs.CV] Authors: Zhan Li, Yongtao Wu, Yihang Chen, Francesco Tonin, Elias Abad Rocamora, Volkan Cevher Link: http://arxiv.org/abs/2411.02902 Code: https://github.com/LIONS-EPFL/VL-MIA Note: NeurIPS 2024
[9] CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection [cs.CV] Authors: Jisong Kim, Minjae Seong, Jun Won Choi Link: http://arxiv.org/abs/2411.03013 Note: Accepted at NeurIPS 2024
[10] Rethinking Decoders for Transformer-based Semantic Segmentation: Compression is All You Need [cs.CV] Authors: Qishuai Wen, Chun-Guang Li Link: http://arxiv.org/abs/2411.03033 Code: https://github.com/QishuaiWen/DEPICT/ Note: NeurIPS 2024
[11] Pre-trained Visual Dynamics Representations for Efficient Policy Learning [cs.CV] Authors: Hao Luo, Bohan Zhou, Zongqing Lu Link: http://arxiv.org/abs/2411.03169 Note: ECCV 2024
[12] On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models [cs.CV] Authors: Tariq Berrada Ifriqi, Pietro Astolfi, Melissa Hall, Reyhane Askari-Hemmat, Yohann Benchetrit, Marton Havasi, Matthew Muckley, Karteek Alahari, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal Link: http://arxiv.org/abs/2411.03177 Note: Accepted as a conference paper (poster) for NeurIPS 2024
[13] Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution [cs.CV] Authors: Huan Zheng, Wencheng Han, Jianbing Shen Link: http://arxiv.org/abs/2411.03239 Note: The 1st solution for the ECCV 2024 AIM Compressed Depth Upsampling Challenge
[14] Classification Done Right for Vision-Language Pre-Training [cs.CV] Authors: Huang Zilong, Ye Qinghao, Kang Bingyi, Feng Jiashi, Fan Haoqi Link: http://arxiv.org/abs/2411.03313 Code: https://github.com/x-cls/superclass Note: Accepted by NeurIPS 2024
Natural language processing conferences: 7 papers
[0] INQUIRE: A Natural World Text-to-Image Retrieval Benchmark [cs.CV] Authors: Edward Vendrow, Omiros Pantazis, Alexander Shepard, Gabriel Brostow, Kate E. Jones, Oisin Mac Aodha, Sara Beery, Grant Van Horn Link: http://arxiv.org/abs/2411.02537 Code: https://inquire-benchmark.github.io Note: Published in NeurIPS 2024, Datasets and Benchmarks Track
[1] TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives [cs.CV] Authors: Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Tejas Gokhale, Chitta Baral, Yezhou Yang Link: http://arxiv.org/abs/2411.02545 Code: https://tripletclip.github.io Note: Accepted at NeurIPS 2024
[2] Extracting Unlearned Information from LLMs with Activation Steering [cs.CL] Authors: Atakan Seyitoğlu, Aleksei Kuvshinov, Leo Schwinn, Stephan Günnemann Link: http://arxiv.org/abs/2411.02631 Note: Accepted at NeurIPS 2024 Workshop Safe Generative AI