CV & AIGC Top-Conference Paper Roundup [2024-11-06]

By 晓飞的算法工程笔记 (Xiaofei's Algorithm Engineering Notes)

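Today's update: 22 papers

  • Computer vision conferences: 15 papers
  • Natural language processing conferences: 7 papers
Note: papers on large models are mostly published at natural language processing conferences. Because multimodal research is moving quickly, some computer vision papers also appear at top NLP conferences.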

Computer vision conferences: 15 papers


[0] INQUIRE: A Natural World Text-to-Image Retrieval Benchmark [cs.CV]
Authors: Edward Vendrow, Omiros Pantazis, Alexander Shepard, Gabriel Brostow, Kate E. Jones, Oisin Mac Aodha, Sara Beery, Grant Van Horn
Link: http://arxiv.org/abs/2411.02537
Code: https://inquire-benchmark.github.io
Notes: Published in NeurIPS 2024, Datasets and Benchmarks Track

[1] TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives [cs.CV]
Authors: Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Tejas Gokhale, Chitta Baral, Yezhou Yang
Link: http://arxiv.org/abs/2411.02545
Code: https://tripletclip.github.io
Notes: Accepted at NeurIPS 2024

[2] ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy [cs.CV]
Authors: Kian Kenyon-Dean, Zitong Jerry Wang, John Urbanik, Konstantin Donhauser, Jason Hartford, Saber Saberian, Nil Sahin, Ihab Bendidi, Safiye Celik, Marta Fay, Juan Sebastian Rodriguez Vera, Imran S Haque, Oren Kraus
Link: http://arxiv.org/abs/2411.02572
Notes: NeurIPS 2024 Foundation Models for Science Workshop (38th Conference on Neural Information Processing Systems). 18 pages, 7 figures

[3] Divergent Domains, Convergent Grading: Enhancing Generalization in Diabetic Retinopathy Grading [cs.CV]
Authors: Sharon Chokuwa, Muhammad Haris Khan
Link: http://arxiv.org/abs/2411.02614
Code: https://github.com/sharonchokuwa/dg-adr
Notes: Accepted at WACV 2025

[4] Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning [cs.CV]
Authors: Mingcheng Li, Dingkang Yang, Yang Liu, Shunli Wang, Jiawei Chen, Shuaibing Wang, Jinjie Wei, Yue Jiang, Qingyao Xu, Xiaolu Hou, Mingyang Sun, Ziyun Qian, Dongliang Kou, Lihua Zhang
Link: http://arxiv.org/abs/2411.02793
Notes: Accepted by NeurIPS 2024

[5] Test-Time Dynamic Image Fusion [cs.CV]
Authors: Bing Cao, Yinan Xia, Yi Ding, Changqing Zhang, Qinghua Hu
Link: http://arxiv.org/abs/2411.02840
Code: https://github.com/Yinan-Xia/TTD
Notes: Accepted by NeurIPS 2024

[6] OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing [cs.CV]
Authors: Pranav Gupta, Rishubh Singh, Pradeep Shenoy, Ravikiran Sarvadevabhatla
Link: http://arxiv.org/abs/2411.02858
Code: http://olafseg.github.io
Notes: Accepted in The European Conference on Computer Vision (ECCV) 2024

[7] Continual Audio-Visual Sound Separation [cs.CV]
Authors: Weiguo Pian, Yiyang Nan, Shijian Deng, Shentong Mo, Yunhui Guo, Yapeng Tian
Link: http://arxiv.org/abs/2411.02860
Code: https://github.com/weiguoPian/ContAV-Sep_NeurIPS2024
Notes: NeurIPS 2024

[8] Membership Inference Attacks against Large Vision-Language Models [cs.CV]
Authors: Zhan Li, Yongtao Wu, Yihang Chen, Francesco Tonin, Elias Abad Rocamora, Volkan Cevher
Link: http://arxiv.org/abs/2411.02902
Code: https://github.com/LIONS-EPFL/VL-MIA
Notes: NeurIPS 2024

[9] CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection [cs.CV]
Authors: Jisong Kim, Minjae Seong, Jun Won Choi
Link: http://arxiv.org/abs/2411.03013
Notes: Accepted at NeurIPS 2024

[10] Rethinking Decoders for Transformer-based Semantic Segmentation: Compression is All You Need [cs.CV]
Authors: Qishuai Wen, Chun-Guang Li
Link: http://arxiv.org/abs/2411.03033
Code: https://github.com/QishuaiWen/DEPICT/
Notes: NeurIPS 2024

[11] Pre-trained Visual Dynamics Representations for Efficient Policy Learning [cs.CV]
Authors: Hao Luo, Bohan Zhou, Zongqing Lu
Link: http://arxiv.org/abs/2411.03169
Notes: ECCV 2024

[12] On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models [cs.CV]
Authors: Tariq Berrada Ifriqi, Pietro Astolfi, Melissa Hall, Reyhane Askari-Hemmat, Yohann Benchetrit, Marton Havasi, Matthew Muckley, Karteek Alahari, Adriana Romero-Soriano, Jakob Verbeek, Michal Drozdzal
Link: http://arxiv.org/abs/2411.03177
Notes: Accepted as a conference paper (poster) for NeurIPS 2024

[13] Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution [cs.CV]
Authors: Huan Zheng, Wencheng Han, Jianbing Shen
Link: http://arxiv.org/abs/2411.03239
Notes: The 1st solution for the ECCV 2024 AIM Compressed Depth Upsampling Challenge

[14] Classification Done Right for Vision-Language Pre-Training [cs.CV]
Authors: Huang Zilong, Ye Qinghao, Kang Bingyi, Feng Jiashi, Fan Haoqi
Link: http://arxiv.org/abs/2411.03313
Code: https://github.com/x-cls/superclass
Notes: Accepted by NeurIPS 2024

Natural language processing conferences: 7 papers


[0] INQUIRE: A Natural World Text-to-Image Retrieval Benchmark [cs.CV]
Authors: Edward Vendrow, Omiros Pantazis, Alexander Shepard, Gabriel Brostow, Kate E. Jones, Oisin Mac Aodha, Sara Beery, Grant Van Horn
Link: http://arxiv.org/abs/2411.02537
Code: https://inquire-benchmark.github.io
Notes: Published in NeurIPS 2024, Datasets and Benchmarks Track

[1] TripletCLIP: Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives [cs.CV]
Authors: Maitreya Patel, Abhiram Kusumba, Sheng Cheng, Changhoon Kim, Tejas Gokhale, Chitta Baral, Yezhou Yang
Link: http://arxiv.org/abs/2411.02545
Code: https://tripletclip.github.io
Notes: Accepted at NeurIPS 2024

[2] Extracting Unlearned Information from LLMs with Activation Steering [cs.CL]
Authors: Atakan Seyitoğlu, Aleksei Kuvshinov, Leo Schwinn, Stephan Günnemann
Link: http://arxiv.org/abs/2411.02631
Notes: Accepted at NeurIPS 2024 Workshop Safe Generative AI

[3] Multimodal Commonsense Knowledge Distillation for Visual Question Answering [cs.CL]
Authors: Shuo Yang, Siwen Luo, Soyeon Caren Han
Link: http://arxiv.org/abs/2411.02722
Notes: AAAI 2025 (Accepted, Oral)

[4] Toward Robust Incomplete Multimodal Sentiment Analysis via Hierarchical Representation Learning [cs.CV]
Authors: Mingcheng Li, Dingkang Yang, Yang Liu, Shunli Wang, Jiawei Chen, Shuaibing Wang, Jinjie Wei, Yue Jiang, Qingyao Xu, Xiaolu Hou, Mingyang Sun, Ziyun Qian, Dongliang Kou, Lihua Zhang
Link: http://arxiv.org/abs/2411.02793
Notes: Accepted by NeurIPS 2024

[5] Membership Inference Attacks against Large Vision-Language Models [cs.CV]
Authors: Zhan Li, Yongtao Wu, Yihang Chen, Francesco Tonin, Elias Abad Rocamora, Volkan Cevher
Link: http://arxiv.org/abs/2411.02902
Code: https://github.com/LIONS-EPFL/VL-MIA
Notes: NeurIPS 2024

[6] Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [cs.CL]
Authors: Bei Li, Tong Zheng, Rui Wang, Jiahao Liu, Qingyan Guo, Junliang Guo, Xu Tan, Tong Xiao, Jingbo Zhu, Jingang Wang, Xunliang Cai
Link: http://arxiv.org/abs/2411.03042
Notes: Accepted by NeurIPS 2024

Thanks to arxiv.org