CV & AIGC Top-Conference Paper Roundup [2024-12-19]

晓飞的算法工程笔记 • 3 months ago • 142 views

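44 papers in today's update:

  • Computer vision conferences: 25 papers
  • Natural language processing conferences: 19 papers
Note that papers on large models are mostly published at natural language processing conferences. Because multimodal research is advancing quickly, some computer vision papers also appear at top NLP conferences.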

Computer vision conferences: 25 papers


[0] Targeted View-Invariant Adversarial Perturbations for 3D Object Recognition [cs.CV]
Authors: Christian Green, Mehmet Ergezer, Abdurrahman Zeybey
Link: http://arxiv.org/abs/2412.13376
Notes: Accepted to AAAI-25 Workshop on Artificial Intelligence for Cyber Security (AICS): this http URL

[1] FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding [cs.CV]
Authors: Zhuo Cao, Bingqing Zhang, Heming Du, Xin Yu, Xue Li, Sen Wang
Link: http://arxiv.org/abs/2412.13441
Code: https://github.com/Zhuo-Cao/FlashVTG
Notes: Accepted to WACV 2025

[2] ConDo: Continual Domain Expansion for Absolute Pose Regression [cs.CV]
Authors: Zijun Li, Zhipeng Cai, Bochun Yang, Xuelun Shen, Siqi Shen, Xiaoliang Fan, Michael Paulitsch, Cheng Wang
Link: http://arxiv.org/abs/2412.13452
Notes: AAAI2025

[3] Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation [cs.CV]
Authors: Xiaoqi An, Lin Zhao, Chen Gong, Jun Li, Jian Yang
Link: http://arxiv.org/abs/2412.13454
Notes: Accepted to AAAI 2025

[4] Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection [cs.CV]
Authors: Hanzhe Liang, Guoyang Xie, Chengbin Hou, Bingshu Wang, Can Gao, Jinbao Wang
Link: http://arxiv.org/abs/2412.13461
Notes: AAAI2025 Accepted

[5] FlexPose: Pose Distribution Adaptation with Limited Guidance [cs.CV]
Authors: Zixiao Wang, Junwu Weng, Mengyuan Liu, Bei Yu
Link: http://arxiv.org/abs/2412.13463
Notes: Accepted by AAAI25, 12 pages, 10 figures

[6] Enabling Region-Specific Control via Lassos in Point-Based Colorization [cs.CV]
Authors: Sanghyeon Lee, Jooyeol Yun, Jaegul Choo
Link: http://arxiv.org/abs/2412.13469
Notes: Accepted to AAAI2025

[7] QueryCDR: Query-based Controllable Distortion Rectification Network for Fisheye Images [cs.CV]
Authors: Pengbo Guo, Chengxu Liu, Xingsong Hou, Xueming Qian
Link: http://arxiv.org/abs/2412.13496
Notes: ECCV2024

[8] Plug-and-Play Tri-Branch Invertible Block for Image Rescaling [cs.CV]
Authors: Jingwei Bao, Jinhua Hao, Pengcheng Xu, Ming Sun, Chao Zhou, Shuyuan Zhu
Link: http://arxiv.org/abs/2412.13508
Code: https://github.com/Jingwei-Bao/T-InvBlocks
Notes: Accepted by AAAI 2025. Code is available at this https URL

[9] Dynamic Adapter with Semantics Disentangling for Cross-lingual Cross-modal Retrieval [cs.CV]
Authors: Rui Cai, Zhiyu Dong, Jianfeng Dong, Xun Wang
Link: http://arxiv.org/abs/2412.13510
Notes: Accepted by the 39th AAAI Conference on Artificial Intelligence (AAAI-25)

[10] Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning [cs.CV]
Authors: Yunbin Tu, Liang Li, Li Su, Qingming Huang
Link: http://arxiv.org/abs/2412.13543
Notes: Accepted by AAAI 2025

[11] Multi-View Pedestrian Occupancy Prediction with a Novel Synthetic Dataset [cs.CV]
Authors: Sithu Aung, Min-Cheol Sagong, Junghyun Cho
Link: http://arxiv.org/abs/2412.13569
Notes: AAAI 2025

[12] Bridge then Begin Anew: Generating Target-relevant Intermediate Model for Source-free Visual Emotion Adaptation [cs.CV]
Authors: Jiankun Zhu, Sicheng Zhao, Jing Jiang, Wenbo Tang, Zhaopan Xu, Tingting Han, Pengfei Xu, Hongxun Yao
Link: http://arxiv.org/abs/2412.13577
Notes: Accepted by AAAI2025

[13] Generalizable Sensor-Based Activity Recognition via Categorical Concept Invariant Learning [cs.CV]
Authors: Di Xiong, Shuoyuan Wang, Lei Zhang, Wenbo Huang, Chaolei Han
Link: http://arxiv.org/abs/2412.13594
Notes: Accepted by AAAI 2025

[14] Robust Tracking via Mamba-based Context-aware Token Learning [cs.CV]
Authors: Jinxia Xie, Bineng Zhong, Qihua Liang, Ning Li, Zhiyi Mo, Shuxiang Song
Link: http://arxiv.org/abs/2412.13611
Code: https://github.com/GXNU-ZhongLab/TemTrack
Notes: AAAI2025

[15] Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking [cs.CV]
Authors: Zhengfei Xu, Sijia Zhao, Yanchao Hao, Xiaolong Liu, Lili Li, Yuyang Yin, Bo Li, Xi Chen, Xin Xin
Link: http://arxiv.org/abs/2412.13614
Notes: AAAI 2025; Dataset are released at this https URL

[16] Consistency of Compositional Generalization across Multiple Levels [cs.CV]
Authors: Chuanhao Li, Zhen Li, Chenchen Jing, Xiaomeng Fan, Wenbo Ye, Yuwei Wu, Yunde Jia
Link: http://arxiv.org/abs/2412.13636
Notes: Accepted by AAAI 2025

[17] VIIS: Visible and Infrared Information Synthesis for Severe Low-light Image Enhancement [cs.CV]
Authors: Chen Zhao, Mengyuan Yu, Fan Yang, Peiguang Jing
Link: http://arxiv.org/abs/2412.13655
Code: https://github.com/Chenz418/VIIS
Notes: Accepted to WACV 2025

[18] JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts [cs.CV]
Authors: Taein Son, Soo Won Seo, Jisong Kim, Seok Hwan Lee, Jun Won Choi
Link: http://arxiv.org/abs/2412.13708
Code: https://github.com/taeiin/AAAI2025-JoVALE
Notes: Accepted to AAAI Conference on Artificial Intelligence 2025, 9 pages, 5 figures

[19] Physics-Based Adversarial Attack on Near-Infrared Human Detector for Nighttime Surveillance Camera Systems [cs.CV]
Authors: Muyao Niu, Zhuoxiao Li, Yifan Zhan, Huy H. Nguyen, Isao Echizen, Yinqiang Zheng
Link: http://arxiv.org/abs/2412.13709
Code: https://github.com/MyNiuuu/AdvNIR
Notes: Appeared in ACM MM 2023

[20] Mesoscopic Insights: Orchestrating Multi-scale & Hybrid Architecture for Image Manipulation Localization [cs.CV]
Authors: Xuekang Zhu, Xiaochen Ma, Lei Su, Zhuohang Jiang, Bo Du, Xiwen Wang, Zeyu Lei, Wentao Feng, Chi-Man Pun, Jizhe Zhou
Link: http://arxiv.org/abs/2412.13753
Code: https://github.com/scu-zjz/Mesorch
Notes: AAAI 2025

[21] On Explaining Knowledge Distillation: Measuring and Visualising the Knowledge Transfer Process [cs.CV]
Authors: Gereziher Adhane, Mohammad Mahdi Dehshibi, Dennis Vetter, David Masip, Gemma Roig
Link: http://arxiv.org/abs/2412.13943
Notes: Accepted to 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV'25). Includes 5 pages of supplementary material

[22] GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians [cs.CV]
Authors: Xiaobao Wei, Peng Chen, Ming Lu, Hui Chen, Feng Tian
Link: http://arxiv.org/abs/2412.13983
Code: https://github.com/ucwxb/GraphAvatar
Notes: Accepted by AAAI2025

[23] Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts [cs.CV]
Authors: Jihye Choi, Jayaram Raghuram, Yixuan Li, Somesh Jha
Link: http://arxiv.org/abs/2412.14097
Notes: The preliminary version of the work appeared in the ICML 2024 Workshop on Foundation Models in the Wild

[24] GaraMoSt: Parallel Multi-Granularity Motion and Structural Modeling for Efficient Multi-Frame Interpolation in DSA Images [cs.CV]
Authors: Ziyang Xu, Huangxuan Zhao, Wenyu Liu, Xinggang Wang
Link: http://arxiv.org/abs/2412.14118
Code: https://github.com/ZyoungXu/GaraMoSt
Notes: Accepted by AAAI2025

Natural language processing conferences: 19 papers


[0] Extending LLMs to New Languages: A Case Study of Llama and Persian Adaptation [cs.CL]
Authors: Samin Mahdizadeh Sani, Pouya Sadeghi, Thuy-Trang Vu, Yadollah Yaghoobzadeh, Gholamreza Haffari
Link: http://arxiv.org/abs/2412.13375
Notes: Accepted at COLING 2025

[1] An Automated Explainable Educational Assessment System Built on LLMs [cs.CL]
Authors: Jiazheng Li, Artem Bobrov, David West, Cesare Aloisi, Yulan He
Link: http://arxiv.org/abs/2412.13381
Notes: Accepted to AAAI 2025

[2] Enhancing Talk Moves Analysis in Mathematics Tutoring through Classroom Teaching Discourse [cs.CL]
Authors: Jie Cao, Abhijit Suresh, Jennifer Jacobs, Charis Clevenger, Amanda Howard, Chelsea Brown, Brent Milne, Tom Fischaber, Tamara Sumner, James H. Martin
Link: http://arxiv.org/abs/2412.13395
Notes: Accepted to COLING'2025

[3] VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction [cs.CL]
Authors: Khai Phan Tran, Wen Hua, Xue Li
Link: http://arxiv.org/abs/2412.13503
Notes: COLING 2025

[4] Dynamic Adapter with Semantics Disentangling for Cross-lingual Cross-modal Retrieval [cs.CV]
Authors: Rui Cai, Zhiyu Dong, Jianfeng Dong, Xun Wang
Link: http://arxiv.org/abs/2412.13510
Notes: Accepted by the 39th AAAI Conference on Artificial Intelligence (AAAI-25)

[5] CEHA: A Dataset of Conflict Events in the Horn of Africa [cs.CL]
Authors: Rui Bai, Di Lu, Shihao Ran, Elizabeth Olson, Hemank Lamba, Aoife Cahill, Joel Tetreault, Alex Jaimes
Link: http://arxiv.org/abs/2412.13511
Notes: Accepted by COLING 2025

[6] Information-Theoretic Generative Clustering of Documents [cs.CL]
Authors: Xin Du, Kumiko Tanaka-Ishii
Link: http://arxiv.org/abs/2412.13534
Notes: Accepted to AAAI 2025

[7] Multi-Granularity Open Intent Classification via Adaptive Granular-Ball Decision Boundary [cs.CL]
Authors: Yanhua Li, Xiaocao Ouyang, Chaofan Pan, Jie Zhang, Sen Zhao, Shuyin Xia, Xin Yang, Guoyin Wang, Tianrui Li
Link: http://arxiv.org/abs/2412.13542
Journal ref: AAAI2025
Notes: This paper has been accepted at AAAI2025

[8] Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning [cs.CV]
Authors: Yunbin Tu, Liang Li, Li Su, Qingming Huang
Link: http://arxiv.org/abs/2412.13543
Notes: Accepted by AAAI 2025

[9] Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation [cs.CL]
Authors: Shanu Kumar, Gauri Kholkar, Saish Mendke, Anubhav Sadana, Parag Agrawal, Sandipan Dandapat
Link: http://arxiv.org/abs/2412.13578
Notes: Accepted in SUMEval Workshop in COLING 2025

[10] Reverse Region-to-Entity Annotation for Pixel-Level Visual Entity Linking [cs.CV]
Authors: Zhengfei Xu, Sijia Zhao, Yanchao Hao, Xiaolong Liu, Lili Li, Yuyang Yin, Bo Li, Xi Chen, Xin Xin
Link: http://arxiv.org/abs/2412.13614
Notes: AAAI 2025; Dataset are released at this https URL

[11] Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization [cs.CL]
Authors: Guanghan Li, Xun Zhang, Yufei Zhang, Yifan Yin, Guojun Yin, Wei Lin
Link: http://arxiv.org/abs/2412.13771
Notes: 7 pages, 3 figures, AAAI 2025

[12] Knowledge Editing with Dynamic Knowledge Graphs for Multi-hop Question Answering [cs.CL]
Authors: Yifan Lu, Yigeng Zhou, Jing Li, Yequan Wang, Xuebo Liu, Daojing He, Fangming Liu, Min Zhang
Link: http://arxiv.org/abs/2412.13782
Notes: AAAI 2025

[13] Physics Reasoner: Knowledge-Augmented Reasoning for Solving Physics Problems with Large Language Models [cs.CL]
Authors: Xinyu Pang, Ruixin Hong, Zhanke Zhou, Fangrui Lv, Xinwei Yang, Zhilong Liang, Bo Han, Changshui Zhang
Link: http://arxiv.org/abs/2412.13791
Notes: COLING 2025

[14] Enhancing Rhetorical Figure Annotation: An Ontology-Based Web Application with RAG Integration [cs.CL]
Authors: Ramona Kühn, Jelena Mitrović, Michael Granitzer
Link: http://arxiv.org/abs/2412.13799
Notes: The 31st International Conference on Computational Linguistics (COLING 2025)

[15] What makes a good metric? Evaluating automatic metrics for text-to-image consistency [cs.CL]
Authors: Candace Ross, Melissa Hall, Adriana Romero Soriano, Adina Williams
Link: http://arxiv.org/abs/2412.13989
Notes: Accepted and presented at COLM 2024

[16] FarExStance: Explainable Stance Detection for Farsi [cs.CL]
Authors: Majid Zarharan, Maryam Hashemi, Malika Behroozrazegh, Sauleh Eetemadi, Mohammad Taher Pilehvar, Jennifer Foster
Link: http://arxiv.org/abs/2412.14008
Notes: Accepted in COLING 2025

[17] Hansel: Output Length Controlling Framework for Large Language Models [cs.CL]
Authors: Seoha Song, Junhyun Lee, Hyeonmok Ko
Link: http://arxiv.org/abs/2412.14033
Notes: 13 pages, 6 figures; accepted to AAAI-25

[18] Compositional Generalization Across Distributional Shifts with Sparse Tree Operations [cs.CL]
Authors: Paul Soulos, Henry Conklin, Mattia Opper, Paul Smolensky, Jianfeng Gao, Roland Fernandez
Link: http://arxiv.org/abs/2412.14076
Code: https://github.com/psoulos/sdtm
Notes: NeurIPS 2024. Code available at this https URL

Thanks to arxiv.org


Article URL: http://www.python88.com/topic/177145