Generative-retriever-based: a relatively new approach, exemplified by the Differentiable Search Index from Yi Tay et al. at Google [12]. All knowledge is stored in the language model's parameters; given a query, the model directly outputs the doc id or doc content of the relevant knowledge. After all, the language model itself is the knowledge base [13]!
Reinforcement-learning-based: also a frontier approach, exemplified by OpenAI's WebGPT [14], which trains the model with human feedback to retrieve the correct knowledge.
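To make the generative-retriever idea concrete, here is a toy sketch of DSI-style inference. In DSI, the trained seq2seq model's parameters hold the corpus, and retrieval reduces to decoding a docid string constrained to valid identifiers. The `toy_score` function below is a hypothetical stand-in for a trained model's next-token probabilities; only the trie-constrained decoding step is faithful to the technique.

```python
# Toy sketch of Differentiable Search Index (DSI) style inference [12].
# Retrieval = decoding a docid token by token, constrained by a trie of
# valid identifiers. `toy_score` is a hypothetical stand-in for a trained
# seq2seq's p(next_token | query, prefix); no real model is involved.

DOCIDS = ["101", "102", "174", "205"]  # toy corpus identifiers

def build_trie(docids):
    trie = {}
    for d in docids:
        node = trie
        for ch in d:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-id marker
    return trie

def toy_score(query, prefix, ch):
    # Stand-in scorer: deterministic pseudo-randomness from hashing,
    # so the example runs without any trained parameters.
    return hash((query, prefix, ch)) % 1000

def generate_docid(query, trie):
    # Greedy decoding, restricted at every step to children of the
    # current trie node, so the output is always a valid docid.
    node, out = trie, ""
    while "$" not in node or len(node) > 1:
        choices = [c for c in node if c != "$"]
        ch = max(choices, key=lambda c: toy_score(query, out, c))
        out += ch
        node = node[ch]
    return out

trie = build_trie(DOCIDS)
print(generate_docid("how do transformers retrieve?", trie))
```

Whatever the scorer prefers, the trie constraint guarantees the decoded string is one of the corpus docids, which is the core mechanic that lets a generative model act as an index.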
Interacting with Models or Tools
When a language model interacts with other models or tools, the main purpose is to decompose complex tasks, for example breaking a complex reasoning task into several subtasks, which is also the core idea of Chain of Thought [17]. Different subtasks can then be handled by models or tools with different capabilities: a calculation subtask can go to a calculator, a retrieval subtask to a retrieval model. This type of interaction not only improves the language model's reasoning, planning, and decision-making abilities, but also mitigates limitations such as hallucination and inaccurate outputs. Notably, when a tool executes a particular subtask, it may affect the external world, for example posting a WeChat Moments update via the WeChat API; this is termed Tool-Oriented Learning [2].
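The dispatch pattern described above can be sketched minimally as follows. The subtask plan, tool names, and the tiny knowledge base are all hypothetical stand-ins; in a real system a language model would produce the decomposition, and the tools would be real services.

```python
# Minimal sketch of routing decomposed subtasks to different tools:
# arithmetic goes to a calculator, lookups go to a retriever.
# The plan, tool names, and KB entries are illustrative assumptions.

import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calculator(expr):
    """Safely evaluate a small arithmetic expression via the AST
    (avoids eval on untrusted model output)."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

KB = {"boiling point of water": "100 C at sea level"}  # toy knowledge base

def retriever(query):
    return KB.get(query, "not found")

TOOLS = {"calc": calculator, "search": retriever}

def solve(subtasks):
    # Each subtask is a (tool_name, argument) pair; dispatch accordingly.
    return [TOOLS[name](arg) for name, arg in subtasks]

plan = [("calc", "12 * 7 + 1"), ("search", "boiling point of water")]
print(solve(plan))  # → [85, '100 C at sea level']
```

The interesting design question, echoed later in this section, is not the dispatch itself but who produces `plan`: an explicit prompt-engineered decomposition, or the model's own chain-of-thought.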
Moreover, explicitly decomposing a complex task is sometimes difficult. In such cases, different language models can be assigned different roles or skills, and through mutual collaboration and communication they implicitly and automatically form a division of labor, decomposing the task among themselves. This type of interaction not only simplifies the workflow for solving complex tasks, but can also simulate human society, constructing a form of agent society.
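The alternating message loop behind such role-assigned collaboration (as in CAMEL [27]) can be sketched with two rule-based stand-in agents. Neither "agent" here is a language model; the `planner` and `executor` functions are assumptions made purely to show how a division of labor emerges from the turn-taking protocol rather than from an explicit upfront decomposition.

```python
# Toy sketch of an implicit division of labor between two role-assigned
# "agents". Both agents are rule-based stand-ins, not language models;
# the point is only the alternating instruction/report loop.

def planner(task, done):
    # Role: break the task into steps and hand out the next undone one.
    steps = [f"{task}: step {i}" for i in range(1, 4)]
    todo = [s for s in steps if s not in done]
    return todo[0] if todo else "DONE"

def executor(instruction):
    # Role: carry out one instruction and report back.
    return f"completed {instruction!r}"

def collaborate(task, max_turns=10):
    done, transcript = set(), []
    for _ in range(max_turns):
        instruction = planner(task, done)
        if instruction == "DONE":
            break
        report = executor(instruction)
        done.add(instruction)
        transcript.append((instruction, report))
    return transcript

log = collaborate("write a report")
print(len(log))  # → 3
```

With real language models in both roles, the step list is not hard-coded: it emerges turn by turn from the dialogue, which is exactly the "implicit, automatic" decomposition described above.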
The authors group models and tools together mainly because they are not necessarily two separate categories; a search-engine tool and a retriever model, for instance, are not fundamentally different. Instead, the authors draw the distinction by asking: after task decomposition, which kind of subtask is handled by which kind of object.
[1] Experience Grounds Language,https://arxiv.org/abs/2004.10151
[2] Tool Learning with Foundation Models
[3] Foundation Models for Decision Making: Problems, Methods, and Opportunities
[4] ChatGPT for Robotics: Design Principles and Model Abilities
[5] Augmented Language Models: a Survey
[6] Sparks of Artificial General Intelligence: Early experiments with GPT-4
[7] Training language models to follow instructions with human feedback, https://arxiv.org/abs/2203.02155
[8] Conversational AI, http://coai.cs.tsinghua.edu.cn/
[9] AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts, https://arxiv.org/abs/2110.01691
[10] Interactive Text Generation
[11] Evaluating Human-Language Model Interaction
[12] Transformer Memory as a Differentiable Search Index, https://arxiv.org/abs/2202.06991
[13] Language Models as Knowledge Bases?, https://arxiv.org/abs/1909.01066
[14] WebGPT: Browser-assisted question-answering with human feedback, https://arxiv.org/abs/2112.09332
[15] Atlas: Few-shot Learning with Retrieval Augmented Language Models, https://arxiv.org/pdf/2208.03299.pdf
[16] MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge, https://arxiv.org/pdf/2206.08853.pdf
[17] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, https://arxiv.org/abs/2201.11903
[18] ReAct: Synergizing Reasoning and Acting in Language Models, https://arxiv.org/abs/2210.03629
[19] Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, https://arxiv.org/pdf/2205.10625.pdf
[20] Measuring and Narrowing the Compositionality Gap in Language Models, https://ofir.io/self-ask.pdf
[21] HuggingGPT, https://arxiv.org/abs/2303.17580
[22] Toolformer: Language Models Can Teach Themselves to Use Tools, https://arxiv.org/abs/2302.04761
[23] Socratic Models, https://arxiv.org/pdf/2204.00598.pdf
[24] MindCraft: Theory of Mind Modeling for Situated Dialogue in Collaborative Tasks, https://aclanthology.org/2021.emnlp-main.85/
[25] Computational Language Acquisition with Theory of Mind, https://openreview.net/forum?id=C2ulri4duIs
[26] Generative Agents: Interactive Simulacra of Human Behavior, https://arxiv.org/pdf/2304.03442.pdf
[27] CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society, https://www.camel-ai.org/
[28] OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, https://arxiv.org/abs/2202.03052
[29] BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning, https://arxiv.org/abs/2206.08657
[30] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models, https://arxiv.org/pdf/2301.12597.pdf
[31] Do As I Can,Not As I Say:Grounding Language in Robotic Affordances, https://say-can.github.io/
[32] Grounded Decoding: Guiding Text Generation with Grounded Models for Robot Control, https://grounded-decoding.github.io/
[33] Inner Monologue:Embodied Reasoning through Planning with Language Models, https://innermonologue.github.io/
[34] Large Language Models with Controllable Working Memory, https://arxiv.org/abs/2211.05110