© 2023, Amazon Web Services, Inc. or its affiliates. All rights reserved.

Principles and Best Practices of Prompt Engineering for Large Language Models
Zheng Hao, AI/ML Specialist SA

Prompt Engineering (PE) Basics: zero-shot, one/few-shot, some simple examples

Agenda
- Fundamentals
- Improving LLM task performance through prompt text
- Unlocking further LLM potential through thought structures
- Claude PE best practices

Elements of a Prompt
- Instruction: the specific task or instruction you want the model to perform (what to do; the question stem can also go here)
- Context: external information or additional context involved, which can steer the model toward a better response (background knowledge)
- Input data: the input or question we want a response for (usually the question stem or the options)
- Output indicator: indicates the type or format of the output (in what form to answer)
*Not all components need to appear in a prompt; it depends on the task definition and needs.

Why do prompts work?
- Pretrain inputs: "once upon a time, ...", "the AI model is powerful and ...", "1+1= ..."
- SFT inputs: "Write me a story happened long time ago.", "Tell me how AI model can empower our life.", "what is the result of 1+1?"
- Followed by PPO/DPO/RL alignment on prompts & labels.
A simple way to understand it: after training and alignment, the LLM has memorized the co-occurrence of queries and responses in its corpus, and the incoming query also appeared in the training data.

In-context Learning
*https://arxiv.org/abs/2301.00234
*https://arxiv.org/abs/2111.02080
- Bayesian view: "text-to-text" matching is distribution matching.
- Even when the instruction is complex and abstract, an LLM that is powerful enough needs no extra training: it can learn how to perform the downstream task from a prompt containing just a few input-output examples.
- A skillful prompt effectively helps the LLM recall and associate more richly, supporting in-context learning.
Tunable Hyper-parameters for Causal Generation
- Top-K: at each time step, keep the top K tokens, re-scale, and resample. This increases the number and variety of candidate sequences, but the amount is hard to control (long-tail tokens are easily sampled).
- Top-P (nucleus sampling): at each time step, keep the tokens whose cumulative probability stays within P, re-scale, and resample.
- Tuning K and P jointly improves generation quality: more diversity, and a lower chance of sampling long-tail tokens when the predicted distribution is flat.
- Temperature: controls the randomness of text generation (widens or narrows the gaps during re-scaling). The lower the value, the closer the generated text stays to the model's existing knowledge and samples: more conservative and predictable. With a higher temperature, the text becomes more random, diverse, novel, and surprising, because the model more readily chooses and combines among many candidate words and phrases.
  - For fact-based Q&A, use a lower temperature to encourage factual and concise answers.
  - For poetry generation or other creative tasks, a higher temperature may be more suitable.
- For token-by-token generation, which selection strategy: greedy, or beam search (accumulated probability)?

In short: the training content plus these hyper-parameters make up the LLM's "personality".
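The Top-K / Top-P / temperature pipeline described above can be sketched in a few lines of NumPy. This is an illustrative toy, not any particular inference engine's implementation:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.9, rng=None):
    """Temperature -> softmax -> Top-K filter -> Top-P filter -> resample."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature  # temperature re-scaling
    probs = np.exp(scaled - scaled.max())                   # stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]          # tokens sorted by probability
    keep = order[:top_k]                     # Top-K: keep the K most likely tokens
    cum = np.cumsum(probs[keep])
    cutoff = int(np.searchsorted(cum, top_p)) + 1
    keep = keep[:cutoff]                     # Top-P: smallest prefix covering mass P
    kept = probs[keep] / probs[keep].sum()   # re-scale the survivors, then resample
    return int(rng.choice(keep, p=kept))
```

With top_k=1 this degenerates to greedy decoding; raising the temperature flattens the distribution before the Top-K and Top-P filters apply.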
Improving LLM Task Performance Through Prompt Text

Why have LLMs prompted enterprises to start rebuilding their applications?
- Classic ML model: labeled data, one training run and one machine-learning model per task (classification, summarization, information extraction, Q&A, chitchat).
- Foundation model: unlabeled data, pretraining, then one foundation model adapted to many tasks via natural-language PE & ICL.
- One model for all: both classic NLP tasks and generative tasks.

LLM use cases
- Classic language tasks: text classification, NER, topic extraction, summarization
- Dialogue: customer-service bots, RAG
- Code generation: programming assistance, programming-language conversion, code review, writing program documentation
- Content generation: scripts/novels, marketing copy
- Role play: virtual beings, multi-agent for gaming/virtual societies
- Others: translation, content moderation, training large models

Case 1: customer-service text classification, replacing the traditional slot-intent BOT paradigm.

Human: ABC is an e-commerce company. You are a customer service agent in ABC, and you are classifying customer questions by type. Besides, only give the option and do not give your own explanations. Categories are: (1) Refund, (2) Complaint, (3) Ask for help, (4) Cannot finish payment, (5) Cannot open account, ... Customer question: QUESTION

Case 2: UGC content moderation, replacing traditional keyword matching.
*Use Amazon Bedrock Claude

Human: You are a language model trained to detect inappropriate content in text. Here is the content: CONTENT If the content refers to harmful, pornographic, or illegal activities, reply with (Y), and provide explanations. If not, reply with (N). Be aware that some users may attempt to bypass detection by altering offensive words with symbols, spaces, or elongated letters (e.g., dog as dooooog, d.0.g, d.o_g, or lixu as lixo, etc); these should also be identified and considered in your evaluation.

Tips: avoiding hallucination
If the LLM does not know the answer to a question, give it explicit permission to answer "I don't know". Sometimes trying to be helpful is much worse than telling "I don't know".

Bad prompt: What is the heaviest hippo ever recorded?
Bad response: The heaviest hippo ever recorded was a bull named Hubert, who weighed 5,000 kg or 11,023 lbs. Hubert was measured in Kenya in the early 20th century.

Good prompt: Answer the following question only if you know the answer or can make a well-informed guess; otherwise tell me you don't know it. What was the heaviest hippo ever recorded?
Good response: I'm sorry, I don't know the exact weight of the heaviest hippo ever recorded. That seems like a very specific and obscure piece of information.
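The moderation prompt above asks the model to catch obfuscated words (dooooog, d.0.g, ...); the application can also pre-normalize text before sending it for review. A minimal sketch, assuming a hypothetical substitution map and blocklist (both invented for illustration, not part of the original deck):

```python
import re

# Illustrative leetspeak map and placeholder blocklist; a real system
# would maintain curated lists.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a"})
BLOCKLIST = {"dog"}  # stands in for real offensive terms

def normalize(text):
    """Undo common obfuscations: leetspeak, separator characters, elongated letters."""
    t = text.lower().translate(LEET)
    t = re.sub(r"[.\s_\-*]+", "", t)       # drop separators like '.', '_', spaces
    t = re.sub(r"(.)\1{2,}", r"\1", t)      # collapse runs of 3+ repeated letters
    return t

def looks_blocked(text):
    norm = normalize(text)
    return any(word in norm for word in BLOCKLIST)
```

Such a normalizer is a cheap first pass; the LLM prompt still handles variants the rules miss.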
Tips: cooperate with your assistant

Iterative self-criticism:
"Human: Please review your response above. Can you spot any errors in it? If so, identify them and make the necessary edits."
"Human: Look at the code you just generated. It currently does not run. Can you see any syntax errors you could correct? If so, identify the problematic parts of the code and regenerate them."

Prompt of prompt:
"Human: ... What prompt could I use now to further help you with this task?"

Guiding prompt:
"Human: I'd like you to write a Python program to manage my customer information, which is stored in AWS S3. To complete this task, please ask me any questions you need me to answer."

Unlocking Further LLM Potential Through Thought Structures

Least-to-Most
Human: I want to write a mermaid story. First generate a brief character outline for me, without details.
Assistant: Here is a brief character outline for a mermaid story, without details:
- Protagonist: a beautiful young mermaid princess
- Prince: a handsome human prince
- Sea King: the mermaid's father, king of the ocean
- Witch: an ancient and mysterious sorceress
- Boatman: a sailor on the prince's ship
Human: Based on the story title and character outline above, generate some scenes in which the story takes place, without details.
Assistant: Sure. Based on the earlier outline, here are some scenes without details:
- The mermaid princess encounters the human prince's ship in the ocean
- The mermaid princess rescues the prince after he falls into the water
- The mermaid princess becomes infatuated with the prince and often lurks by his ship to watch him
- The witch gives the mermaid a potion that can turn her into a human
- A sea monster wounds the prince, and the mermaid gives her life to save him
- The mermaid can never become a mermaid again, and her life fades away
Human: Based on the framework above, generate a long story.
Assistant: Sure, here is a longer mermaid story based on the earlier framework: Deep in the ocean lay a beautiful sea kingdom, home to mermaids and other sea creatures. The kingdom's princess, Ariel, had long hair as radiant as sunshine and a beautiful fish tail; she was the most beautiful mermaid in the sea. One day, Ariel swam up to the surface and saw a magnificent ship sailing past. On it stood a handsome prince. Ariel was instantly captivated and fell in love with him. From that day on, she often hid behind the reefs, gazing at the prince's ship and dreaming of a life together with him. One day a storm blew up at sea and the prince's ship was wrecked. The other sailors drowned; only the prince still clung to a broken piece of the mast ...
*Use Amazon Bedrock Claude; the same approach scales to long-form fiction.

CoT: Chain of Thought
*https://arxiv.org/abs/2201.11903
- Complex (numeric) reasoning is a natural weak spot for LLMs. When humans face a complex reasoning task (such as a multi-step math word problem), we intuitively decompose it into subtasks and then chain those subtasks together into the final answer.
- CoT prompting and zero-shot CoT; you can also directly supply your own custom reasoning steps.

Self-consistency CoT
*https://arxiv.org/abs/2203.11171
- Trigger the model's CoT, then generate multiple times to obtain results along several reasoning paths. This resembles natural human reasoning: if several different lines of thought converge on the same answer, that answer is more likely to be correct.
- To obtain diverse reasoning paths, the paper uses T=0.5 & k=40 for UL2-20B and LaMDA-137B, and T=0.7 & k=40 for PaLM-540B; 10 runs and 40 samples in total.

ToT: Tree of Thought
*https://arxiv.org/abs/2305.10601
- Unlike self-consistency CoT, ToT generates multiple reasoning paths/plans, has the LLM vote on them, and then continues generation with the best plan.

ReAct: combine reasoning and action-taking
*https://arxiv.org/abs/2210.03629
- Reasoning lets the model induce, track, and update action plans, while Action lets the model obtain additional information from external sources. ReAct (Reasoning + Action) has therefore been applied to a variety of natural-language and decision-making tasks, showing performance superior to SOTA while improving human interpretability and trustworthiness.
- Iterating further toward agents: equip the LLM with tools and peripherals.
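Self-consistency boils down to "sample several answers, then majority-vote". A minimal sketch, with a deterministic stand-in for the sampled model calls (the stub is illustrative; a real version would send a CoT prompt at T around 0.5-0.7 and parse out the final answer):

```python
from collections import Counter

def self_consistency(ask_model, question, n_paths=5):
    """Self-consistency CoT: sample several reasoning paths (the model should
    be called at a non-zero temperature) and majority-vote on the answers."""
    answers = [ask_model(question) for _ in range(n_paths)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes

class ScriptedModel:
    """Deterministic stand-in for a sampled LLM call, for demonstration."""
    def __init__(self, scripted_answers):
        self.answers = list(scripted_answers)
    def __call__(self, question):
        return self.answers.pop(0)
```

Usage: if 4 of 5 sampled paths end in "42", the vote returns ("42", 4), even though one path went astray.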
Amazon Bedrock: the easiest way to build and scale generative AI applications with foundation models
- Choice of models; model customization; agents that execute tasks; security and privacy; responsible AI; available in overseas regions.
- Broad model selection: Claude 2, Claude 2.1, Claude Instant; Jurassic-2 Ultra, Jurassic-2 Mid; Amazon Titan Text Embeddings, Titan Multimodal Embeddings, Titan Text Lite, Titan Text Express, Titan Image Generator; Llama 2 13B, Llama 2 70B; Cohere Command, Command Light, Embed English, Embed Multilingual; Stable Diffusion XL 1.0.

Claude (Claude 2, Claude Instant)
Designed for dialogue, summarization, Q&A, workflow automation, coding, and more. Supports complex conversations and creative content generation, as well as detailed instruction following. The max-100K-token version's larger context length reduces the need to fine-tune the foundation model.
- Token length: max 100K. Supported languages: multilingual.

Claude 2 highlights
- Accepts up to 100K tokens per conversation task, meaning it can process hundreds of pages of text, or even an entire book, at once; usable for large volumes of production data, papers, etc. More context improves output quality.
- Improvements in coding, math, and reasoning: scored 71.2% on the Codex HumanEval Python programming test (Claude V1 scored 56%); reached the 90th percentile on the GRE reading and writing sections.
- Improved safety: output is more harmless and tries to avoid generating offensive or dangerous content (Constitutional AI); in red-teaming evaluations, outperforms Claude V1.3 by up to 2x.

Claude 2.1
- Industry-leading 200K-token context window: 2x the window of Claude 2.0, with a 25% lower cost per prompt completion on Bedrock and significantly better accuracy in long-context use cases. More input supports processing large amounts of information, e.g., an entire codebase, financial statements, or internal datasets.
- Hallucination rate cut by 50%: false statements halved in open-domain dialogue, and a 3-4x lower hallucination rate in document-grounded Q&A (RAG). Claude is increasingly clear about what it does not know, so it may directly answer "I don't know" or raise an objection instead of hallucinating.
- System prompts for customized, stable output: provide Claude with custom instructions, such as a task description, output style, generation rules, or a role to play, to improve performance in customer scenarios. Main advantages: maintains its persona over longer conversations; better adherence to rules and instructions; more creative, natural behavior.

Claude 2.1 Prompt Engineering Best Practices

Guidance (text), Anthropic Claude: the "Human:" / "Assistant:" format
Claude is trained on alternating "Human:"/"Assistant:" dialogue:
Human: instructions
Assistant: Claude's response
For any API prompt, you must start with "Human:" and end with "Assistant:".
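The alternating format can be wrapped in a small helper that follows the convention just described: a "\n\nHuman:" prefix for each human turn and a trailing open "\n\nAssistant:" for Claude to complete. The function name is illustrative:

```python
def claude_prompt(turns):
    """Build a Claude text-completion prompt from (role, text) turns.
    Turns should alternate starting with 'Human'; the prompt always
    ends with an open 'Assistant:' turn for the model to complete."""
    parts = []
    for role, text in turns:
        if role not in ("Human", "Assistant"):
            raise ValueError(f"unknown role: {role}")
        parts.append(f"\n\n{role}: {text}")
    parts.append("\n\nAssistant:")
    return "".join(parts)
```

Usage: claude_prompt([("Human", "Why is the sky blue?")]) yields "\n\nHuman: Why is the sky blue?\n\nAssistant:", satisfying both the start-with-Human and end-with-Assistant rules.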
Also include two new lines before each turn:

Human: Why is the sky blue?
Assistant: ...
Human: Why are sunsets orange?
Assistant: ...
*symbols above shown for illustration

Be clear and direct
Claude responds best to clear and direct instructions. When in doubt, follow the Golden Rule of Clear Prompting: show your prompt to a friend and ask them if they can follow the instructions themselves and produce the exact result you're looking for.

Example:
Human: Write a haiku about robots
Assistant: Here is a haiku about robots: Metal bodies move / Circuits calculate tasks / Machines mimic life

Human: Write a haiku about robots. Skip the preamble; go straight into the poem.
Assistant: Metal bodies move / Circuits calculate tasks / Machines mimic life

Assign a role (role prompting)
Claude sometimes needs context about what role it should inhabit. Assigning a role changes Claude's response in two ways:
- Improved accuracy in certain situations (such as mathematics)
- Changed tone and demeanor to match the specified role

Example:
Human: How is the U.S. President elected?
Assistant: The president is not voted on directly by the people. Instead, each state gets a certain number of electors based on its population. The states with more people get more electors. When people vote on Election Day in November, they are actually voting for which candidate they want their state's electors to ...

Human: You are a kindergarten teacher. How is the U.S. President elected?
Assistant: The president is the leader of our country. Every four years, we have a presidential election to choose a new president or re-elect the current president. During the election, people called candidates try to become president by getting votes. On Election Day in November, people go to voting places ...

Use XML tags
Messy, unstructured prompts are hard for Claude to understand. Just as section headings and topics help people keep track of information, XML tags such as <text></text> help Claude understand the structure of a prompt. We recommend XML tags because Claude was specifically trained on them.

Example:
Human: Hey Claude. Show up at 6AM because I say so. Make this email more polite.
Assistant: Dear Claude, I hope this message finds you well ...

Human: <email>Hey Claude. Show up at 6AM because I say so.</email> Make this email more polite.
Assistant: Good morning team, I hope you all had a restful weekend ...
PE best practices, case 1 (text): Translation

PE best practices, case 2 (text): Comment classification
Three versions of the same classification prompt were compared:

v1 (ACC: 65%):
Human: Assume you are an analyst in an e-commerce company. Your work is to analyze customer messages and classify each message into one category. Here are all possible categories: ... Here are descriptions to each category ... Here is a customer message ... Put your predicted category ...
Assistant:

v2, with XML tags and speaking for Claude (ACC: 73%):
Human: Assume you are an analyst in an e-commerce company. Your work is to analyze customer messages and classify each message into one category. Here are all possible categories: <labels>...</labels> Here are descriptions to each category: <desc>...</desc> Here is a customer message: <text>...</text> Put your predicted category in a <label> tag.
Assistant: based on the customer message and category descriptions, I would classify this message into the <label>

v3, with optimized category descriptions (ACC: 82%): same structure as v2, with the category descriptions rewritten.

Separate data from instructions
Including input data directly in the prompt can make the prompt long and hard to troubleshoot. Separating the prompt structure from the input data:
- makes the prompt itself easier to edit
- processes multiple datasets faster

Example:
Input data: Cow, Dog, Seal
Prompt template:
Human: I will tell you the name of an animal. Please respond with the noise that animal makes. <animal>ANIMAL</animal>
Assistant:
Tip: although not always required, we recommend using XML tags to separate out the dataset so Claude can parse it more easily.
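Filling one template per row of input data can be sketched as below; the <animal> tag name and placeholder are assumptions for illustration:

```python
# Prompt template with the input data separated out from the instructions.
TEMPLATE = (
    "\n\nHuman: I will tell you the name of an animal. "
    "Please respond with the noise that animal makes. "
    "<animal>{animal}</animal>"
    "\n\nAssistant:"
)

def render_prompts(animals):
    """Fill the single template once per input row, yielding complete prompts."""
    return [TEMPLATE.format(animal=a) for a in animals]
```

Editing the instructions now touches one string, while the dataset can grow independently.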
Complete prompts:
- Please respond with the noise that animal makes. ... Cow
- Please respond with the noise that animal makes. ... Dog
- Please respond with the noise that animal makes. ... Seal

PE best practices, case 3 (text): RAG with reference
The prompt below combines all four elements: context, input data, instruction, and output indicator.

Human: I'm going to give you a document. Then I'm going to ask you a question about it. I'd like you to first write down exact quotes of parts of the document that would help answer the question. Here is the document context: <context>context</context> Here is the first question: <question>question</question> First, answer the question. Do not include or reference quoted content verbatim in the answer. Don't say "According to Quote [1]" when answering. Instead make references to quotes relevant to each section of the answer solely by adding their bracketed numbers at the end of the relevant sentences. Then, find the quotes from the document that are most relevant to answering the question, and then print them in numbered order. Please add a \n for each quote. Quotes should be extracted from the original context and you cannot modify them. If there are no relevant quotes, write "No relevant quotes" instead. Thus the format of your overall output should look like what's shown between the example tags. Make sure to follow the formatting and spacing exactly.
Company X earned $12 million in 2021. [1] Almost 90% of it was from widget sales. [2]
Relevant quotes:
[1] "Company X reported revenue of $12 million in 2021."
[2] "Almost 90% of revenue came from widget sales, with gadget sales making up the remaining 10%."
If the question cannot be answered by the document, say so. Answer the question immediately without preamble. Once again, my question is: <question>question</question>
Assistant:

Formatted output & speaking for Claude
You can get Claude to say exactly what you need by:
- Specifying the exact output format you want
- Speaking for Claude: explicitly writing the beginning of Claude's reply into the prompt yourself (after "Assistant:")
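When the required format is tag-delimited, as in the quote/answer layout above, the application can pull fields out of the response mechanically. A minimal sketch; the tag name in the usage example is hypothetical:

```python
import re

def extract_tag(text, tag):
    """Return the content of the first <tag>...</tag> block in a model
    response, or None if the tag is absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return m.group(1).strip() if m else None
```

For example, extract_tag(response, "haiku") recovers just the poem from a response that wraps it in <haiku> tags, which pairs naturally with prefilling "Assistant: <haiku>" in the prompt.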
Example:
Prompt:
Human: Please write a haiku about a cat. Put it in <haiku> tags.
Assistant: <haiku>
Claude's response:
Curious little cat / Pouncing on strings and lasers / Naps in sunny spots

Use examples
Examples are the most effective tool for getting Claude to perform well.
- Make sure to give Claude examples of common situations.
- Usually, more examples = more reliable responses (the side effect is higher latency and more token consumption).

Example:
Human: Quote: "When the reasoning mind is forced to confront the impossible again and again, it has no choice but to adapt." N.K. Jemisin, The Fifth Season
Author: N.K. Jemisin
Quote: "Some humans theorize that intelligent species go extinct before they can expand into outer space. If they're correct, then the hush of the night sky is the silence of the graveyard." Ted Chiang, Exhalation
Author:
Assistant: Ted Chiang

Sometimes, examples are so powerful that they don't even need to be accompanied by task instructions!

Summary: cooperate & co-work with your LLM assistant

Recommendations:
- Provide valuable few-shot samples.
- Experiment and iterate, holding out a test set to validate results.
- Combine different thought structures to unlock further potential (chaining, bootstrapping, boosting, bagging, stacking, ...).
- Divide and conquer: break complex tasks into subtasks.
- Point out problems in earlier outputs and let the LLM iterate further.

Limitations:
- Hallucination cannot be avoided absolutely (it can be weakened by designing prompts and providing context).
- Because of LLM training costs, the content used for training naturally lags behind (new content can be injected via RAG).
- An LLM cannot execute actions, but it can plan actions: use tools and chains to turn the LLM into an LLM agent (e.g., Amazon Bedrock Agent). Equip the LLM with tools & chains.

Good Prompting Matters
47、e进一步挖掘潜力(Chaining,Bootstraping,Boosting,Bagging,Stacking,.)-分治-Break complex tasks into subtasks -指出前序产出的问题并让LLM进一步迭代Hallucination无法绝对避免(可以通过设计Prompts,提供Context来削弱)LLM训练成本的原因,用于其训练的内容天然是有滞后的(可以通过RAG来注入新内容)LLM无法执行Action,但可以规划Action(使用Tool和Chain把LLM变成LLM Agent e.g.Amazon Bedrock Agent)Equip LLM with Tools&Chains 总结总结Good Prompting Mattershttps:/