
Building Cloud-Native End-to-End Generative AI Applications - 肖元君.pdf

Uploaded by: 张**

ID: 153182

2024-01-15

29 pages, 6.84 MB


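Building Cloud-Native End-to-End Generative AI Applications
肖元君, Solutions Architect, 亚马逊云科技 (Amazon Web Services)

Agenda
1. Challenges of large-model inference
2. Reducing large-model inference complexity with Amazon SageMaker and Amazon Bedrock
3. Building end-to-end generative AI applications, with demos

Challenges of large-model inference
Complexity: large model size; model parallelism; model serving infrastructure setup
Cost: model compilation; model hosting cost; operational overhead; number of models to deploy and manage
Performance: model compilation; model compression; latency; throughput; availability

Multi-GPU parallel inference
Tensor parallelism: intra-layer
Pipeline parallelism: inter-layer

Model compression
Pruning
Distillation
Quantization (BitsandBytes, GPTQ, SmoothQuant)

Attention-layer computation optimization
FlashAttention
PagedAttention

Continuous batching

LLM inference optimization frameworks
DeepSpeed inference: quantization, tensor parallelism (TP), MoE inference (model distillation)
Hugging Face TGI: TP, attention optimization (Flash Attention and Paged Attention), continuous batching, quantization
Nvidia FasterTransformer: model compression, TP/PP, quantization
vLLM: TP, continuous batching, Paged Attention
Hugging Face Accelerate: pipeline parallelism (PP), quantization
TensorRT-LLM

Agenda item 2: Reducing large-model inference complexity with Amazon SageMaker and Amazon Bedrock

SageMaker Large Model Inference (LMI) image
Model server: DJL Serving
Inference libraries: DeepSpeed, Hugging Face Accelerate, FasterTransformer, transformers-neuronx
Framework: PyTorch
GPU: cuDNN, cuBLAS, NCCL, CUDA toolkit
AWS Inferentia: Neuron
CPU: mkl
Base image
Zero-code setup: DeepSpeed and Hugging Face; FasterTransformer; transformers-neuronx handlers
Supported instance types: G4dn, G5, P3, P4, P5, Inf2

Using the Amazon SageMaker Large Model Inference (LMI) container
Supports several inference engines: HF Accelerate, DeepSpeed, FasterTransformer, and others.
The built-in s5cmd command downloads large models quickly.
Static, dynamic, and rolling batch.
Streams newly generated tokens.
The DeepSpeed inference engine in the container supports bf16 models; the open-source version of DeepSpeed inference does not.
Flash attention and Paged attention.

New features of the Large Model Inference (LMI) image
Now available on DLC with DJLServing 0.25.0
Integration with vLLM (DeepSpeed container, MPI engine, rolling_batch set to vllm)
Integration with TensorRT-LLM
TensorRT-LLM SmoothQuant (75% latency drop compared with TGI INT8)
Very fast FP16/BF16 inference speed
Rolling batch without Triton server
Zero-code setup

LMI TensorRT-LLM deployment example

Amazon Bedrock
Amazon Bedrock helps you quickly build and scale generative AI applications on foundation models.
Choose from industry-leading foundation models through a single API.
Customize models with your organization's own data.
Enterprise-grade security and privacy.

A broad choice of models
Claude 2
Claude 2.1
Claude Instant
Jurassic-2 Ultra
Jurassic-2 Mid
Titan Text Embeddings
Titan Multimodal Embeddings
Titan Text Lite
Titan Text Express
Titan Image Generator
Llama 2 13B
Llama 2 70B
Command + Embed
Cohere Command Light
Cohere Embed English
Cohere Embed Multilingual
Stable Diffusion XL1.0

RAG architecture as an example
Operational concerns:
Document chunking and ingestion
Embedding model server
Vector database
LLM model server
Knowledge-base add, delete, and update operations
LangChain orchestration: application server, context storage, session management, and so on

Knowledge Bases for [Amazon Bedrock]
A fully managed service with native RAG support.
Automatically converts text documents into embeddings.
Stores the embeddings in a vector database.
Retrieves embeddings and augments the prompt.

RAG architecture based on Bedrock Knowledge base
Operational concerns:
Document chunking and ingestion
Embedding model server
Vector database
LLM model server
Knowledge-base add, delete, and update operations
LangChain orchestration: application server, context storage, session management, and so on

Agenda item 3: Building end-to-end generative AI applications, with demos

Potential of typical generative AI application scenarios
Survey question: Which generative AI use cases do you expect to be most promising in your organization?
Source: IDC global CIO quick survey, February 2023
[Bar chart. Values in extraction order: 15.2, 19.6, 23.9, 32.6, 39.1, 60.9. Categories in extraction order: none, I don't think it applies to my organization; design applications; code-generation applications; marketing applications; conversational applications; knowledge-management applications.]
Leading institutions in finance, e-commerce and retail, energy, healthcare, and law will try to adopt large models and generative AI capabilities within one year, starting with relatively mature scenarios. The driver is competitive pressure and the wish to gain a first-mover advantage.
Any AI application deployed in the past five years may be superseded by the new generation of AI.

Marketing copy generation: solution architecture
1. The user sends a request to the backend through a web page, for example: "Brand XX, tent, bright colors, lightweight, wind-resistant, rain-resistant, alpine tent, 5,000 m altitude".
2. The server converts the user input into a specific prompt, for example: "Write a Xiaohongshu (小红书) product-recommendation post based on the following: Brand XX, tent, bright colors, lightweight, wind-resistant, rain-resistant, alpine tent, 5,000 m altitude".
3. SageMaker receives the input parameters and streams the inference result back.
4. The server wraps the output stream in the WebSocket protocol and returns it to the user (front end).
[Diagram. Components: Users, APP service, ChatGLM2 SageMaker Endpoint, Bedrock Claude2. Flow: 1 Request; 2 Request (Prompt); 3 Streaming Response (Answer and History); 4 Streaming Response.]

Marketing images: solution architecture
[Diagram, product design flow (steps 1 to 6). Components: Amazon Translate, Product Design Model on SageMaker. Labels: Request; Request (Chinese); Response (English); Prompt & parameters; Response (Base64); Response (Image).]
[Diagram, marketing material flow (steps 1 to 8). Components: Amazon Translate, SAM Model, SD Inpainting Model. Labels: Request; Request (Chinese); Response (English); Prompt (Mask content); Mask image (S3 URL); Origin image & Mask; Response image (Base64); Response (Image).]
Other labels: product design; marketing material; Bedrock SDXL1.0; Bedrock Titan.

Product Design model
The work of a product and automotive designer.
Fine-tuned model checkpoints based on SD 1.5.
An aid for the design process: it generates shape ideas and can also render sketches and 3D design drafts.
The model can be downloaded from the site.

SAM and stable-diffusion-2-inpainting models
Segment Anything Model (SAM)
Generates high-quality object masks from input prompts such as points or boxes, and can generate masks for every object in an image.
Trained on a dataset of 11 million images and 1.1 billion masks; shows strong zero-shot performance across a range of segmentation tasks.
GitHub address
stable-diffusion-2-inpainting
Generates and modifies images from text prompts.
Based on Stable Diffusion V2, trained for an additional 200k steps following the LAMA mask-generation strategy.
GitHub address

ChatGLM3 Function Call

Related resources
1. Workshop manual: https://catalog.us-east-1.prod.workshops.aws/workshops/4aec1efd-5181-46be-b7b1-2ee9292dae80/zh-CN
2. Workshop source code: model deployment: https:/

Demo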
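
The slides contrast tensor parallelism (intra-layer) with pipeline parallelism (inter-layer) without showing how the two splits differ. The following is a minimal sketch of the idea only, not how any of the listed frameworks implement it: plain Python lists stand in for GPU tensors, and the function names and the tiny matrices are assumptions made for illustration.

```python
# Toy illustration: plain Python lists stand in for GPU tensors.

def matmul(x, w):
    """Multiply a row vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def tensor_parallel_layer(x, w, num_devices):
    """Intra-layer split: each 'device' holds a slice of the columns of one layer."""
    cols = len(w[0])
    step = cols // num_devices  # assumes the column count divides evenly
    out = []
    for d in range(num_devices):
        shard = [row[d * step:(d + 1) * step] for row in w]
        out.extend(matmul(x, shard))  # concatenation stands in for an all-gather
    return out

def pipeline_parallel(x, layers, num_stages):
    """Inter-layer split: each 'stage' owns a contiguous group of whole layers."""
    per_stage = len(layers) // num_stages  # assumes the layer count divides evenly
    for s in range(num_stages):
        for w in layers[s * per_stage:(s + 1) * per_stage]:
            x = matmul(x, w)  # activations are handed from one stage to the next
    return x

x = [1.0, 2.0]
w1 = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]      # layer 1: 2 x 4
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # layer 2: 4 x 2

tp_out = tensor_parallel_layer(x, w1, num_devices=2)
pp_out = pipeline_parallel(x, [w1, w2], num_stages=2)
```

In the tensor-parallel case every device works on the same layer at the same time; in the pipeline case each device holds different layers and works on them in sequence. Both produce the same result as the unsplit computation.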
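
The RAG slides list three steps: convert documents into embeddings, store them in a vector database, then retrieve embeddings and augment the prompt. The sketch below walks through those steps under loud assumptions: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, the "vector database" is an in-memory list, and the prompt layout is invented for illustration. It does not call Amazon Bedrock or any managed service.

```python
import math
from collections import Counter

def embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, store, k=1):
    """Return the k stored passages most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def augment(query, passages):
    """Prepend the retrieved passages to the question (illustrative layout)."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Ingestion: embed each document and keep (text, vector) pairs in memory.
docs = [
    "the tent is wind resistant and light",
    "the stove burns gas",
]
store = [(d, embed(d)) for d in docs]

# Query time: retrieve, then build the augmented prompt for the LLM.
question = "is the tent wind resistant"
hits = retrieve(question, store)
prompt = augment(question, hits)
```

A managed knowledge base takes over the ingestion and retrieval parts of this flow, which is why the slides frame it in terms of operational concerns.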
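
The four steps of the marketing-copy architecture (request, prompt construction, streamed inference, relay to the front end) can be sketched as follows. This is an assumption-heavy outline, not the workshop's code: `fake_endpoint_stream` is a hypothetical stand-in for the streaming SageMaker endpoint, `send` stands in for a WebSocket send call, and the English prompt template is a translation of the example on the slide.

```python
def build_prompt(keywords):
    """Step 2: wrap the user's keywords in a fixed instruction template."""
    template = "Write a Xiaohongshu product-recommendation post based on the following: "
    return template + ", ".join(keywords)

def fake_endpoint_stream(prompt):
    """Step 3 (hypothetical stand-in): yield the answer piece by piece."""
    for chunk in ["This tent ", "is light ", "and windproof."]:
        yield chunk

def relay(stream, send):
    """Step 4: forward each chunk as it arrives and return the full answer."""
    parts = []
    for chunk in stream:
        send(chunk)  # a real service would push this over a WebSocket
        parts.append(chunk)
    return "".join(parts)

# Step 1: the user's request arrives as a list of keywords.
keywords = ["Brand XX", "tent", "lightweight"]
prompt = build_prompt(keywords)

sent = []  # records what the front end would receive
answer = relay(fake_endpoint_stream(prompt), sent.append)
```

Relaying chunk by chunk lets the front end display text while the model is still generating, which is the point of streaming the response in steps 3 and 4.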