Have Your Cake and Eat it Too with Dataiku and Databricks
Amanda Milberg
Senior Partner Solutions Engineer, Dataiku
Databricks 2023

Session Objective

Gather the Ingredients
- Provide an overview of Dataiku
- Highlight the seamless feature integrations of Dataiku and Databricks
- Illustrate how using these two technologies provides a foolproof recipe for AI applications and data-driven success

Follow a Recipe
- Provide a high-level overview of what Large Language Models (LLMs) are
- Introduce the use case we will be walking through today
- Outline step by step how to customize an LLM on your own data with low computational resources

The Icing on the Cake
- Democratize our LLM to the enterprise through a no-code Dataiku Application
- Outline the components of the RAFT framework for Generative AI
- Deploy our LLM in a secure, governed environment

Highlight how Dataiku and Databricks are a winning formula for data excellence

Gather the Ingredients
How Dataiku and Databricks provide a foolproof recipe

What is Dataiku?
Skill Agnostic Analytics Workbench to Scale Data & AI Initiatives
Data Engineer, Business Analyst, Data Analyst, Analytics Leader, Data Scientist
High code, Low code, No code
Centralized Analytics Workbench
Cloud Agnostic / Hybrid Cloud
Data Access & Cataloging, Data Preparation & Analysis, (Auto) Machine Learning, Production Deployment
- Analyze, aggregate, and transform data with visual or code interfaces
- Develop, deploy, and monitor machine learning models in a single common environment
- Schedule, deploy, and monitor data
products across the enterprise
Data Product Development & Governance
- Connect to ANY data source (Cloud, On Prem, ERP, CRM, etc.) with or without code
- Everyone working together in a common analytics workbench with a visual interface allows for collaboration, reuse, and scale
- Decrease the marginal cost of data & cloud investments by extending access (with proper guardrails) to everyone across the organization
Simple, full lifecycle platform for everyone
Most comprehensive data and analytics platform on the market
FULL CODE, LOW CODE, NO CODE, SQL + MIXED
Infrastructure

Feature Integrations
Empower users with a full suite of data and AI capabilities closely knit with your toolkit
- Fast load data from S3/Azure Blob into Databricks
- Perform visual recipes on the Databricks SQL DB
- Write SQL code to execute in Databricks SQL DB
- Write Python code to execute in Databricks Cluster with dbc2
- Score models in Databricks SQL DB

Follow the Recipe
Walk through an example of building an LLM application

What are Large Language Models (LLMs)?
LLMs represent a subset of foundation models that are trained specifically on text sources.

LLMs are a revolution in Natural Language Processing
- NLP is a well-established field of machine learning for analyzing and processing text
- LLMs are revolutionary because they've cracked the code on language complexity. Now, for the first time, machines can learn language, context and intent, and be independently generative and creative.
- 40% of all working hours can be impacted by LLMs like GPT-4.
- Language-based tasks make up 62% of the total time employees work
- 65% of that time can be transformed into more productive activity through augmentation and automation [1]
- 4 in 10 organizations want to make a large investment in LLM
capabilities
[1] Accenture Report

There are two broad categories of LLMs; which one to use depends on your business needs

Proprietary SaaS LLMs
- Primarily accessed through APIs
- Bigger and better for more tasks
- Data will leave your environment
- Customization is vendor dependent

Open Source LLMs
- Can be downloaded & run locally
- Smaller and often less performant
- Data stays in your environment
- Fully customizable

Illustrative Use Case: BloomBot
Developing a Q&A system, powered by an LLM, to assist the support desk in responding to questions from customers
- VP of Customer Ops: Requests the LLM be customized with an internal Q&A knowledge source from past support tickets
- Head of Risk: Wants to balance speed of innovation with caution and requests all data stays in BloomBot's environment
- Director of IT: Concerned with costs, so advises we start with a local, "smaller" model as a proof of concept

Customizing an open source LLM, fully contained on a company's infrastructure with no external data movement
1. Prepare our data with visual recipes using Databricks compute
2. Build a vector store to find similar answers to questions asked in the past ("relevant facts")
3. Pass the relevant facts into our prompt template to query Dolly to generate a response

1. Preparing our Data
Using visual recipes to prepare and transform our textual data

2. Build out our Vector Store
Combining our LLM with our Q&A data source to provide relevant and up-to-date answers

Secret Ingredient: Retrieve-Then-Read Pipeline
Our proposed approach for customization without the need for fine-tuning
1. We receive the question from the user;
2. We query the vector store to retrieve the relevant facts most semantically similar to the question;
3. We incorporate both the question and the previous answers in a question-answering prompt;
4. We query an LLM with this prompt and receive the answer.

3. Prompt Engineering and Q&A
A step-by-step walk through of the process

User Question:
Can anyone recommend a style of small-sized grow lamp that I can use inside my apartment to give my plants some extra light?

Retrieval of facts semantically similar to the question (Vector Store)

Relevant Facts:
- Incidentally using a CFL light is a great way to start seeds. It's amazing how easy it is to start seeds (even temperamental varieties) with CFL light and a heating mat.
- A great plant that requires very little care is a Jade Plant. I only water when it's very dry and it takes care of itself.

Instantiation of the prompt template

Prompt Template:
You are a gardener and your job is to help providing the best gardening answer. Use only information in the following paragraphs to answer the question at the end. Explain the answer with reference to these paragraphs. If you don't know, say that you do not know.
{context}
Question: {question}
Response:

Dolly Query

Prompt:
You are a gardener and your job is to help providing the best gardening answer...
- Incidentally using a CFL light...
- A great plant that requires very little...
Question: Can anyone recommend a style of small-sized grow lamp...
Response:

Completion by Dolly:
I have had success with an 80 watt compact fluorescent light (CFL) within 5 cm of the plants, you can grow at least 6 plants. Unlike metal halide and high pressure sodium lights you can position the light close to the plants without burning them

The Icing on the
Cake
Democratizing our LLM with proper security and governance

Democratizing Our LLM to the Enterprise
No-code Dataiku Applications to package our project as a reusable asset to be utilized by the business

The RAFT Framework for Generative AI
Ensure proper governance throughout your development and deployment process
- Reliable and Secure: AI systems are built to ensure consistency and reliability across the entire lifecycle. Data and models are secure and privacy-enhancing.
- Accountable and Governed: Ownership over each aspect of the AI lifecycle is documented and used to support oversight and control mechanisms.
- Fair and Human-Centric: AI systems are built to minimize bias against individuals or groups and support human determination and choice.
- Transparent and Explainable: The use of AI is disclosed to end users and explanations of the methods, parameters and data used in AI systems are provided.

Quick recap of today's session
A journey from ingredients to insights
1. Described how Dataiku and Databricks provide the most comprehensive platform for all your people, all your data, and any analytical technique
2. Discussed how we can customize an open source LLM in Dataiku using the compute and storage power of Databricks
3. Demonstrated how we can democratize our LLM to the enterprise in a safe, secure, and governed way

Want to learn more? Stop by Booth #423
Amanda Milberg
Senior Partner Solutions Engineer
Amanda.M
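
Sketch: Retrieve-Then-Read in plain Python
The four steps of the "Secret Ingredient: Retrieve-Then-Read Pipeline" slide can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the session's Dataiku or Databricks implementation: word-count vectors with cosine similarity stand in for a real embedding model and vector store, the names `embed`, `cosine`, `ToyVectorStore`, `build_prompt`, `answer`, and `echo_llm` are hypothetical, `echo_llm` is only a placeholder where a call to a model such as Dolly would go, and the `{context}` and `{question}` placeholder names in the template are assumed.

```python
import math
import re
from collections import Counter

# Prompt wording taken from the "Prompt Template" slide; the {context} and
# {question} placeholder names are assumptions.
PROMPT_TEMPLATE = (
    "You are a gardener and your job is to help providing the best gardening answer. "
    "Use only information in the following paragraphs to answer the question at the end. "
    "Explain the answer with reference to these paragraphs. "
    "If you don't know, say that you do not know.\n"
    "{context}\n"
    "Question: {question}\n"
    "Response:"
)


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store of past answers (the "relevant facts")."""

    def __init__(self, facts):
        self.entries = [(fact, embed(fact)) for fact in facts]

    def query(self, question, k=2):
        """Step 2: return the k stored facts most similar to the question."""
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]


def build_prompt(question, facts):
    """Step 3: put the question and the retrieved facts into one prompt."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer(question, store, llm, k=2):
    """Steps 1-4: receive the question, retrieve facts, build the prompt, query the LLM."""
    facts = store.query(question, k=k)
    return llm(build_prompt(question, facts))


def echo_llm(prompt):
    """Placeholder for the real model call; it only reports the prompt size."""
    return f"[model output for a prompt of {len(prompt)} characters]"


if __name__ == "__main__":
    store = ToyVectorStore([
        "Incidentally using a CFL light is a great way to start seeds.",
        "A great plant that requires very little care is a Jade Plant.",
    ])
    question = "Can anyone recommend a style of small-sized grow lamp?"
    print(build_prompt(question, store.query(question, k=1)))
    print(answer(question, store, echo_llm))
```

Passing the model as a plain callable (`llm`) keeps retrieval and prompt construction independent of the model, so the placeholder can be swapped for a locally hosted model without touching the other steps.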