Alibaba Cloud: AI Lets Databases Go "Faster and Further" (2023)
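Dai Jian, Senior Technical Expert, Alibaba Cloud Database

AI adoption will keep spreading
- 83% of CEOs believe AI is a strategic priority (MIT Sloan Management Review).
- AI creates $2.9 trillion in business value, and 6.2 billion hours of human labor are spent on AI (Gartner).

The AI dilemma: (1) features and models are hard to manage.
Business needs, data, and algorithms keep changing, so AI is always iterating and evolving. MLOps spans DataOps, ModelOps, and DevOps.

AI? DB?
AIDB: a simpler workflow, less code, and lower development and operations costs.
DB+AI: data, features, and models are stored together; DataOps and ModelOps are handled in one place; storage and computation happen together.

Why we chose to extend DataOps to ModelOps
The database already provides ACID transactions, query acceleration, indexes, caching, one writer with many readers (or many writers with many readers), SQL, UDFs, federated queries, Serverless, and HTAP. Together these deliver data freshness and data usability. Extending DataOps to ModelOps keeps data fresh and preserves its usability and availability. It avoids running a separate data-management system just for models, along with the data latency and complex hard-coded data pipelines that such a system brings, and it makes online AI decision-making easier.

DataOps + ModelOps: core functions
Data, features, and models are handled together (data management + feature management + model management) through SQL plus SQL for MLOps. The functions cover AI model creation, evaluation, hyperparameter tuning, combination, and deployment.

Model creation
    CREATE MODEL airlines_gbm WITH (
        model_class = 'lightgbm',
        x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
        y_cols = 'Delay',
        model_parameter = (boosting_type = 'gbdt', n_estimators = 100, max_depth = 8, num_leaves = 256)
    ) AS (SELECT * FROM airlines_train);
The model airlines_gbm_copy1, which the inference examples query, is created by an identical statement under that name.

Model evaluation
    SELECT Delay FROM evaluate(MODEL airlines_gbm, SELECT * FROM airlines_test)
    WITH (
        x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
        y_cols = 'Delay',
        metrics = 'acc'
    );

Model inference (offline)
    SELECT TripID, Delay FROM PREDICT(MODEL airlines_gbm_copy1, SELECT * FROM airlines_train_1000_copy1)
    WITH (
        s_cols = 'TripID,Delay',
        x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
        y_cols = 'Delay',
        primary_key = 'TripID',
        mode = 'async'
    ) INTO lightgbm_v2_predict82201;

Model inference (online)
    SELECT TripID, Delay FROM PREDICT(MODEL airlines_gbm_copy1, SELECT * FROM airlines_train_1000_copy1)
    WITH (
        s_cols = 'TripID,Delay',
        x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
        y_cols = 'Delay',
        primary_key = 'TripID'
    );

Model upload and model deployment
    UPLOAD MODEL model_name WITH (model_location = '...', req_location = '...');
    DEPLOY MODEL model_name;

UDF creation
    DEPLOY MODEL my_lr_model WITH (mode = 'in_db');
    CREATE FUNCTION my_lr_model RETURNS REAL SONAME '#ailib#_my_lr_model.so';

Feature creation
    CREATE FEATURE feature_name WITH (feature_class = '...', parameters = (...))
    AS (SELECT select_expr, select_expr ... FROM table_reference);

Feature update
    UPDATE FEATURE feature_name WITH (feature_class = '...', parameters = (...))
    AS (SELECT select_expr, select_expr ... FROM table_reference);

Model description
    DESCRIBE MODEL model_name;

Feature deletion
    DROP FEATURE feature_name;

Model deletion
    DROP MODEL model_name;

...plus other AI SQL statements.

PolarDB for AI: DB for AI in PolarDB MySQL
SQL covers feature creation, model creation, model evaluation, model inference, and more. One system: PolarDB. One language: SQL.
CPU, memory, and storage are decoupled into three layers connected by high-speed RDMA.
Architecture diagram: PROXY; RW, RO, and AI nodes (Scale Up, Scale Out); memory (Mem); PolarStore and OSS storage.

PolarDB for AI model inference: data goes into the model, and the model produces inference results. The offline PREDICT statement above (mode = 'async' ... INTO lightgbm_v2_predict82201) writes those results into a table.

PolarDB for AI: scenarios

Scenario 1: From data to model to application
Model development (model creation, model training, model evaluation, model description) leads to model application. https:/

Create and train the model
    CREATE MODEL airlines_gbm WITH (
        model_class = 'lightgbm',
        x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
        y_cols = 'Delay',
        model_parameter = (boosting_type = 'gbdt', n_estimators = 100, max_depth = 8, num_leaves = 256)
    ) AS (SELECT * FROM db4ai.airlines_train);

Check the model creation result
    SHOW TASK df05244e-21f7-11ed-be66-xxxxxxxxxxxx;

View the model description
    DESCRIBE MODEL airlines_gbm;

Evaluate the model
    SELECT Delay FROM evaluate(MODEL airlines_gbm, SELECT * FROM airlines_test)
    WITH (
        x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
        y_cols = 'Delay',
        metrics = 'acc'
    );

List models
    SHOW MODELS;

Offline inference (results written to a table)
    SELECT TripID, Delay FROM PREDICT(MODEL airlines_gbm_copy1, SELECT * FROM airlines_train_1000_copy1)
    WITH (
        s_cols = 'TripID,Delay',
        x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
        y_cols = 'Delay',
        primary_key = 'TripID'
    ) INTO lightgbm_v2_predict1030;

Online inference (results returned directly)
    SELECT TripID, Delay FROM PREDICT(MODEL airlines_gbm, SELECT * FROM airlines_train_1000_copy1)
    WITH (
        s_cols = 'TripID,Delay',
        x_cols = 'Airline,Flight,AirportFrom,AirportTo,DayOfWeek,Time,Length',
        y_cols = 'Delay',
        primary_key = 'TripID'
    );

Scenario 2: Pretrained models
Bring an already trained model together with its requirements.txt.

Upload the model
    UPLOAD MODEL my_model WITH (
        model_location = 'https://xxxx/model.pkl?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx',
        req_location = 'https://xxxx/requirements.txt?Expires=xxxx&OSSAccessKeyId=xxxx&Signature=xxxx'
    );

Deploy the model
    DEPLOY MODEL my_model;

Online inference
    SELECT Y FROM PREDICT(MODEL my_model, SELECT * FROM db4ai.regression_test LIMIT 10)
    WITH (
        x_cols = 'x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19,x20,x21,x22,x23,x24,x25,x26,x27,x28',
        y_cols = ''
    );

Alternatively, deploy the model inside the database (model deployment), create a UDF for it (UDF creation), and call the UDF (UDF usage)
    DEPLOY MODEL model_name WITH (mode = 'in_db');
    CREATE FUNCTION function_name RETURNS return_value SONAME 'soname';
    SELECT function_name(content);

Scenario 3: Out-of-the-box solutions

Table reviews
    id | product_id | product_review
    1  | 1          | Huawei's latest phone, the Mate 60 Pro, has sold extremely well since launch and is now out of stock.
    2  | 1          | Huawei has sharply raised its shipment forecast for the Mate 60 Pro. Its impact on the industry and the stock market cannot be ignored.
    3  | 2          | This thing only looks okay; the actual experience is not good. I don't recommend buying it.

Sentiment analysis
    SELECT * FROM PREDICT(MODEL _polar4ai_tongyi_sa, SELECT product_review FROM reviews WHERE id = 1) WITH ();
Result: positive

Table comments, row id = 3, comment:
The biggest change in this year's summer season: Hollywood blockbusters failed, while domestic films with realistic themes thrived. In the past, Hollywood blockbusters emphasized dazzling visual effects, favored action, fantasy, and adventure, and relied mainly on visual spectacle. In the post-pandemic era, however, audiences care more and more about realistic content that touches their own lives. Realistic films all but swept this year's top 10. Lost in the Stars tied into hot topics such as warnings against being blinded by love and the Thailand wife-murder case; Never Say Never drew deeply on Wang Baoqiang's own life story with a grassroots comeback theme; No More Bets played on public curiosity about telecom fraud, Myanmar scam compounds, and rigged casino dealers; and One and Only likewise followed the growth of ordinary people and underdogs. Looking back at China's film market in recent years, realistic films have long been pulling ahead. In 2018, Dying to Survive won that year's summer season with 3.1 billion yuan at the box office. Then came The White Storm 2 in 2019, Raging Fire and Chinese Doctors in 2021, Lighting Up the Stars in 2022, and films from other seasons such as Nice View and Sister, among others.

Summarization
    SELECT * FROM PREDICT(MODEL _polar4ai_tongyi_summarize, SELECT comment FROM comments WHERE id = 3) WITH ();
Result: The biggest change this summer season is that realistic-themed films were hugely popular while Hollywood blockbusters underperformed. In China's film market, realistic films have long been mainstream, and several of them, including Dying to Survive, The White Storm 2, and Raging Fire, achieved high box-office takings.

Summary
Research results:
- A Comparative Study of in-Database Inference Approaches (ICDE 2022)
- SmartLite: A DBMS for Serving Multiple Neural Models with Constraint Resource (PVLDB 2024)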
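Each CREATE MODEL statement in this deck puts all of its options into one WITH clause: model_class, the feature columns x_cols, the label columns y_cols, and a nested model_parameter list, followed by the training query. The Python sketch below builds such a statement from ordinary Python values. This can help when many model variants, such as airlines_gbm and airlines_gbm_copy1, have to be created. The helper name build_create_model is hypothetical. It assumes that string option values are single-quoted and numeric values are emitted bare. It does no escaping, so its inputs must be trusted.

```python
from typing import Dict, List, Union


def build_create_model(
    name: str,
    model_class: str,
    x_cols: List[str],
    y_cols: List[str],
    params: Dict[str, Union[str, int, float]],
    source_query: str,
) -> str:
    """Render a CREATE MODEL statement shaped like the ones in the deck.

    Hypothetical helper. Assumption: string option values are single-quoted,
    numeric values are emitted bare. No escaping is done, so inputs must be trusted.
    """
    # e.g. {"boosting_type": "gbdt", "n_estimators": 100}
    #   -> "boosting_type = 'gbdt', n_estimators = 100"
    rendered = ", ".join(
        f"{key} = '{value}'" if isinstance(value, str) else f"{key} = {value}"
        for key, value in params.items()
    )
    return (
        f"CREATE MODEL {name} WITH ("
        f"model_class = '{model_class}', "
        f"x_cols = '{','.join(x_cols)}', "
        f"y_cols = '{','.join(y_cols)}', "
        f"model_parameter = ({rendered})"
        f") AS ({source_query})"
    )


if __name__ == "__main__":
    print(build_create_model(
        "airlines_gbm",
        "lightgbm",
        ["Airline", "Flight", "AirportFrom", "AirportTo", "DayOfWeek", "Time", "Length"],
        ["Delay"],
        {"boosting_type": "gbdt", "n_estimators": 100, "max_depth": 8, "num_leaves": 256},
        "SELECT * FROM airlines_train",
    ))
```

Generating statements this way keeps the column lists in one place. The resulting string would still be sent through whatever MySQL client connects to the PolarDB cluster.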
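The evaluation statement asks evaluate() for the acc metric on airlines_test. The deck does not define acc. Assuming it means ordinary classification accuracy, the score is the fraction of test rows whose predicted Delay equals the actual Delay. The following is a minimal Python sketch of that computation. The Delay labels used in the example are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Share of rows whose predicted label equals the true label.

    Assumption: the deck's 'acc' metric is plain classification accuracy.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty test set")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


if __name__ == "__main__":
    # Hypothetical Delay labels (1 = delayed, 0 = on time) for four test flights.
    actual = [1, 0, 1, 1]
    predicted = [1, 0, 0, 1]
    print(accuracy(actual, predicted))  # 0.75
```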
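The offline inference statement adds mode=async and an INTO target table, and scenario 1 checks progress with SHOW TASK and a task ID. The toy Python model below illustrates that submit-then-poll pattern. Submitting returns a task ID immediately. A background worker then writes one prediction per primary key into a named result table, and the caller can poll the status or block until the task completes. The class AsyncPredictor, its "running"/"finish" status strings, and the in-memory tables dictionary are all hypothetical. This is a sketch of the pattern, not a description of how PolarDB implements asynchronous inference.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class AsyncPredictor:
    """Toy model of asynchronous batch inference (mode=async ... INTO table).

    Hypothetical class for illustration only; PolarDB's real task handling
    is not described in the deck.
    """

    def __init__(self, predict_fn: Callable[[dict], float]) -> None:
        self._predict_fn = predict_fn
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._tasks: Dict[str, Future] = {}
        # Stand-in for result tables: table name -> {primary key: prediction}.
        self.tables: Dict[str, Dict[object, float]] = {}

    def submit(self, rows: List[dict], primary_key: str, into: str) -> str:
        """Start a background job and return its task ID at once."""
        task_id = str(uuid.uuid4())
        self._tasks[task_id] = self._pool.submit(self._run, rows, primary_key, into)
        return task_id

    def _run(self, rows: List[dict], primary_key: str, into: str) -> int:
        # One prediction per input row, keyed by the primary-key column.
        result = {row[primary_key]: self._predict_fn(row) for row in rows}
        self.tables[into] = result
        return len(result)

    def status(self, task_id: str) -> str:
        """Hypothetical status strings, loosely analogous to SHOW TASK."""
        return "finish" if self._tasks[task_id].done() else "running"

    def wait(self, task_id: str) -> int:
        """Block until the task ends; return the number of rows written."""
        return self._tasks[task_id].result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    # Hypothetical scoring rule standing in for a trained model.
    predictor = AsyncPredictor(lambda row: 1.0 if row["Length"] > 200 else 0.0)
    rows = [{"TripID": 1, "Length": 250}, {"TripID": 2, "Length": 90}]
    task = predictor.submit(rows, primary_key="TripID", into="lightgbm_v2_predict82201")
    print("submitted", task)
    print("rows written:", predictor.wait(task))
    print(predictor.tables["lightgbm_v2_predict82201"])  # {1: 1.0, 2: 0.0}
    predictor.close()
```

Keying results by the primary key mirrors the role primary_key plays in the PREDICT statements. Each input row maps to exactly one stored prediction.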
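The in-database deployment path (DEPLOY MODEL my_lr_model with mode in_db, then CREATE FUNCTION ... RETURNS REAL SONAME ...) turns a model into a scalar UDF that returns one real number per row. Suppose my_lr_model is a logistic-regression model, as its name suggests. A natural per-row computation is then p = 1 / (1 + exp(-(w · x + b))). The Python function below sketches that arithmetic. The function name lr_score, the weights, and the bias are hypothetical. The deck does not say what code the generated shared library actually contains.

```python
import math
from typing import Sequence


def lr_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression probability for one row: sigmoid(w . x + b).

    Sketch of the per-row computation a scalar UDF such as my_lr_model could
    perform; weights and bias are hypothetical inputs.
    """
    if len(features) != len(weights):
        raise ValueError("feature and weight vectors differ in length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable sigmoid: never call exp() on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # z = -1.0 + (0.5 * 1.0 + 0.25 * 2.0) = 0.0, so the probability is 0.5.
    print(lr_score([1.0, 2.0], [0.5, 0.25], -1.0))  # 0.5
```

Branching on the sign of z keeps the function from overflowing on extreme feature values. A per-row UDF can receive arbitrary inputs, so this matters.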
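Scenario 2 uploads model.pkl and requirements.txt from signed OSS URLs that carry Expires, OSSAccessKeyId, and Signature query parameters. A pre-flight check before running UPLOAD MODEL can catch a URL that is malformed or already expired. The sketch below assumes Expires is a Unix timestamp in seconds, as in OSS V1 signed URLs. The function name check_signed_url and the example bucket host are hypothetical.

```python
import time
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Query parameters that appear in the signed URLs on the slide.
REQUIRED_PARAMS = ("Expires", "OSSAccessKeyId", "Signature")


def check_signed_url(url: str, now: Optional[float] = None) -> bool:
    """Return True if url looks like a usable, unexpired OSS signed URL.

    Hypothetical helper. Assumption: Expires is a Unix timestamp in seconds.
    """
    parts = urlparse(url)
    if parts.scheme != "https":
        return False
    query = parse_qs(parts.query)
    if any(name not in query for name in REQUIRED_PARAMS):
        return False
    try:
        expires = int(query["Expires"][0])
    except ValueError:
        return False  # placeholder such as "xxxx", not a timestamp
    current = time.time() if now is None else now
    return expires > current


if __name__ == "__main__":
    example = (
        "https://example-bucket.oss-cn-hangzhou.aliyuncs.com/model.pkl"
        "?Expires=2000000000&OSSAccessKeyId=AK&Signature=abc"
    )
    print(check_signed_url(example, now=1_700_000_000))  # True
```

Passing now explicitly makes the check deterministic in tests. Omitting it compares against the current clock.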