Architecture Strategies and Practices for Building an Enterprise AI Platform (构建企业级 AI 平台的架构策略和实践), by Li Xin (李欣). 26 pages.
Strategies of Machine Learning Platform Building & Practices in eBay
Bruce Li, eBay AIP Chief Architect, CCOE VAT Chairman

Agenda
1. AI/ML use case analysis
2. AI Platform vision, design principles and core capabilities
3. Unified data strategies

AI Use Cases

| | Structured Data | Semi/Unstructured Data (image/video/text/3D/) |
|---|---|---|
| Data Source | Online data services: OTF FE; Streaming events: NRT FE; Offline batch/ETL datasets: Batch FE | Content generation/acquisition: NRT pipeline |
| Storage | Unified online/offline feature store | Unified online/offline content store |
| Data PiT Parity | Online/offline PiT data strategies | PiT data parity is not required |
| Feedback Loop | Short: continuous online training; Long: offline PiT feature simulation | Vendor/manual/auto labelling |
| Common | Driver set & training set generation & management, catalog, data lineage, etc. | Driver set & training set generation & management, catalog, data lineage, etc. |
| CPU/GPU | Typically CPU training and inferencing | Typically GPU training and inferencing |

Challenges of Building Enterprise ML Platform
- Tendency to invest more in solutions than in the platform
- Lack of a clear boundary between solutions and platform
- Lack of unified data strategies and self-service support for ML Platform building
- Traditional focus on training, without enough platform support for data/feature and inferencing
- Lack of E2E seamless integration strategies across feature, training and inferencing

ML Development Lifecycle

Our Vision
To empower eBay AI practitioners to build, train and deploy machine learning models with a fully managed, efficient, self-service platform at scale.

ML Platform Core Capability Map

ML Platform Architectural Principles
- Enable self-service based on centralized configuration and metadata-driven design, with lifecycle management and governance in place
- Enable unified metadata and definitions across online and offline, with enough flexibility and extensibility to support domain-level customizations
- Provide a group of management APIs & services for the MLP-managed lifecycle, and enable E2E seamless integration based on the APIs
- Provide unified catalogs (including data, stored variables, features, models, solutions, etc.) to promote discovery, reuse and better governance
- Provide E2E data lineage for the AI Platform domain entities
- Apply unified monitoring across the whole ML platform

ML Platform Online Integration Architecture

Entity Modeling in ML Platform

Dependency DAG & Execution Plan

Unified CPU/GPU Inferencing Platform

Model and Feature Monitoring

Why Data Strategies are so Important for AI/ML
Image source: Cognilytica, from https://www.ayadata.ai/blog-posts/manual-vs-automated-data-labeling

Batch Feature, Feature DSL, NRT Roll-up Abstraction, NRT Feature Engineering, NRT Feature, Schema, Event processing, Derived Computation, On-the-fly Feature

Comparison of Different Feature Types

| | Batch Feature | NRT Feature | On-the-fly Feature |
|---|---|---|---|
| MLP Managed | Yes | Yes | No |
| Self-service by End Users (DS) | Yes | Yes | No |
| Online/offline PiT Strategy | PiT Simulation/Feature Snapshotting | PiT Simulation/Feature Snapshotting | Feature Snapshotting Only |
| Data Source | ETL/Batch data/Snapshotted Dataset | Enriched events | Request context/Online data services |
| Reusability | Easy to reuse | Easy to reuse | Solution by solution support |
| Time-to-Market | Fast | Fast except new enriched event acquisition | Slow |
| Delay of Data Freshness | 1 Day+ | P99 5 sec | Real-time |

Embracing NRT Strategy

Integrated Data Strategies
- Platforms: Feature Platform, Training Platform, Inferencing Platform
- Capabilities: Unified Feature Store, Feature Lifecycle Mngt., Feature PiT Simulation, Training Set Generation, Driver/training Set Mngt., High-throughput Data Access, Feature/Model Snapshotting, Unified Model Spec, API Spec Auto-Gen
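
The slides name "Feature PiT Simulation" and "Training Set Generation" without showing how they work. The sketch below is one possible reading, not eBay's implementation: for each row of a driver set (entity, event time, label), it looks up the latest feature value recorded at or before that event time, so the training set sees only what the online system could have seen. All names here (`build_history`, `pit_lookup`, `generate_training_set`, the `purchases_7d` feature and the sample rows) are hypothetical and chosen for illustration.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def build_history(feature_rows):
    """Group (entity_id, updated_at, value) rows by entity, sorted by time."""
    history = defaultdict(list)
    for entity_id, updated_at, value in feature_rows:
        history[entity_id].append((updated_at, value))
    for versions in history.values():
        versions.sort(key=lambda v: v[0])
    return history


def pit_lookup(history, entity_id, as_of):
    """Return the latest value with updated_at <= as_of, or None if none exists."""
    versions = history.get(entity_id, [])
    times = [t for t, _ in versions]
    idx = bisect_right(times, as_of)  # number of versions visible at as_of
    return versions[idx - 1][1] if idx > 0 else None


def generate_training_set(driver_set, history, feature_name):
    """Attach the point-in-time feature value to every driver-set row."""
    return [
        {
            "entity_id": entity_id,
            "event_time": event_time,
            "label": label,
            feature_name: pit_lookup(history, entity_id, event_time),
        }
        for entity_id, event_time, label in driver_set
    ]


# Hypothetical feature history: (entity_id, updated_at, value).
feature_rows = [
    ("u1", datetime(2024, 1, 1), 3),
    ("u1", datetime(2024, 1, 3), 7),
    ("u2", datetime(2024, 1, 2), 1),
]
# Hypothetical driver set: (entity_id, event_time, label).
driver_set = [
    ("u1", datetime(2024, 1, 2), 0),  # sees the Jan 1 value only
    ("u1", datetime(2024, 1, 3), 1),  # sees the Jan 3 value
    ("u2", datetime(2024, 1, 1), 0),  # no value existed yet
]

history = build_history(feature_rows)
training_set = generate_training_set(driver_set, history, "purchases_7d")
print([row["purchases_7d"] for row in training_set])  # [3, 7, None]
```

The lookup never reads a value newer than the event time, which is the property that keeps offline training data consistent with what online inferencing would have observed. Feature snapshotting, the other strategy named in the comparison table, would instead log the served value at request time and skip the lookup.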