Streaming data analytics with Power BI and Databricks
Liping Huang, Marius Panga
Databricks, 2023

Agenda
- Lakehouse for Real-Time Analytics
- Architectural Patterns and Demos
- Considerations

Sources of real-time data
- Transactional records: point of sale (POS), banking transactions, airline reservations, call center records
- Interactions: web clicks, social posts, emails, instant messages
- IoT events: sensors, geolocation, machine logs, mobile devices
- Third-party: news feeds, weather, market data, real-time traffic

Example applications
- Fraud detection
- Personalized offer
- Vaccine distribution
- Predictive maintenance
- Smart pricing
- In-game analytics
- Connected cars and smart devices
- Content recommendations

New opportunities from real-time data
- Every organization generates vast amounts of real-time data
- Creating opportunities for new kinds of real-time applications

Data streaming is hard for most organizations
- Real-time and historical data in separate systems
- Difficult to enable existing data teams with the languages and tools they already know
- Difficult to deploy and maintain streaming data pipelines that run reliably in your production environment
- New APIs and languages to learn
- Complex operational tooling to build
- Incompatible governance models limit ability to control access for the right users and groups

Lakehouse Platform
- Workloads: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming
- Unity Catalog: unified governance for all data assets
- Delta Lake: data reliability and performance
- Cloud Data Lake: all structured and unstructured data

Data streaming made simple
Real-time analytics, machine learning and applications on one platform
- Enable all your data teams: data engineers, data scientists, and analysts can easily build streaming data pipelines with the languages and tools they already know.
- Simplify development and operations: reduce complexity by automating many of the production aspects associated with building and maintaining real-time data workflows.
- One platform for streaming and batch data: eliminate data silos, centralize security and governance models, and provide complete support for all your real-time use cases.

This is not a deep dive into
- Databricks SQL (DB SQL)
- Power BI
- Delta Live Tables (DLT)
- Structured Streaming

Scenario 1
[Architecture diagram]
- Data Sources: Mobile & IoT Data, Application Events, SaaS Applications, Machine & Application Logs, On-premises Systems, Data Warehouse
- Message stores: Amazon MSK, Kinesis Data Streams, Azure Event Hubs, GCP Pub/Sub
- Object stores: Amazon S3, Azure Data Lake Store, Google Cloud Storage
- Streaming Ingestion & Transformation with Delta Live Tables: Continuous Ingestion, Data Transformation & Quality, Data Enrichment, Business Aggregates
- Operations: Filter, Joins, Aggregate, Windowing, Streaming Enrichment
- Platform features: Photon, Enhanced Auto Scaling, Automatic Deployment & Operations, Data Pipeline Observability, Automatic Error Handling & Recovery
- Real-Time Analytics with Databricks SQL (Analytics/Dashboarding)
- Real-Time Applications with Spark Structured Streaming
- Unity Catalog for Governance
- Databricks Workflows for End to End Orchestration
- Power BI, connected in three ways: Direct Query + Auto Refresh, Composite Datasets, Streaming Dataset (via Streaming Connectors)

Scenario 1: Delta Live Tables (DLT)
- Data processing framework
- Declarative pipelines
- Data quality enforcement
- Streaming and batch
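
The Filter, Aggregate, and Windowing steps named in the diagram, together with data quality enforcement, can be illustrated in a few lines of plain Python. This is only a conceptual sketch, not the Delta Live Tables API: the event fields (`ts`, `device`, `reading`), the quality rule, and the 60-second window are all assumptions made for illustration. In DLT itself, a drop-on-violation rule of this kind would typically be declared as an expectation (for example with `expect_or_drop`) rather than hand-coded.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed tumbling-window size


def is_valid(event):
    """Hypothetical quality rule: a reading must be numeric and non-negative."""
    value = event.get("reading")
    return isinstance(value, (int, float)) and value >= 0


def window_start(ts, size=WINDOW_SECONDS):
    """Floor an epoch-second timestamp to the start of its tumbling window."""
    return ts - (ts % size)


def aggregate(events, size=WINDOW_SECONDS):
    """Drop invalid events, then average readings per (window start, device)."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for event in events:
        if not is_valid(event):
            continue  # quality enforcement: violating rows are dropped
        key = (window_start(event["ts"], size), event["device"])
        sums[key][0] += event["reading"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}
```

Calling `aggregate` on a list of event dictionaries returns one average per window and device; events with a missing or negative reading never reach the aggregate.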

Scenario 1: Databricks SQL
- SQL warehouse on your Lakehouse
- Intuitive environment for data analysts
- Serverless or dedicated
- Rich ecosystem

Scenario 1: Power BI
- Direct Query dataset
- Automatic Page Refresh turned on

Demo (Scenario 1)
- Databricks DLT
- Databricks SQL
- Power BI Direct Query
- Power BI Auto Page Refresh

Scenario 2: Power BI
- Composite Query dataset
- Automatic Page Refresh turned on

Demo (Scenario 2)
- Databricks DLT
- Databricks SQL
- Power BI Composite Dataset
- Power BI Auto Page Refresh
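
Scenarios 1 and 2 both rely on Automatic Page Refresh, so each refresh turns into queries against the SQL warehouse. The following back-of-the-envelope sketch shows how that load scales. It assumes one query per visual per refresh for each viewer and ignores any caching, which is a simplification; the function and parameter names are hypothetical.

```python
def queries_per_minute(visuals_per_page, concurrent_viewers, refresh_seconds):
    """Rough estimate of queries sent to the SQL warehouse per minute.

    Assumes one query per visual per refresh per viewer, with no caching.
    """
    if refresh_seconds <= 0:
        raise ValueError("refresh_seconds must be positive")
    refreshes_per_minute = 60 / refresh_seconds
    return visuals_per_page * concurrent_viewers * refreshes_per_minute


# Example: 8 visuals, 10 viewers, 5-second refresh
busy_page = queries_per_minute(8, 10, 5)    # 960.0 under the stated assumptions
# Example: 4 visuals, 10 viewers, 30-second refresh
light_page = queries_per_minute(4, 10, 30)  # 80.0 under the stated assumptions
```

Under these assumptions, halving the visuals on a page and lengthening the refresh interval to match how often new data actually arrives cuts the query volume by more than an order of magnitude.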

Scenario 3: Spark Structured Streaming
- Manage your own checkpoints/infra
- More control/flexibility
- Handles (all) transformation logic
- With/without persisting Delta tables

Scenario 3: Power BI Streaming Dataset
- Push API
- Limited history
- Power BI Service only
- No transformation

Demo (Scenario 3)
- Spark Structured Streaming
- Power BI Streaming Dataset

Databricks + Power BI streaming architecture

In summary

                        Scenario 1              Scenario 2                    Scenario 3
                        DLT + DB SQL + PBI DQ   DLT + DB SQL + PBI Composite  Structured Streaming + PBI Streaming Dataset
Ease of implementation  Very easy               Easy                          Medium
Latency                 Medium (3s)             Medium (3s)                   Low (3s)
Historical data         Yes                     Yes                           Limited
Available PBI visuals   All                     All                           Basic

Considerations
- Refresh interval to match expected new data arrival rate
- Choose the appropriate SQL Warehouse tier
- Use Photon Acceleration
- Use Event Logs of Delta Live Tables
- Use SQL Warehouse for Power BI direct query/composite models
- Reduce number of visuals on a page
- Toggle on Referential Integrity
- Push down transformation logic to Databricks

Resources
- GitHub link for demos on Cookbook (demos will be made available!)
- Power Up your BI with Power BI and Lakehouse in Azure Databricks
- Delta Live Tables
- Structured Streaming
- Power BI Page Auto Refresh

Get in touch
- Marius Panga, Databricks Architect, https:/
- Liping Huang, Senior Solutions Architect, https:/
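
Scenario 3 pushes rows from Structured Streaming into a Power BI streaming dataset through the Push API. A minimal sketch of the sending side follows, using only the Python standard library. It assumes the dataset's push URL accepts a POST whose body is a JSON array of row objects; the helper names, the placeholder URL, and the row fields are hypothetical, and the real push URL has to be copied from the streaming dataset's settings in the Power BI service.

```python
import json
import urllib.request


def build_push_request(push_url, rows):
    """Build (but do not send) a POST request carrying rows as a JSON array."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        push_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_rows(push_url, rows, timeout=10):
    """Send the rows to the push URL and return the HTTP status code."""
    request = build_push_request(push_url, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building a request performs no network I/O, so it is safe to inspect offline.
example_rows = [{"device": "a", "reading": 1.5}]
example_request = build_push_request("https://example.invalid/rows", example_rows)
```

In a Structured Streaming job, a function like `push_rows` could be called for each micro-batch (for example from `foreachBatch`) after the batch has been reduced to a small number of rows, since the streaming dataset keeps only limited history and applies no transformation of its own.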