Building Data Sharing Applications using Node.js
Will Girten
Lead Specialist Solutions Architect at Databricks
Databricks 2023

Who am I?
Will Girten, Lead SSA at Databricks
BS in Computer Engineering from the University of Delaware
Author of the Node.js connector for Delta Sharing
Joined Databricks in 2019
Specializes in data warehousing and performance tuning BI workloads for Financial Services
Prior to Databricks, Will worked as a Data Architect helping federal customers build intelligent data lakes in Healthcare and Government verticals

Why Node.js?

Top Programming Languages
Across repos created on GitHub in 2022
Ruby, C++, C#, TypeScript, Java, Python, JavaScript
Source: https:/
... is evolving into a language built for communicating data insights

What makes Node.js great?
#1. Fast: Built on top of the V8 JavaScript engine
#2. Cross-platform: Compatible with all major operating systems
#3. Real-time apps: An event-driven, non-blocking I/O model

The Node.js Event Loop
[Diagram: Requests, Event Queue, Event Loop, Thread Pool (Workers); Register Callback, Operation Complete, Execute Callback]

Sharing large datasets with Delta Sharing

Overview of Delta Sharing
The industry's first open data sharing protocol
1. Share live data without copying out of the data lake
2. Support a wide range of clients, like Node.js
3. Strong security, auditing, and governance
4. Efficiently scale to massive datasets

How it works "under the hood"
The industry's first open data sharing protocol
[Diagram: Delta Sharing Server, Delta Table; Query "sales" table; Get latest snapshot]
https:/ ... are short-lived file URLs
https:/

... worth solving
The hidden cost to powering your APIs & frontends
[Diagram: Streaming & Batch Sources; Step 1: Raw (Bronze); Step 2: Transformed (Silver); Step 3: Feature/Aggs (Gold); Step 4: COPY; Key-Value Store]

The hidden data maintenance cost!
Who maintains these data pipelines?
Who rebuilds the indexes?
Is this an additional cloud cost?
How long is a refresh?
What about my APIs?

Use case: Ad placement prediction
Copying the data to a serving layer creates unnecessary data silos
How do you keep the copy of the data up-to-date with the source?

Use case: Real-time Web Application
Copying the data to a serving layer creates unnecessary data silos
Redundant data copy that's difficult to manage!

Use case: Popular Fashion Retail Site
Copying the data to a serving layer creates data silos
There's nothing "real-time" about copying the data!

"Today data lives in many different systems and they're siloed. Even from the vendor's perspective it's become a nightmare to manage." - Ali Ghodsi

A simplified architecture
Stop copying your data to power your APIs & frontends
[Diagram: Step 1: Raw (Bronze); Step 2: Transformed (Silver); Step 3: Feature/Aggs (Gold); Merge changes; Merge changes; Delta Sharing Server]

Building a data sharing app using Node.js

Creating a Project Skeleton
Yeoman generator for Delta Sharing
npm i generator-delta-sharing
Yeoman, historically: a middle-rank servant in mid-14th century England who owns/cultivates land
Yeoman is a generic scaffolding system allowing the creation of any kind of app.
Generate a Node.js or React.js sample app for Delta Sharing

Building a React App
Set application state using Delta Sharing APIs
[Diagram: Event Trigger; setState(); setState(); setState()]

Follow for more!
Learn the latest Lakehouse tips &
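
The query flow on the 'How it works "under the hood"' slide (a client asks the Delta Sharing Server for a table and gets back short-lived file URLs) can be sketched in Node.js roughly as follows. This is a minimal sketch, not the speaker's connector. It assumes the open Delta Sharing REST protocol, where a table query is a POST to /shares/{share}/schemas/{schema}/tables/{table}/query with a bearer token, and the response is newline-delimited JSON with protocol, metaData, and file lines. The helper names (buildQueryRequest, parseQueryResponse, queryTable) are hypothetical, as are any endpoint, token, share, or schema values passed in; only the "sales" table name appears on the slide. queryTable relies on the global fetch available in Node.js 18 and later.

```javascript
// Minimal sketch of a Delta Sharing table query. All helper names are hypothetical.

// Build the URL and fetch options for a table query.
// `profile` is assumed to carry `endpoint` and `bearerToken`, as in a Delta Sharing profile file.
function buildQueryRequest(profile, share, schema, table, limitHint) {
  const base = profile.endpoint.replace(/\/+$/, ''); // drop trailing slashes
  const [s, sc, t] = [share, schema, table].map(encodeURIComponent);
  const url = `${base}/shares/${s}/schemas/${sc}/tables/${t}/query`;
  // limitHint is optional; an empty body asks for the whole table.
  const body = limitHint === undefined ? {} : { limitHint };
  return {
    url,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${profile.bearerToken}`,
        'Content-Type': 'application/json; charset=utf-8',
      },
      body: JSON.stringify(body),
    },
  };
}

// Parse the newline-delimited JSON response into protocol, metadata, and file entries.
function parseQueryResponse(ndjson) {
  const result = { protocol: null, metaData: null, files: [] };
  for (const line of ndjson.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const entry = JSON.parse(line);
    if (entry.protocol) result.protocol = entry.protocol;
    else if (entry.metaData) result.metaData = entry.metaData;
    else if (entry.file) result.files.push(entry.file);
  }
  return result;
}

// Send the query and return the parsed result (requires network access and Node.js 18+).
async function queryTable(profile, share, schema, table, limitHint) {
  const { url, options } = buildQueryRequest(profile, share, schema, table, limitHint);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Delta Sharing query failed: HTTP ${response.status}`);
  }
  return parseQueryResponse(await response.text());
}
```

Each entry in files carries a url field pointing at a data file. Because the slide describes these URLs as short-lived, a client would typically download the files right after the query instead of storing the links.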