3-1 Bringing Software Development Best Practices to the Data Domain with dbt


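Chenyu Li, Sr Software Engineer, dbt Labs

dbt is still a fairly new product, its community is concentrated mainly in the United States and Europe, and much of the material has no Chinese translation. I will explain in Chinese as much as I can; please bear with me where I fall short.

Agenda
- Process problems in traditional data analytics
- Opportunities created by cloud-native data warehouses
- The solution dbt wants to provide

Process problems in traditional data analytics

Data warehouses used to be very expensive (Legacy E-T-L)
- Data transformation happened outside the data storage layer.
- Managing infrastructure was a full-time job.
- Analytics work was scattered among engineers, data analysts, and stakeholders.

1. Engineers become the bottleneck for every change.
2. Creating something from scratch is easier than finding existing code.
3. An untraceable change process breaks data pipelines and lowers everyone's trust in the data.

Opportunities created by cloud-native data warehouses

Cloud-native data warehouses lower costs and are easier to use (Modern E-L-T)
- Elastic storage and compute make in-warehouse transformation feasible.
- Cloud-native architecture reduces infrastructure management.
- Engineers and analysts can focus more on high-return tasks, such as optimizing and building data transformation pipelines.

These changes opened the door to innovation in data workflows, and dbt took this opportunity to propose its own solution.

The solution dbt wants to provide

Data analytics workflow: modular, testable, continuously integrated, documented, with faster and more stable updates.

The dbt viewpoint: Build data like developers build applications.

How dbt wants data teams to work together
1. Enable anyone who knows SQL to quickly build and test data.
2. Use version control to update once and deploy everywhere.
3. Provide a documentation tool and auto-refreshing lineage.

Lineage: stg_orders -> orders

orders.sql:
    select * from {{ ref('stg_orders') }}
    where is_deleted = false

Runs in the warehouse:
    create table analytics.dev.orders as (
        select * from analytics.dev.stg_orders
        where is_deleted = false
    );

This isn't anything new; it's how every high-quality software project is run. You expect there to be tests. You expect there to be documentation. You expect the PR process to be collaborative. You're building software together. We're just applying this to analytics code as well.

Develop, Document, Test, Deploy: a centralized environment for collaborative development
- Develop: IDE or CLI; modular SQL; no DDL/DML; pre-built packages
- Document: dependency management; auto-generated DAG; auto-updated docs
- Test: schema tests; data value testing; pre-packaged tests for complex logic
- Deploy: job scheduling; CI/CD; version control; logging & alerting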
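The orders.sql example above can be made concrete with a small sketch. It is a toy illustration of the idea only, not dbt's actual compiler: the `TARGETS` mapping, `compile_model`, and `run_order` are hypothetical names invented here, and the `raw.orders` source table is made up. The sketch shows how a `ref(...)` call can both resolve to an environment-specific relation and reveal the dependency graph.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical mapping from target name to "database.schema" (names mirror the slide).
TARGETS = {"dev": "analytics.dev", "prod": "analytics.prod"}

# Matches {{ ref('model_name') }} with optional whitespace.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'([^']+)'\s*\)\s*\}\}")

# Toy project: model name -> SELECT statement.
MODELS = {
    "orders": "select * from {{ ref('stg_orders') }}\nwhere is_deleted = false",
    "stg_orders": "select * from raw.orders",  # raw.orders is a made-up source table
}


def compile_model(name: str, target: str) -> str:
    """Resolve every ref() to a qualified relation and wrap the SELECT in CREATE TABLE AS."""
    schema = TARGETS[target]
    body = REF_PATTERN.sub(lambda m: f"{schema}.{m.group(1)}", MODELS[name])
    return f"create table {schema}.{name} as (\n{body}\n);"


def run_order() -> list[str]:
    """Derive a valid build order from the ref() calls found in each model."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in MODELS.items()}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order())
    print(compile_model("orders", "dev"))
    print(compile_model("orders", "prod"))
```

The same model text compiles against `analytics.dev` or `analytics.prod` depending only on the chosen target, and the build order falls out of the references rather than being written by hand.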

Develop

Develop faster with SELECT statements (declarative)
- Express business logic in SQL.
- Includes several materializations: table, view, incremental, snapshot.

orders.sql:
    select * from analytics.dev.stg_orders
    where is_deleted = false

Runs in the warehouse:
    create table analytics.dev.orders as (
        select * from analytics.dev.stg_orders
        where is_deleted = false
    );

Develop faster without having to think about run order
- Run the same code in dev, test, and prod; the correct schema is resolved for you.
- Dependencies are built automatically so you can focus on modeling, not run order.

Lineage: stg_orders -> orders

orders.sql:
    select * from {{ ref('stg_orders') }}
    where is_deleted = false

Runs in the warehouse (dev):
    create table analytics.dev.orders as (
        select * from analytics.dev.stg_orders
        where is_deleted = false
    );

Runs in the warehouse (prod):
    create table analytics.prod.orders as (
        select * from analytics.prod.stg_orders
        where is_deleted = false
    );

Macros: a sandbox environment to execute user logic
- Abstract snippets of SQL into reusable macros; these are analogous to functions in most programming languages.
- Use control structures (e.g. if statements and for loops) in SQL.
- Use environment variables in your dbt project for production deployments.
- Operate on the results of one query to generate another query.

Use packages to skip boilerplate code
- Apply industry-standard code to your project.
- Check out the dbt Packages Hub.
- Packages are akin to Python libraries.
- Focus on unique business logic rather than implementing something people have already solved.

Types of packages:
- Transforming data from a structured SaaS dataset
- dbt macros that answer "How do I do this in SQL?" (e.g. dbt_utils.equal_rowcount, date conversion)
- Auditing & testing

Document

Maintain shared understanding with auto-updating lineage.

Test

Test assumptions about data, and the validity of transformations. Custom and out-of-the-box tests include:
- Uniqueness
- Null values
- Certain values
- Is a valid foreign key to another table

Preserve quality by testing in-line.

Deploy

Deploy seamlessly with version control and CI/CD
- Version control: integrate with the git provider of your choice.
- Continuous integration and continuous deployment: minimize wasteful runs by testing only changes.
- Job scheduling, logging, and alerting.

Thank you! Questions?
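As a supplement to the Test section, the four out-of-the-box checks (uniqueness, null values, certain values, valid foreign key) can each be written as a query that returns the offending rows, so a check passes when the query returns nothing. The sketch below is a minimal illustration under stated assumptions: it uses SQLite from the Python standard library rather than a cloud warehouse, the helper names (`unique`, `not_null`, `accepted_values`, `relationships`, `run_checks`) and the demo tables are hypothetical, and it is not dbt's own implementation.

```python
import sqlite3


def unique(table: str, column: str) -> str:
    """Rows returned are values that appear more than once."""
    return (f"select {column}, count(*) from {table} "
            f"where {column} is not null group by {column} having count(*) > 1")


def not_null(table: str, column: str) -> str:
    """Rows returned are records with a missing value."""
    return f"select * from {table} where {column} is null"


def accepted_values(table: str, column: str, values: list[str]) -> str:
    """Rows returned are records whose value is outside the allowed set."""
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"select * from {table} where {column} not in ({quoted})"


def relationships(table: str, column: str, to_table: str, to_column: str) -> str:
    """Rows returned are foreign keys with no matching parent row."""
    return (f"select child.{column} from {table} as child "
            f"left join {to_table} as parent on child.{column} = parent.{to_column} "
            f"where child.{column} is not null and parent.{to_column} is null")


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory dataset containing deliberate data problems."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table customers (id integer)")
    conn.execute("create table orders (id integer, customer_id integer, status text)")
    conn.executemany("insert into customers values (?)", [(1,), (2,)])
    conn.executemany(
        "insert into orders values (?, ?, ?)",
        [(1, 1, "placed"), (2, 2, "shipped"), (2, 9, "unknown")],
    )
    return conn


def run_checks(conn: sqlite3.Connection) -> dict[str, int]:
    """Run every check and report how many failing rows each one found."""
    checks = {
        "unique_orders_id": unique("orders", "id"),
        "not_null_orders_customer_id": not_null("orders", "customer_id"),
        "accepted_values_orders_status": accepted_values(
            "orders", "status", ["placed", "shipped", "returned"]),
        "relationships_orders_customer_id": relationships(
            "orders", "customer_id", "customers", "id"),
    }
    return {name: len(conn.execute(sql).fetchall()) for name, sql in checks.items()}


if __name__ == "__main__":
    for name, failures in run_checks(build_demo_db()).items():
        print(f"{name}: {'PASS' if failures == 0 else f'FAIL ({failures} rows)'}")
```

Framing each check as "select the bad rows" keeps the tests in plain SQL, which fits the deck's point that anyone who knows SQL should be able to build and test data.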
