AtScale: Demystifying the Semantic Layer for Smarter, Faster AI and BI (English edition, 13 pages)

Demystifying the Semantic Layer: The What, So What, and Now What
For Smarter, Faster AI and BI

White paper. A perspective from legendary best-selling author Prashanth Southekal, PhD, MBA. Dr. Southekal has consulted for over 75 organizations including P&G, GE, Shell, Apple, and SAP, and is the author of two books, "Data for Business Performance" and "Analytics Best Practices".

The data economy is increasingly embraced worldwide in every industry. Data has enabled firms like Netflix, Facebook, Google and Uber to have a distinct competitive advantage. In 2021, the market capitalization of Amazon ($1.7 trillion), a data company, was more than the combined GDP (gross domestic product) of two big G20 countries: Turkey ($780 billion) and Saudi Arabia ($700 billion). Companies that are data-driven demonstrate improved business performance. A report from MIT says digitally mature firms are 26% more profitable than their peers (MIT, 2013). McKinsey Global Institute indicates that data-driven organizations are 23 times more likely to acquire customers, six times as likely to retain customers, and 19 times as likely to be profitable (Bokman et al., 2014). Overall, data and analytics, when deployed at scale, can generate a 5% to 10% uplift in revenue and a 3 to 6 percentage point increase in EBITDA margin (CGT, 2021).

Today, nearly every company is trying to leverage data and analytics for improved business performance. However, most organizations struggle to do so, and one reason
is poor data quality. According to Experian Data Quality, a boutique data management company, inaccurate data affects the bottom line of 88% of organizations and impacts up to 12% of revenues (Experian, 2015). According to McKinsey, an average user spends 2 hours a day looking for the right data. A report by Harvard Business Review says just 3% of the data in a business enterprise meets quality standards. A joint study by IBM and Carnegie Mellon University found that over 90% of the data in a company is unused (Southekal, 2020). According to IBM research, in the U.S. alone, businesses lose $3.1 trillion annually due to poor data quality (IBM, 2020). All these studies point out that poor data quality affects a firm's financial performance, growth, reputation, and branding.

But what is data quality? How do you define it? Data is of high quality if it is fit for use in operations, compliance, and decision-making, as assessed against the 12 data quality dimensions. These data quality dimensions are based on good data definitions (Southekal, 2017). Unfortunately, many enterprises have challenges even in defining the data. Why? How?

A data definition is a descriptor for the attributes (also known as features and labels in data science and machine learning) of the data object. A comprehensive and consistent data definition is (Stanford, 2022):

1. Concise: Described succinctly and clearly.
2. Precise: Described using unambiguous words when possible.
3. Non-circular: The term being described should not be used in the definition.
4. Distinct: Described so it differentiates this data element, data entity or concept from others.
5. Unencumbered: The definition should not refer to a physical location or how it is created.

Against this backdrop, this whitepaper is written as a reflection paper (WHAT? SO WHAT? NOW WHAT?) to thoroughly understand the data definition problem and guide the implementation of the solution. It is based on thoughts and analysis I have seen from a practical viewpoint. Specifically, this whitepaper looks at three main elements:

1. WHAT is the problem?
2. SO WHAT is the impact of this problem?
3. NOW WHAT solves this problem?

Let us start the discussion by looking at the problem.

What is the problem in defining data? Why does defining the data well matter for improved business performance?

Data attributes can be defined from both technical and functional perspectives. The technical data definition includes the format, type, length, etc. These are the metadata characteristics. However, the real problem is in defining the data object, or the attributes in the data object, from a semantic (functional or business) view, because context plays a big role in how business users access, communicate, interpret, and consume data, especially in a fast-paced distributed working environment. In today's big data world, this challenge is severely magnified by the volume, velocity, and variety of data that is ingested into the IT systems.

So, what is the business impact of poor data definitions? Why does semantically defining the data matter?

Semantically defining the data is based on the context in which the business users consume the data to run the business based on objectives, questions, and metrics. This context in business can come in three main flavours: stakeholder views, value chain impact,
and business process differences. Let's look at the impact of the semantic definition based on these three main flavours using some common and simple business examples.

1. Stakeholder views. Finance and procurement often have diverse views on managing vendor relationships. While procurement sees the vendor as a service provider, finance looks at the same vendor from the costing and budgeting perspective. A short payment term (say net 30 days) is desirable for procurement as it is seen by the vendor as reward and recognition of its service. This improves the service levels of the vendor. However, this short payment term affects the cash flow, which is often not supported by the finance department. So, are vendor payment terms a service element or a cost element? Here is another example from the retail industry. Marketing needs a good amount of inventory to serve the customers, but finance believes more inventory increases the carrying cost. So, is inventory bad or good? Who defines this?

2. Value chain impact. Is the customer a prospect or an account (who pays the invoice)? If a vendor gets paid for providing goods and services, can an employee be defined as a vendor given that the employee provides services and gets paid for the work? So, unless one defines the customer, the vendor, or the employee semantically based on their impact on the business value chain, there will be misunderstandings on the use of data.

3. Business process differences. Let us take an example of a financial services company. Is the start time for processing the credit application when the adjudicator receives the file, or is it when the processing of the previous credit application is completed by the adjudicator? Unless the start time is clearly defined, there could be multiple interpretations of these start times. Another common example is using telephone and fax numbers to derive the jurisdiction and tax rates. While the telephone or fax numbers are not meant for tax calculation, the business circumstance or even the limitations in the data model force the business to use the available resources.

To address the above contextual and circumstantial constraints and issues, we need to clearly and holistically define the data, both technically and functionally. Overall, while the technical or metadata aspects are relatively easy to define, the business (functional or semantic) aspects are challenging as the definition is formulated based on business context. There are four main ways to handle this data definition problem: Master Data Management (MDM), Data Integration Methods, Data Wrangling, and the Semantic Layer. Let's quickly discuss these four solution options from the data definition perspective.

According to Gartner, MDM is a technology-enabled discipline in which business and IT work together to ensure the uniformity, accuracy, stewardship, consistency, and accountability of the enterprise's critical data assets (Gartner, 2022). These critical data assets could be customers, products, vendors, factories/plants, currencies, general ledgers, and more. The goal of MDM is to provide a trusted, single version of the truth (SVOT) so organizations do not use multiple and inconsistent versions and definitions of the same data in different systems. The MDM initiative starts early in the data lifecycle (DLC) and includes defining the data, formulating the business rules, setting up the workflows, roles mapping, formulating the governance policies, processes, procedures, standards, nomenclature, taxonomies and so on.

The second possible solution to fix the inconsistent versions and definitions of the data in different IT systems is with data integration tools. The data integration process (such as EAI, ESB, message queues and so on) happens in the DLC. The selection of these data integration tools and practices to address inconsistent data definitions is based on three key factors:

1. Capabilities of APIs (REST, SOAP, RPC, GraphQL and more) and their request-response dependencies.
2. Number of transactional systems in scope with inconsistent data definitions that need to be integrated.
3. Sequence of Transfer, Transpose and Orchestration (TTO) in the data integration process.

Data Wrangling, especially cleansing the data in a canonical system like the data warehouse or data mart, is also a potential option. Technically, Data Wrangling is formatting, de-duping, renaming, correcting, improving accuracy, populating empty data attributes, aggregating, blending and any other data remediation activities that help to improve the data quality. Most of the data cleaning work is manual, even though stored procedures (sets of SQL statements that are reused and shared) and automated routines are often used to support this manual labour.

The fourth option to fix the inconsistent data definitions is using the Semantic Layer. A Semantic Layer is a business representation of data that helps users access data using common business terms. A Semantic Layer maps business data into familiar business terms to offer a unified, consolidated view of data across the organization. Implementing the Semantic Layer happens at the end of the DLC and is generally considered part of last mile analytics, the key piece that connects insights to business results. In simple words, the Semantic Layer creates the context for actionable analytics.

All four of these solutions (MDM, Data Integration, Data Wrangling, and Semantic Layer) that can help in fixing inconsistent data definitions depend on data mapping. Data mapping creates data element linkages between data attributes in two distinct data models. Overall, the MDM and Data Integration methods are more suitable for compliance and operations. But if the use case is on deriving insights, then the Semantic Layer is an attractive option.
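
To make the idea of data mapping concrete, the following is a minimal Python sketch that is not taken from the whitepaper. It links attributes of two hypothetical data models (an ERP vendor record and a procurement supplier record) to one shared set of business terms. All field names, the `FIELD_MAP` table, and the `to_business_view` helper are illustrative assumptions.

```python
# Hypothetical mapping: (source system, physical field) -> business term.
FIELD_MAP = {
    ("erp", "LIFNR"): "vendor_id",
    ("erp", "ZTERM"): "payment_terms_days",
    ("procurement", "supplier_no"): "vendor_id",
    ("procurement", "pay_days"): "payment_terms_days",
}


def to_business_view(system: str, record: dict) -> dict:
    """Rename a source record's fields to the shared business terms.

    Fields without a mapping are dropped, so every consumer sees
    the same vocabulary regardless of the source system.
    """
    view = {}
    for field, value in record.items():
        term = FIELD_MAP.get((system, field))
        if term is not None:
            view[term] = value
    return view


# Illustrative records from the two hypothetical systems.
erp_row = {"LIFNR": "V-100", "ZTERM": 30, "MANDT": "800"}
proc_row = {"supplier_no": "V-100", "pay_days": 30}

# Both source records resolve to the same business-level view.
print(to_business_view("erp", erp_row) == to_business_view("procurement", proc_row))
```

In this sketch the mapping table is the only place where system-specific names appear, which is the property the four approaches above rely on in one form or another.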

In terms of data and analytics, the Semantic Layer manages the relationships between the various data attributes in the database to create a simple and unified business view that can be used for querying and deriving insights. But more importantly, the choice among these four methods depends on the specific use cases and the control one needs on data quality in the data lifecycle (DLC). A simplified and generic DLC is shown in Figure 1.

Figure 1: Simplified and Generic DLC (diagram labels: Data Capture, Data Integration, Data Science, Decision Science, External Feeds, Data, Insights, EAI, ETL, Transaction System, RPA Bots, Semantic Layer, BI and Analytics System, Focus on Data Quality, MDM, Data Warehouse)

This brings us to the third part of the whitepaper. Now, what is the solution from the data and analytics perspective? Specifically, how to implement the Semantic Layer?

Implementing the Semantic Layer requires some preparation and leadership. As Bill Gates, Microsoft's founder, said: "The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency." Against this backdrop, how can an organization prepare itself for the successful implementation of the Semantic Layer platform?

Step 1: Identify the use cases. If the data is in one system and one format, you do not need the Semantic Layer. However, that is rarely the case in most enterprises today, given the variety, volume, and velocity in capturing and ingesting data into the data landscape. For example, one client, a large and global engineering conglomerate, has 17 systems. The Semantic Layer is effective if the data is distributed in multiple systems (in diverse types and formats). This is because the distributed landscape with diverse data models often creates a situation of multiple data definitions. If there are data silos in the company with multiple definitions for the same data object, then the Semantic Layer is a strong solution on the table. While most use cases describe the system's needs, meaningful use cases also identify the problem or opportunity owner, potential risks, and the business benefits in monetary terms. Also critical to success is active subject matter expert (SME) engagement to ensure proper representation of the business knowledge and understanding/use of the data.

Step 2: Identify the business KPIs and the ownership. Every meaningful initiative starts with a purpose that can be objectively measured and owned. Management guru Peter Drucker once said: "You cannot manage what you cannot measure." The selection of the business KPI (Key Performance Indicator) is based on the strategy and business objective. For instance, if the business objective is to improve the firm's liquidity, it is prudent to have the cash conversion cycle (CCC) as one of the KPIs. When it comes to ownership, it is always advisable to have a leader very close to the business and the data own the KPI. For instance, to reduce the inventory carrying cost, it is better to assign the KPI ownership to the Sales manager than to the Finance manager. This is because the sales manager has more variables under their control, such as demand variability, forecast accuracy, service levels, order sizes, etc. The granularity of the insights from the KPI also matters. If the KPI owner is a C-level executive, the KPI will be very different from that of a manager. Once we have the KPIs and ownership identified, the data objects must be identified. This process will help us define the data from the right stakeholder views. For example, if the KPIs are focused on reducing expenses, then the definition of data from finance is more important than that of marketing with inventory management.
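
As an illustration of how a KPI such as the cash conversion cycle can be pinned down precisely, here is a short Python sketch using the standard definition CCC = DIO + DSO - DPO (days inventory outstanding plus days sales outstanding minus days payables outstanding). The whitepaper does not give a formula; the function names and the sample figures below are assumptions chosen purely for illustration.

```python
def days_outstanding(balance: float, annual_flow: float, days: int = 365) -> float:
    """Express a balance in days, relative to an annual flow."""
    if annual_flow <= 0:
        raise ValueError("annual_flow must be positive")
    return balance / annual_flow * days


def cash_conversion_cycle(inventory: float, receivables: float, payables: float,
                          cogs: float, revenue: float) -> float:
    """CCC = DIO + DSO - DPO, all expressed in days."""
    dio = days_outstanding(inventory, cogs)       # days inventory outstanding
    dso = days_outstanding(receivables, revenue)  # days sales outstanding
    dpo = days_outstanding(payables, cogs)        # days payables outstanding (COGS-based here)
    return dio + dso - dpo


# Illustrative figures only (not from the whitepaper).
ccc = cash_conversion_cycle(inventory=200.0, receivables=150.0, payables=100.0,
                            cogs=730.0, revenue=1095.0)
print(round(ccc, 1))
```

Even this small example forces definitional choices, such as whether payables are measured against cost of goods sold or purchases, which is the kind of decision the KPI owner would need to settle.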

Step 3: Build data literacy in the enterprise. Taking ownership of any initiative for success requires a strong commitment, and one effective way to bring data ownership is with good education or awareness. Data literacy is the ability to understand and communicate data and insights. Data literacy is to the 21st century what literacy was in the past century, given that over 93% of the high-value business processes today are digital and data-centric (Hurst, 2018). The digitization and data capture rate will continue to grow in the coming years. Figure 2 shows the 10 key data literacy competencies.

Step 4: Define the data attributes. With the above three steps, one can define the data attributes technically and semantically. The technical data definition includes information such as format, type, length, etc. These are the metadata characteristics. The semantic or functional view defines the data attributes from a business viewpoint, which is very challenging. Given that the context can come from the KPIs and the ownership, defining the data attribute from the functional or semantic perspective at this stage should not be very difficult.
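
One way to picture Step 4 is a record that holds both the technical and the semantic definition of an attribute. The Python sketch below is an illustration under stated assumptions: the `AttributeDefinition` class, its fields, and the sample attribute are hypothetical, and the single check shown covers only the non-circular criterion from the list of data definition properties given earlier in this paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeDefinition:
    # Technical (metadata) definition
    name: str
    data_type: str
    length: int
    # Semantic (business) definition
    business_term: str
    definition: str
    owner: str

    def is_non_circular(self) -> bool:
        """True if the definition does not reuse the full business term."""
        pattern = r"\b" + re.escape(self.business_term.lower()) + r"\b"
        return re.search(pattern, self.definition.lower()) is None


# Hypothetical attribute with a usable semantic definition.
payment_terms = AttributeDefinition(
    name="PAY_TERM_DAYS",
    data_type="INTEGER",
    length=3,
    business_term="payment terms",
    definition="Number of days after the invoice date by which a vendor must be paid.",
    owner="Finance",
)

# Same attribute with a circular definition that explains nothing.
circular = AttributeDefinition(
    name="PAY_TERM_DAYS",
    data_type="INTEGER",
    length=3,
    business_term="payment terms",
    definition="The payment terms agreed with the vendor.",
    owner="Finance",
)

print(payment_terms.is_non_circular(), circular.is_non_circular())
```

Recording an owner next to the definition reflects the point made in Step 2 that the business context comes from whoever owns the KPI.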

Step 5: Implement the Semantic Layer platform. With the strong foundation built in the first four steps, you are now ready to deploy the Semantic Layer platform. The Semantic Layer platform links the analytics consumption platform with the data platforms using the facts (data values), dimensions (data attributes) and hierarchies (i.e., taxonomies) in the Data Warehouse (DWH) or any other canonical data platform such as data lakes, data marts or lakehouses. The consumption or analytics tools can be Power BI, Tableau, Python, Business Objects, Looker, Jupyter Notebook, and even Microsoft Excel. The queries from the business users could be in SQL, DAX, MDX, etc., using tool-specific native protocols such as XMLA, JDBC, ODBC, SOAP, and REST interfaces. By abstracting the physical form and location of data, the Semantic Layer platform makes data stored in the canonical data platforms accessible through one consistent and secure interface for the business users.
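
The following Python sketch illustrates, under simplifying assumptions, how a semantic model of facts (measures), dimensions and a hierarchy could translate a request phrased in business terms into SQL. It is not AtScale's implementation; the table `sales_fact`, its columns, and the `build_query` and `drill_down` helpers are hypothetical names.

```python
from __future__ import annotations

# Hypothetical semantic model over one physical table.
MODEL = {
    "table": "sales_fact",
    "measures": {  # business term -> aggregate expression
        "Revenue": "SUM(net_amount)",
        "Units Sold": "SUM(qty)",
    },
    "dimensions": {  # business term -> physical column
        "Country": "ship_country",
        "Region": "ship_region",
        "Product": "product_name",
    },
    # Hierarchy levels ordered from coarse to fine.
    "hierarchies": {"Geography": ["Country", "Region"]},
}


def build_query(measure: str, by: list[str]) -> str:
    """Translate business terms into a SQL aggregate query."""
    expr = MODEL["measures"][measure]  # raises KeyError for unknown terms
    cols = [MODEL["dimensions"][d] for d in by]
    select = ", ".join(cols + [f'{expr} AS "{measure}"'])
    sql = f'SELECT {select} FROM {MODEL["table"]}'
    if cols:
        sql += " GROUP BY " + ", ".join(cols)
    return sql


def drill_down(hierarchy: str, level: str) -> list[str]:
    """Return the hierarchy levels from the top down to `level`."""
    levels = MODEL["hierarchies"][hierarchy]
    return levels[: levels.index(level) + 1]


print(build_query("Revenue", drill_down("Geography", "Region")))
```

The business user only ever names "Revenue" and "Region"; the physical table and column names stay inside the model, which is the abstraction described in Step 5.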

Figure 2: Data Literacy Competencies (Statistical Models, Data Stewardship, Data Storytelling, Data Architecture, Data Engineering, Data Acquisition, 3DM, MDM, Data Governance, Data Ethics)

So, how does the end state with the Semantic Layer look once implemented? A holistic Semantic Layer platform has these five features: it connects to any data source and supports modeling, governance, security, and performance (Thuma, 2019). Again, whether there is real-time data ingestion (say from Kafka pub/sub) or batch data ingestion (say from files), the Semantic Layer is a viable and strong solution only when multiple data definitions exist. Essentially, the Semantic Layer works as middleware between the data sources and the analytics platforms by providing virtualized connectivity, modelling, and other data management features. As all the data required to derive insights is filtered through the Semantic Layer, the data scientists and the business users see the same data in one consistent way, resulting in a single version of the truth with the same measures and dimensions. Against this backdrop, below are five key value propositions or reasons for the business to implement
the Semantic Layer.

Value #1: Democratization of Data Analytics and Machine Learning (ML)
As data analytics has spread within organizations, relying on one monolithic BI (Business Intelligence) or ML (Machine Learning) platform to meet everyone's needs is becoming less realistic. A Semantic Layer platform is needed to connect and work with diverse data platforms, protocols and consumption tools. This will decouple the data from consumption, enabling the democratization of data analytics and ML in the enterprise.

Value #2: Seamless Model Development and Sharing
While data scientists rely on raw and granular data for deriving insights from their models, this raw data has little business value from the data and analytics perspective. Businesses need insights to make decisions and not the raw data per se. But adding a data model to the raw data makes it very valuable, because data models create a visual description of the business for analyzing, understanding, and clarifying the data and the associated relationships. The Semantic Layer, with its data modeling capabilities, enables easy authoring, sharing, and collaborating on data models and insights.

Value #3: Improved Query Performance and Reduced Computing Costs
The limited scalability and the higher upgrade costs of on-premise data warehouses are forcing companies to leverage the power of the cloud for enhanced scalability, flexibility, and elasticity. While cloud computing, including cloud data warehouses, offers many benefits, these benefits can come at the expense of performance and costs. We have often heard stories like the $50,000 query in the cloud (Lynch, 2020). A good Semantic Layer platform includes a comprehensive performance management system that goes beyond simple caching techniques in today's big data environment. At the core, the Semantic Layer facilitates improved query performance (and faster time to insights) and reduced computing costs.

Value #4: Reduced Data Cleaning Effort
Studies have shown that over 70% of the effort in data and analytics projects is spent on data cleansing (Southekal, 2020). A common and consistent data definition using the governance-enabled Semantic Layer will help business analysts, data analysts, and data scientists have the same definition and context on the data. In addition, the Semantic Layer offers pre-built controls for managing data access, integration and feature creation. All this will not only reduce the data cleaning
effort but will also produce reliable insights. In addition, the Semantic Layer provides a logical schema with views, stored procedures, functions, and more.

Value #5: Better Security and Governance
As the Semantic Layer sits between the data platform and the analytics tools, it secures the digital infrastructure with the right levels of authentication and authorization. First, the Semantic Layer can authenticate users with single sign-on solutions through Active Directory, LDAP (Lightweight Directory Access Protocol), OAuth, or any other user authentication platform. Second, the Semantic Layer offers RBAC (Role-Based Access Control), including the ability to protect sensitive data attributes, limit data access as per users' business roles, and more.

Using the Semantic Layer for creating the context and deriving insights from data analytics is promising. To remain competitive in today's market, Toyota, the multinational automotive manufacturer, empowered its teams to work with data and analytics more independently using AtScale's Semantic Layer platform. Toyota has achieved a 2100% reduction in the insights derivation cycle time and reduced the IT infrastructure by over 60% using the Semantic Layer platform. Home Depot, the largest home improvement retailer in the United States and Canada, deployed the Semantic Layer solution from AtScale by working directly in Google's cloud data warehouse, BigQuery, and reduced the cost of a query by 91%. The company also realized efficiencies that allowed it to increase data retention from 3 months to 3 years (a twelvefold increase). This enabled Home Depot to support over 17,000 queries per day executed by internal and external users on a real-time basis. In addition, companies like Cardinal Health (a health care services company), Wayfair (an e-commerce furniture company), Tyson Foods (a food company) and many more have implemented the Semantic Layer with minimal disruption to their teams' working style while accelerating their efforts to derive insights for better business results at a much lower cost. A generic system architecture with the AtScale Semantic Layer is shown in Figure 3.

Figure 3: Semantic Layer Based System Architecture (diagram labels: Source Systems, ETL Tools, Data Tables, Aggregates, Data Lake, S3, XMLA, JDBC, ODBC, REST, Spark/Hive, Spectrum, MDX, Python, DAX, SQL, Data Warehouse Platform, Semantic Layer)

Though enterprises have been using Semantic Layer tools to manage data for a long time, the data landscape has changed significantly in the last few years due to the increased adoption of big data, cloud data warehouses, self-serve analytics, and more. Companies need
quicker and better insights in today's VUCA (volatility, uncertainty, complexity, and ambiguity) world of sudden and unpredictable change. Sadly, many of these companies have deployed numerous data and analytics solutions across diverse cloud and on-prem data platforms, resulting in data and insight silos. In addition, this distributed set-up has created challenges in data quality, literacy, adoption, and ultimately, business performance. The Semantic Layer makes data accessible to business users while hiding the complexities of data definition, manipulation, reading, and mapping. The Semantic Layer creates actionable data!

Building the Semantic Layer consists of many solutions, ranging from the organizational data itself to data models that support object- or context-oriented design, semantic standards to guide machine understanding, and tools and technologies to enable and facilitate implementation and scale for interoperability and governance (Tesfaye, 2020). But once built, business users can access the data as per the business terminology. This will reduce complexity and costs, improve security, and accelerate and streamline reporting for the business users in today's complex data environments. Importantly, all of this can happen using the data and analytics tools the users already have expertise in. This will ultimately increase the odds of better analytics adoption and improved business performance.

References

- Bokman, Alec; Fiedler, Lars; Perrey, Jesko; Pickersgill, Andrew, Five facts: How customer analytics boosts corporate performance, https:/mck.co/2Ju0 xYo, Jul 2014
- CGT, Learn How Tyson Foods Appetite for Data is Customer-Driven, https:/ Sept 2021
- Experian, Is Dirty Data Costing you?, https:/www.xperience- 2015
- Gartner Glossary, Master Data Management (MDM), https:/ technology/glossary/master-data-management-mdm, Feb 2022
- Hurst, Heather, 5 Systems of Record Every Modern Enterprise Needs, https:/ 2018
- IBM, Spreadsheets vs. Watson Studio Desktop, IBM Research, Jan 2020
- Lynch, Christopher, How to Avoid the Not So Mythical $50,000 Query in the Cloud, https:/ 2020
- MIT, Digitally Mature Firms are 26% More Profitable Than Their Peers, https:/bit.ly/2xBTPNe, Aug 2013
- Southekal, Prashanth, Data for Business Performance, Technics Publications, April 2017
- Southekal, Prashanth, Analytics Best Practices, Technics Publications, April 2020
- Stanford, Data Definitions Best Practices, http:/web.stanford.edu/dept/pres-provost/cgi-bin/dg/wordpress/, 2022
- Tesfaye, Lulit, What is a Semantic Architecture and How do I Build One?, https:/enterprise 2020
- Thuma, John, Five Things That Make a Great Universal Semantic Layer, https:/ 2019

Prashanth Southekal, PhD, MBA

Prashanth Southekal is the Managing Principal of DBP Institute (www.dbp ), a data and analytics consulting and education firm. He is a Consultant, Author, and Professor. He has consulted for over 75 organizations including P&G, GE, Shell, Apple, and SAP. Dr. Southekal is the author of two books, Data for Business Performance and Analytics Best Practices, and writes regularly on data, analytics, and machine learning in F, FP&A Trends, and CFO.University. Apart from his consulting pursuits, he has trained over 3,000 professionals worldwide in Data and Analytics. Dr. Southekal is also an Adjunct Professor of Data and Analytics at IE Business School (Madrid, Spain). CDO Magazine included him in the top 75 global academic data leaders of 2022. He holds a Ph.D. from ESC Lille (FR) and an MBA from Kellogg School of Management (U.S.). He lives in Calgary, Canada with his wife, two children, and a high-energy Goldendoodle dog. Outside work, he loves juggling and cricket.
