Democratizing Analytics in the Cloud-first Enterprise Using a Semantic Layer
White Paper
By Kirk Borne

Kirk has been an influential globally recognized leader in the data science space for 20 years. His areas of passion and focus include Big Data & Data Science, Artificial Intelligence (AI), and Astrophysics. Kirk is also the co-creator of the field of Astroinformatics.

Data Then, and Data Now

Data Democratization and its Challenges

If you are already familiar with databases, BI (business intelligence), and even data mining, you might be thinking: we have had these things for many years, so what’s all the fuss about right now with data analytics, data science, machine learning, and AI? The short answer to that question is this: data science (DS) and machine learning (ML) are hot right now due to the enormous growth in data volumes and data variety in recent years, and due to the significant increase in the capacity of those techniques to reveal significant new discoveries and insights that deliver new value (and opportunities) to organizations.

A longer answer to our question would cover these emergent perspectives: DS/ML now engages three significantly expanded, improved, and mature technologies:
• Ubiquitous data sources (including sensors, networks, logs, mobile apps, etc.);
• Greater access to more powerful computing (including virtual on-demand distributed computing with the Cloud);
• Sophisticated and more powerful machine learning algorithms for insights discovery from data.

Despite such remarkable advancements,
there are a few hurdles that stand in the way of democratizing their benefits across the entire enterprise. These truths might be self-evident, but we will review them here.

Several significant truths that have become very clear in the data analytics sphere in recent years include the primary truth (our first “T”) that every business is (or should be) a digital business. That is, organizations are (or should be) in the “insights discovery from data” business because nearly every workflow, process, market intelligence, and customer interaction is digital, and most organizations are drenched (if not drowning) in data. That big data challenge can be a big problem if insights discovery initiatives are not empowered across the organization. As one person (i.e., me) put it: “what we have here is more information, not more insights”, with the added challenge of how to provide convenient and understandable access to the discovered insights to the right business users at the right time.

Next, there exists a vast and confusing assortment of tools, techniques, and technologies available in the market, often leading to analysis paralysis among business decision-makers. They’re asking, “which Tools, Techniques, and Technologies should I be using for my business use case?” (That’s three more Ts!)

Then, there’s perhaps the biggest “T” challenge: Talent! That challenge manifests itself when we hear questions like this in organizations:
• Where do we find (and how do we afford) the talent to do this data analytics, data science, machine learning, and AI stuff?
• How do we deploy and use such talent within my organization?
• How do we retain the talent?
• How do we reskill or upskill existing talent?

One approach to addressing most of these challenging truths is data democratization. Initially that means encouraging and engaging a broader swath of your workforce in the day-to-day data-intensive activities of the business. Data democratization can (and should) include data literacy and data fluency training of your workforce. But data democratization must go further: Last, but not least, data democratization also means that actionable insights from data need to be produced, shared, and used by different business teams within their natural business applications and workflows. We will learn later that a solution to these challenges is a semantic layer that overlays the data layer that each business user accesses. That semantic layer presents the insights learned from data in the language and environment that each business user is accustomed to using, thereby enabling data scientists to publish their predictions, advanced analytics results, and most significant learned patterns and features in the data directly into the business analysts’ BI tools of choice. Such a capability consequently empowers the non-data scientist business users to do more with data, while also empowering the data scientists to have a more direct business impact. With a semantic layer, the data scientists have direct access to the KPIs and metrics definitions that the business
cares about, so that they can not only focus their DS/ML projects on driving critical business outcomes, but they can also publish their model results (e.g., predictions, recommended/optimal actions, emergent patterns, or discovered anomalies) into the business analysts’ and BI users’ Metrics Store.

Data democratization can (and should) also include the adoption of tools, techniques, and technologies that are relatively easy for your business users (not only the data scientists) to learn, understand, and use on a routine basis within each person’s assigned business functional area.

There is still another business truth (another “T”) that can stand in the way of progress with data democratization and deployments of analytics, data science, machine learning, and AI. That is “technical debt”, which refers to the implied cost of upgrading, revising, or otherwise expanding your existing tools and technologies that may already be close to obsolete but for which significant corporate investment has already been made, including existing databases. It is easy to argue that we should not buy new technologies (or pay for new training initiatives) when we already have sunk costs in existing technologies and training. That’s a very understandable argument, especially in tight economic times. However, such thinking can prevent some quick wins from manageable investments. We will discuss here some ways that data democratization can be deployed and deliver significant value without significant disruption to the existing workforce
or technology infrastructure of your organization. In the end, our business analytics scorecard should avoid the four Fs:

Cultural Challenges and Implications
• fear: “Will people lose their jobs to the new automated data science tools and technologies?”
• friction: “How do we integrate the data science process within our employees’ existing workflows without sacrificing productivity?”
• fragility: “What happens if our data science experts leave, and who will then carry the responsibility to do these complicated things?”
• Phantom (“Fantom”) Analytics: Doing busy work with data that makes your organization superficially look good but actually produces no significant business value; this is usually motivated by FOMO and hype-driven reactions to market trends.

Data literacy begins to grow by emphasizing that data is everywhere: in our hands (our smartphones), in our online interactions, in our documents, in our customer service interactions, in our logs (network, purchase, shipping, revenues, etc.), and everywhere else. Data literacy includes an understanding that data has many modalities: numbers, databases, documents, images, social networks, voice messages, customer interactions, logs, time series, press releases, and basically anything else that you can see, read, hear, summarize, and record.

Data fluency grows by emphasizing that data analytics (and its affiliate techniques of data science, machine learning, and AI) are essentially replicating what we already do naturally as human beings:
• pattern detection (“what is that?”)
• pattern discovery (“is this new or different from what I have seen before?”)
• pattern recognition (“how can I describe it so that others can recognize it if we see it again?”)
• pattern exploration (“what aspects of this make it uniquely interesting or meaningful?”)
• pattern exploitation (“what key insights have I gained from this and how do we generate business value from those insights?”).

You could say that the end-goal of all this is “decision intelligence”: what is the best action to take that will deliver the best business outcomes based on what we see in the data right now, right here, in this context?

Data for All

Where do we begin? Like all important aspects of life, we need to start with “having the talk.” Okay, got it? Got it! But what is “the talk”? In this case, it means that we need:
• to develop and grow data literacy in the organization by increasing awareness, familiarity, and understanding of data; and
• to develop and grow data fluency by enabling and improving communications with data and insights discovery from data across all business users and business functional areas.

My experience in teaching these things to very general audiences is that a common response from those learning this stuff is: “I learned that data is not a 4-letter word.” That’s a little bit of humor, basically meaning that we shouldn’t fear it or think of data as something that only certain people can talk about. Data can be used by everyone for business insights discovery.

While all of that may sound overwhelming initially, it can be summarized in different ways that are best suited to your workforce. For example, there are basically four types of pattern discovery:
• Class Discovery: Find the categories (groups, population segments, and sub-segments) of things (objects, events, and behaviors) in your data; plus learn the rules that constrain the
different class boundaries (that uniquely distinguish the segments).
• Correlation (Predictive and Prescriptive Power) Discovery: Find the trends, relationships, and dependencies in data that reveal the governing principles, causal connections, and/or behavioral patterns of things (their “DNA”).
• Outlier/Anomaly/Novelty/Surprise Discovery: Find the new, surprising, unexpected one-in-a-million or billion or trillion object, event, or behavior in the data.
• Association (or Link) Discovery (Graph and Network Analytics): Find both the usual and the unusual (interesting) data associations, links, or connections across the entities in your domain. “Connect the dots that aren’t connected.”

Another way of explaining the business goals of these data democratization activities is to place them within the five dimensions of business analytics outcomes that can be achieved:

The Five Dimensions of Data Analytics
• Diagnostic Analytics refers to oversight (real-time reporting and analysis) of your domain through your data.
• Predictive Analytics refers to obtaining and applying foresight from your data (DS/ML models for predicting business outcomes).
• Cognitive Analytics refers to delivering the “right sight” from your data: i.e., learning the right question to ask your data, at the right time, at the right place, for the right customer (or the right event), in the right context.
• Prescriptive Analytics refers to obtaining and applying insight from your data (DS/ML decision optimization models, for proactively achieving optimal business outcomes).
• Descriptive Analytics refers to hindsight (historical reporting and analysis) on your domain from your data.

Four Examples of Democratized Analytics

1) A customer-facing services organization was trying to find an early warning signal of customer attrition to identify those customers who were on the verge of taking their business elsewhere. From this insight discovery, the business customer care team would reach out to those customers with a little bit of extra customer care.

The data science team could have built a large data infrastructure, full of esoteric predictive analytics algorithms and ensembles
of models. Instead, the DS team tried a simple approach first. They decided to look at web logs (website usage histories) of those customers who have already left, compared with a control set of customers who have not left. They found a simple signal in the data and built a model from that. It was a very simple, yet very successful model.

What was the simple model? The team simply counted how many times the customer visited and clicked on their accounts webpages in the month preceding their departure. These customers had a much higher web visit rate in that month compared to their baseline usage in preceding months. So, the DS team implemented a simple alert algorithm. When the algorithm alerted a customer service agent that a customer was in the high-click-rate category, the agent simply reached out to the customer and offered some “just-in-time” advice, information, and other assistance related to their accounts. The customer retention rate rose, the attrition rate dropped, and the simple “count the web clicks” model was a huge success. Everyone in the business could understand the data and the model, and they were all encouraged to pay attention to additional “customer signals” (patterns) in future data.

2) A
mobile phone services provider wished to cross-sell their traveling customers with a mobile roaming package for their international travels. They specifically wanted to do this before the customer leaves the country, thus avoiding a situation where the customer purchases the roaming package from a competing service provider elsewhere. The business teams worked together with the DS team to identify a solution: the target customers could be identified when they were at the international departure terminal of an airport, simply from the GPS location data that the phone automatically provides via the cellular network. The mobile provider was able to offer a discount on a “just-in-time” roaming package to those international-bound travelers prior to their departure. A high percentage of the customers responded and accepted the offer. The simple location-based model, based on metadata that was already within the mobile device signal packet, was a huge win for the company, with almost zero additional promotional or marketing costs. The business teams were encouraged to bring new services and experiences to customers based on other meaningful patterns in readily accessible customer data.

3) Streaming data from sensors in engines usually has discrete modes of behavior, such as specific frequencies and ranges for the different sensor readings. When one of these frequencies, ranges, or mean values for the sensor readings suddenly changes or begins to drift unexplainably, then that could be an early warning sign of an undesirable outcome like engine failure or engine malfunction. Industry data scientists are using deviations in simple statistical metrics (mean, median, variance, skew, emergent frequencies) computed from streaming digital signals as a prompt for the engineering teams to schedule “just-in-time” prescriptive maintenance on the affected systems. These industries are saving money in two ways from such a simple model:
• servicing a component before an undesirable event occurs;
• significantly reducing the amount of scheduled preventive maintenance for components that are working just fine and do not need servicing.

The relevant statistical metrics were understandable to all and readily computable within standard spreadsheets and other BI tools, thus inspiring other business analysts to search for similar “early warning signs” in other business data streams.

4) An electronics retail store chain many
years ago was selling the hot items of the day: video cassette players/recorders (VCR) and camcorders. That’s obsolete technology these days, but not 20+ years ago. The store tried upselling a camcorder to a customer when the customer bought a VCR, but the response to the store’s offer was weak, at best. Some of the store’s business analysts then looked at their customer data from a different but simple perspective. They looked at the association of VCR purchases with future camcorder purchases. (Note: association mining is a common DS technique, listed earlier as our fourth type of pattern discovery from data.) The analysts discovered that some of the customers who bought a VCR came back several months later to buy a camcorder. The likely explanation was that the customers realized that they could make their own home movies to show on the family VCR. That business explanation meshed very clearly (and reassuringly) with the time-shifted association mining DS approach. Consequently, the store began sending discount coupons to customers 4-6 months after their VCR purchase to capture the customers’ attention “just-in-time” as they were starting to consider a camcorder purchase. It worked! A significantly higher percentage of customers accepted the offer, returned to the store to make the purchase, and the time-shifted marketing campaign was a great success.

Lessons Learned from Data Democratization Examples

You can see several recurring themes in all these examples:
• They all had clearly understandable business goals as the initial requirement for the analytics task.
• They all used easily accessible data.
• They used “small data” (not massive quantities or complex varieties of data).
• They achieved business value and good ROI from simple models.
• The models were simple enough that they could easily be adjusted and updated if the initial implementations failed (which means they followed a “fail fast in order to learn fast” data analytics strategy, which represents a major cultural milestone for any organization seeking to achieve success in both DS and data democratization initiatives).
• They all enabled “just-in-time” actions, which was made possible by simple and easy-to-understand data, models, implementations, and ROI metrics.
• The DS/ML objectives here are understandable to all business partners: to detect existing, emerging, and actionable patterns in data: (a) segments (classes), (b) trends (correlations), (c) surprises (anomalies, outliers), and (d) linked entities and events (co-occurring associations).
• The DS/ML results were shared with business analysts. Ideally the results would have been automatically communicated and made readily accessible within the analysts’ existing BI tools and spreadsheets, but maybe that didn’t happen in these cases since the semantic layer was not available then.

The most important lesson is this: achieving the best value from data analytics is not about fancier algorithms. But what is “best” anyway? Faster? Cheaper? Explainable? Usable? Higher revenue? Yes, “best” is a metric that represents some business-specific combination of all those characteristics of the solution. In any case, it is really about ease of use and ease of explainability, both of which deliver trusted outcomes in a timely manner. If you are looking to deploy fast, learn fast, and earn fast with data analytics, then a culture of experimentation, data democratization, and democratized analytics are key ingredients to success. That means enabling cooperation, collaboration, and communication between DS teams and business analyst teams. And that’s the benefit and justification for going with a semantic layer solution.

Once a culture of data democratization and insights discovery has been established and begins producing results across the enterprise, you will find that your business analytics scorecard has conquered and reversed the four Fs:

The Business Analytics Scorecard
• fear: Existing staff continue their normal functions, now with enhanced data literacy and data fluency (i.e., “data guru”) capabilities.
• friction: Existing workflows are not disrupted as all activities (data access, data exploration, analytics tasks, and delivering predictive model results) are all performed seamlessly within the database (or other standard business tools) using standard SQL that most business users are familiar with.
• fragility: The responsibility and success for analytics tasks does not depend only on specialists or “elite teams” of data scientists. Everyone wins!
• Phantom (“Fantom”) Analytics: Business users are proactively discovering actionable insights from data and producing real business value.
Conquered!

A better scorecard for your business is the STELLAR Analytics Scorecard! STELLAR analytics can boost analytics performance from early-stage sandbox experiments to late-stage enterprise projects. The key is to get moving, keep moving, and to accelerate forward progress. If the inhibitors (fear, fragility, and friction) stand in the way of analytics progress, then the project will likely receive a failing grade of F. A better grade is not just an A, but a stellar A (like in school), which is enabled by STELLAR analytics.

The areas associated with this analytics
framework include Streaming, Team, Edge, Location, Learning Business System, Agile, and Related-Entity Analytics. Here we briefly define each of these STELLAR characteristics:
• Streaming Data Analytics (S): Real-time access to, interaction with, and discovery from data, such as detecting POI (persons, patterns, products, processes, or points of interest) and BOI (behaviors of interest for any “dynamic actor”).
• Team Analytics (T): A culture of experimentation that celebrates and validates the power in diversification, collaboration, data-sharing, data reuse, and data democratization.
• Edge Analytics (E): Locality in time, at the moment of data collection (enabled by streaming data from real-time sensors across the business domain): “What else is happening now?”
• Location Analytics (L): Locality in geospace, within a given spatial context (also enabled by location-aware sensors): “What else is happening at that place?”
• Learning Business System (L): A learning business system embodies data-driven knowledge-generation business practices, with performance measurement, continuous feedback, learning, and improvement, which are embedded in daily business practice (example: Learning Health Systems).
• Agile Analytics (A): Outcomes-driven, iterative, builds
proofs-of-value, fails fast to learn fast, thinks big, but starts small (with the Minimum Viable Product or Minimum Lovable Product) with continuous integration and delivery.
• Related-Entity Analytics (R): Locality in data feature space: “What else is like this entity or event?”

Steps Toward the Semantic Layer

As we prepare to dive deeper into the meaning, characteristics, and business value of the semantic layer, we will examine a couple more of its features. One of the key features of any semantics is data annotation, data labeling, or smart metadata. This semantic labeling ascribes terms, labels, and terminology to data, data features, and patterns in data within the context and vocabulary of the different business users (data teams and business teams). Types of semantic labels include:
• Taxonomic hierarchies: Sales, Sales by Region, Sales by Product Category, Sales by Time (month/business quarter), Sales by Campaign, etc., which are relevant dimensional views for the BI or data warehouse users, but should also be incorporated into the DS/ML models (if those are the data views that the business analysts require)
• Column heading homogenization (e.g., should “Revenue” and “Dollars” headings be labeled the same? What about “Predicted Price” and “Price Forecast”?)
• Data team-created columns: explanatory variables, causal variables, predictions, and feature importance metrics
• Business team-created columns: data cube dimensions, aggregates, units, ranges, min/max values, and data sources
• Use cases: who (relevant business user team), what (business applications), when (time-specific?), where (context), why (business outcomes and objectives), how (recipes, models, formulae, applicable instructions)
• Ontologies: knowledge graphs; entities and their relationships; semantic assertions
• Provenance: When was this data created? Who (or what business unit) created it or acquired it? Who owns the data? Has the data been modified, when, and by whom? Is there a time-constrained period during which this data was valid, and thus should not be used outside of that range of dates (e.g., a customer address might be different during the summer months versus during the school year; or a sales campaign was only valid during a specific period, or maybe the campaign repeats on a recurring cycle)?
• Insights and patterns in the data: (a) segments (classes), (b) trends (correlations), (c) surprises (anomalies, outliers), and (d) linked entities and events (co-occurring associations).

A second
key feature of semantics is to make your data “analytics ready”, which means putting a business layer over the data layer. This business layer does more than just make your data “analytics ready”; it really makes your data “business ready”. This puts insights discovery potential into the hands of all business users, enabling more people to access data and to blend multiple data sources (internal and external). The business layer also provides a single logical view of the data to deliver actionable insights in a timely manner for the right person at the right time. That’s really enabling business at the speed of data so that all users can drive business value creation from the essential digital assets that underpin all modern organizations.

Okay, so what is a Semantic Layer?

That’s a good question and the key question, but let’s first explain semantics. The way that I explained it to my data science students years ago was like this. In the early days of web search engines, those engines were primarily keyword search engines. If you knew the right keywords to search and if the content providers also used the same keywords on their website, then you could type the words into your favorite search engine and find the content you needed. So, I asked my students what results they would expect from such a search engine if I typed the following words into the search box: “How many cows are there in Texas?” My students were smart. They realized that the search results would probably not provide an answer to my question, but the results would simply list websites that included my words on the page or in the metadata tags: “Texas”, “Cows”, “How”, etc. Then, I explained to my students that a semantic-enabled search engine (with a semantic meta-layer, including ontologies and similar semantic tools) would be able to interpret my question’s meaning and then map that meaning to websites that can answer the question.

This was a good opening for my students to the wonderful world of semantics. I brought them deeper into the world by pointing out how much more effective and efficient the data professional’s life would be if our data repositories had a similar semantic meta-layer. We would be able to go far beyond searching for correctly spelled column headings in databases or specific keywords in data documentation, to find the data we needed (assuming we even knew the correct labels, metatags, and keywords used by the dataset creators). We could search for data with common business terminology, regardless of the specific choice or spelling of the data descriptors in the dataset.

Even more than that, we could easily start discovering and integrating, on-the-fly, data from totally different datasets that used different descriptors. For example, if I am searching for customer sales numbers, different datasets may label that “sales”, or “revenue”, or “customer_sales”, or “Cust_sales”, or any number of other such unique identifiers. What a nightmare that would be! But what a dream the semantic layer becomes! When I was teaching those
students so many years ago, the semantic layer itself was just a dream. Now it is a reality. We can now achieve the benefits, efficiencies, and data superhero powers that we previously could only imagine. But wait! There’s more.

Perhaps the greatest achievement of the semantic layer is to provide different data professionals with easy access to the data needed for their specific roles and tasks. The semantic layer is the representation of data that helps different business end-users discover and access the right data efficiently, effectively, and effortlessly using common business terms. The data scientists need to find the right data as inputs for their models. They also need a place to write-back the outputs of their models to the data repository for other users to access. The BI (business intelligence) analysts need to find the right data for their visualization packages, business questions, and decision support tools. They also need the outputs from the data scientists’ models, such as forecasts, alerts, classifications, and more. The semantic layer achieves this by mapping heterogeneously labeled data into familiar business terms, providing a unified, consolidated view of data across the enterprise. The
semantic layer delivers data insights discovery and usability across the whole enterprise, with each business user empowered to use the terminology and tools that are specific to their role. How data are stored, labeled, and meta-tagged in the data cloud is no longer a bottleneck to discovery and access. The decision-makers and data science modelers can fluidly share inputs and outputs with one another to inform their role-specific tasks and improve their effectiveness. The semantic layer takes the user-specific results out of being a “one-off” solution on that user’s laptop to becoming an enterprise analytics accelerant, enabling business answer discovery at the speed of business questions.

Benefits of a Semantic Layer

Insights discovery for everyone is achieved. The semantic layer becomes the arbiter (multi-lingual data translator) for insights discovery between and among all business users of data, within the tools that they are already using. The data science team may be focused on feature importance metrics, feature engineering, predictive modeling, model explainability, and model monitoring. The BI team may be focused on KPIs, forecasts, trends, and decision-support insights. The data science team needs to know and use the data which the BI team considers to be most important. The BI team needs to know and use those trends, patterns, segments, and anomalies that are being found in data by the data science team. Sharing and integrating such important data streams has never been such a dream.

The semantic layer facilitates collaboration and communication between the data cloud, the business analysts, the decision-makers, and the DS/ML modelers. In other words: the Semantic Layer achieves data democratization and bridges the gap between data science and BI. The key results from the data science modelers can be written back to the semantic layer to be sent directly to consumers of those results in the executive suite and on the BI team. Data scientists can focus on their tools; the BI users and executives can focus on their tools; and the data engineers can focus on their tools. The enterprise data science, analytics, and BI functions have never been so enterprisey. (Is “enterprisey” a word? I don’t know, but I’m sure you get my semantic meaning.)

That’s empowering. That’s data democratization. That’s insights democratization. That’s data fluency and data literacy-building across the enterprise. That’s enterprise-wide agile curiosity, question-asking, hypothesizing, testing or experimenting, and continuous learning. That’s data insights for everyone.

In summary, the mission of data-rich organizations is this: to produce successful business outcomes and value from data through analytics. Since the rate at which data
flows through organizations is lightning fast, users of analytics applications need strategies, tools, and techniques that quickly leverage those data to extract insights, to make data-driven decisions, and to take the next best actions; in other words, to help business move at the speed of data! The semantic layer represents a force multiplier for data science and business intelligence teams to work cooperatively in driving business impact and producing critical business outcomes. With a semantic layer, democratized data analytics has never been so democratized! The dream of bridging the gap between data science and business intelligence in your organization becomes a reality.

AtScale provides a semantic layer solution to actualize this dream and to achieve its full potential. AtScale can dive deeper into this with your organization to show how data science and BI teams and applications can synergis

