How to Make Marketing Science a Reality, and Deliver the Analytic Results that Executives Want

EXECUTIVE WHITEPAPER

Andy Hasselwander, Chief Analytics Officer, MarketBridge

A Roadmap for Modern Marketing Analytics

A COMPREHENSIVE RESOURCE ON Building and Operating a High-Quality Marketing Analytics Organization

In organization after organization, Marketing business leaders are asking questions faster than Marketing Analytics teams can answer them. This makes sense; accelerating technical complexity in the go-to-market space makes answering tough questions scalably and repeatably a daunting task. This whitepaper covers the challenges of, and approaches to, operating an agile Marketing Analytics organization. These pages outline the scope of Marketing Analytics jobs-to-be-done, and detail the functional building blocks that teams need to better diagnose issues, predict outcomes, and progress on marketing goals.

TABLE OF CONTENTS

Marketing Analytics is in Trouble
    Bespoke Use Cases Dominate Long-run Vision
Five Jobs to be Done
    The WHAT: Business Intelligence and Reporting
    The WHY: Data Detective Work
    The WHO: Understanding Your Audiences
    The HOW: Marketing Effectiveness
    What’s Upcoming: Predicting the Future
Seven Building Blocks of Great Marketing Analytics
    Source Systems
    Data Pipelines
    Analytical Data Storage
    Data Science
    Business Intelligence
    Organization
    Skills
Conclusion
Case Study
    Modernizing Marketing Analytics for a Health Insurer

Marketing Analytics is in Trouble

For all the promise of data-driven marketing decision-making, Gartner predicts CFOs will slash marketing analytics teams by 60% in 2023.[1] Companies have spent years building up these teams; however, they have largely been unable to deliver the data-driven results that executives want. In fact, marketing analytics influences only 53% of marketing decisions. This was not an outcome that many analytical marketers would have foreseen ten years ago; powerful analytical tools and data science approaches seemed poised to make marketing
science a reality. However, prosperity has always been “just around the corner” when it comes to marketing analytics.

The lack of significant progress has not been due to a lack of investment; hundreds of billions of dollars have been spent on marketing technology, data assets, and talent over the past decade. However, much of this investment has gone to waste. When examining twenty years of case studies at MarketBridge, six points of failure come up repeatedly.

Overreliance on Magic Bullets

Vendors are only too happy to promise that their martech will finally do what all the others have not over the past decades. However, new systems do not generally improve the overall quality of marketing data; if anything, they can make problems harder, not easier, for analysts to solve. After all, marketing is the one part of the enterprise whose data sources are dynamic, unstandardized, and constantly changing. What tends to happen is that these complex solutions are installed and then heavily customized to meet the enterprise’s needs, often in ways that do not fit with the inner workings of the black box. In most cases, we have seen somewhere between 50% and 80% of the functionality of the solution go unused, as another 50% to
80% of other functionality is bolted on so that the solution will work in the first place.

Black Boxes Over Reproducibility

Proprietary approaches to solving problems benefit the enterprise selling the approach. Software companies’ assets are their source code files and approaches; it’s just hard to make decent money using open-source approaches. This means that big, enterprise solutions to problems, such as large vendors’ CDPs (Customer Data Platforms, the industry term for a marketing-focused longitudinal human record), are going to be complex, expensive, and hard to install or troubleshoot without lots and lots of dedicated IT resources.

Black boxes are also problematic for reproducibility. Perhaps the biggest headache for marketers trying to present to finance is numbers that don’t tie. Software companies can promise their solution will do many things, but they cannot fix the underlying quality of the data. Accountants conduct annual audits on most corporations to ensure that the dollars coming in and going out tick and tie; marketers do not have that luxury (maybe they should, but that’s the problem detailed in the second-to-last section).

The Dynamism of Marketing

Marketing technology evolves incredibly quickly, in tandem with the rapid advances in digital media. Every year, marketers must learn new terms, systems, and strategies. Unfortunately, these channels and technologies are usually not designed with any data structure standards. Indeed, in many cases, they differentiate themselves on their uniqueness. This uniqueness does drive innovation, and in the long run is good for marketing, but it is painful for data integration when new systems and channels must be absorbed into marketing analytics scope on an almost quarterly basis.

Siloing is another side effect of rapid innovation. In a large enterprise with multiple business units, it is common to have systems in three phases: obsolete, just right, and bleeding edge but not working yet. The obsolete systems tend to be most embedded in the business, for an obvious reason: they’ve been around the longest. “Just right” systems are mature enough to function well, but
in most cases, don’t span the full scope of marketing. Finally, the bleeding edge systems are usually where most of the effort is being spent by IT, but are generally only functional for a few use cases. Since these are often disparate systems, it is difficult (if not impossible) to query data from multiple systems. If it is possible, there is no guarantee that the data will match because the ETL (extraction, transformation, and loading) process for each system is different.

What seems to be missing in many organizations is a Rosetta key for marketing analytics. This missing intermediate layer could calm the fast-changing channel, tech, and source data landscape into a more interpretable meta layer. However, these meta layers are hard to sell because they don’t seem to have immediate commercial benefit.

Marketers (Usually) Aren’t Engineers

Marketers tend to be empathetic, creative, analytical, and organized. However, they tend to be weak in the engineering disciplines. This isn’t a dig on marketers, but the reality is that in most cases marketing leaders depend on others to design, develop, and integrate tech and data. This dependence on external parties leads to many bad decisions and poor implementations. The disconnect between the builder and the user means that marketing is promised features that aren’t realistic, projects fall behind, and whoever made the decision in the first place moves on, leaving a trail of partially functioning data wreckage in their wake. This usually isn’t malicious. Most technologists want to
please people and truly believe that they’ll deliver, but then they cannot. This phenomenon was detailed comprehensively in “The Mythical Man-Month,” first published in 1975 by Fred Brooks.[2] Conversely, most marketers want to believe that the promised features will work; remember, they are the creative, emotive forces in a company. The personality interplay between the pleaser and the believer can be disastrous.

We foresee that by 2030 (roughly eight years from today), 50% of marketing jobs will require coding-type technology skills. By “coding-type”, we mean the ability to interact at a text-based programming level with critical data querying and transformation technologies such as SQL, Python, and R. People in these jobs will also be required to understand the different data structures and access procedures that marketing data uses, including JSON, XML, and .csv files, as well as batch access (FTPs) and APIs. These skills will eventually be table stakes for marketers, as marketing becomes more and more technical.

No Equivalent of Financial Reporting

As mentioned before, the Profit and Loss or Income Statement is the annual audit, the equivalent of the annual physical for the corporation. It states in standard language how much money has been spent, and how much has been received. It breaks these down into the things the enterprise makes (cost of sales) and the money the company spent to sell and administer those things (SG&A: sales, general, and administrative costs). Marketing fits into SG&A. However, if you
read a few 10-Ks, you’ll notice that marketing is rarely discussed in detail. Advertising is sometimes discussed, but the functions of marketing (acquiring, retaining, and serving the company’s customers) are glossed over. We shouldn’t make too much of this; financial reporting is required for investors and tax authorities; therefore, it gets attention. However, the side benefit of this reporting is immense. Business units know exactly where they stand in terms of their profitability. SKUs (stock-keeping units) can be added or cut depending on demand and price attainment. The business runs on financial reporting; it is “oxygen” for managers.

Marketing needs the equivalent discipline; there has been no norm established to do true marketing reporting. Managers are left to cobble together dashboards, or buy whatever software promises a magic bullet. But because of the reasons listed above, there is no “magic bullet software solution” and there’s no reason to believe that one will be developed.

Bespoke Use Cases Dominate Long-Run Vision

There are always a million fires to put out in a business. Conference calls are routinely cancelled
for the ubiquitous fire drill, the “dog ate my breakfast” of business-speak. It’s not clear that marketing is any worse in this regard than any other business function, by the way. However, short-termism is very problematic when trying to understand marketing performance. Short-term solutions are always easier than doing the hard work of creating universal taxonomies, building robust data pipelines, and creating tidy data frames. Marketing analytics departments generally do not have the equivalent of the “product roadmap” that software teams have, but they should.

This is Fixable

These problems (overreliance on technology, lack of transparency, an incredibly dynamic topic, not-technical-enough teams, a lack of standard financial reporting, and constant fire drills) are endemic, but they are fixable. There are Marketing Analytics organizations that are efficient, high-quality, and innovative, turning out accurate insights quarter after quarter and year after year. These teams share many of the same traits.

First, they have a clear understanding of the work that they do; their scope is understood and articulated. It turns out that even between diverse industries, across consumer and business, marketing analytics really has
five core jobs. Specific methods differ industry-by-industry and go-to-market model by go-to-market model, but overall scope can be neatly defined for nearly any company. By defining what is in scope and what is not, analytics leaders can avoid distractions and deliver on promises to stakeholders. This is the focus of the next section of the paper.

Second, successful Marketing Analytics leaders apply best practices across seven key capability areas: the source systems that create data; the processes that move data around; the analytical databases and lakes that store data; the data science techniques and approaches to analyze and create insights; the business intelligence tools to expose insights to stakeholders; the organizational approaches to prioritize and deliver great work; and the team capabilities and training necessary to deliver on the jobs-to-be-done. This is the focus of the second section of the paper.

Five Jobs to be Done

The Marketing Analytics function is typically asked to do five kinds of jobs. The What job is simplest: counting stimulus (impressions, spend) and response (leads, orders) in report format. This is sometimes called reporting or business intelligence. Building upon the What reports are the questions of Why, Who, How, and What Will Happen. These questions generate insights, recommendations, profiles, and predictions that help marketers make data-informed decisions.

Figure 1: The Five Marketing Analytics Jobs-to-be-Done. Poor quality performance data leads to the most effort being
spent on “What”, to the detriment of the more value-added tasks to the right.

The WHAT: Business Intelligence and Reporting

In many Marketing Analytics organizations, most analyst time is spent counting and summarizing data. Business intelligence (BI) software is supposed to automate these manual reporting tasks, but for many marketers this has not happened. Instead, analysts juggle Excel spreadsheets, dashboards from other systems, queries of various databases, and files from vendors to put together reports that do one-off reporting tasks. But these analyses are often difficult or impossible to reuse.

“What” reporting should be 80% automated, but several common upstream marketing data problems prevent organizations from reaching this level:

- Information kept in spreadsheets
- Using dashboards as databases
- Non-existent or conflicting metadata
- Constantly changing data streams, particularly from vendors

These upstream data blockers are sometimes given short shrift, as leaders instead focus on “getting something built.” This inevitably results in distrust and poor usage, as numbers fail to match between systems, and data refreshes take months or even quarters.

What reporting is critical for running the business. Understanding progress-to-goal, spend by channel, and last-touch attribution is critical for marketers. The best reporting is visual, but also includes the ability to drill through to detailed, tabular reporting that managers can use to draw their own inferences and conclusions. In some cases, analytics organizations can “hoard the information” for fear of transparency. This should be avoided; it creates bottlenecks and drives resentment.

Figure 2: Typical “What” Reports and Dashboards Across Marketing, CRM, and Sales

MARKETING
- Spending by channel
- Last-touch attribution of leads by channel
- Campaign-by-campaign performance
- Test-control readouts
- Attitudinal tracking

CRM
- Website traffic
- Customer lifetime value
- Customer engagement levels
- Customer satisfaction

SALES
- Performance by rep/territory
- Sales pipeline status
- Lead handoff performance
- Partner segmentation and performance

Generally, the What use case is the first of the five that should be mastered. The “how much, how many” data should be well understood by everyone; this provides a sensible substrate for more advanced questions. It is difficult to productively engage with more advanced questions about attribution, audience dynamics, or predictive modeling if different stakeholders can’t agree on how many leads are being produced, or how much money is being spent by channel.

The WHY: Data Detective Work

Marketing is a dynamic and constantly changing job, and there are always new ad hoc questions emerging from leadership. Answering these questions requires fast access to diverse data sets, alongside a powerful “data munging” toolkit to quickly restructure and join data.

Data detectives are a unique breed. They are naturally impatient to find answers, and seek information from any source they can find. They are then capable of weaving these sources together into a compelling narrative that can be processed by executives. Data detectives don’t have time to hunt for these data across hard-to-access systems owned by different departments; they need a comprehensive repository of information a “query’s length” away from their development environment. In many marketing organizations, the data required for Why analyses are many emails and meetings away, lengthening the time to get an answer from hours to weeks or even months.

The WHO: Understanding Your Audiences

Segmentation, targeting, and positioning, or “STP”, is part of the foundational marketing curriculum for business students. Understanding prospects and customers is hampered in many marketing analytics organizations because of a poor grasp of identity. Individuals, households, families, business offices, and relationships are hard to represent in data format, particularly when you don’t own the data. Executives, however, need to know Who their customers are, and which audiences are responding to different offers. Ideally, analysts should be able to link stimulus and response activities and promotions to a robust set of cross-walked identity keys to enrich buyer and customer journeys. However, the reality is that most identities exist in silos, making truly empathic and actionable marketing extremely difficult.

10 Steps for Actionable Segmentations: Many enterprises fail when segmenting and targeting audiences because they fail to design how data will be used to find customers. Here is our framework on how to remediate that.
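
As a rough illustration of the “cross-walked identity keys” idea in the WHO discussion above, the sketch below links stimulus and response events from separate systems to one universal ID and reports how many events stay siloed. It is a minimal sketch under stated assumptions, not a description of any particular platform: the crosswalk table, the system names, the event records, and the `build_journeys` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source system, source ID) -> universal person ID.
CROSSWALK = {
    ("web", "cookie-123"): "P1",
    ("crm", "lead-9"): "P1",
    ("mail", "hh-77"): "P2",
}

# Hypothetical stimulus and response events, keyed by source-system IDs.
EVENTS = [
    {"system": "mail", "source_id": "hh-77", "type": "stimulus", "date": "2022-01-05"},
    {"system": "web", "source_id": "cookie-123", "type": "stimulus", "date": "2022-01-10"},
    {"system": "crm", "source_id": "lead-9", "type": "response", "date": "2022-01-12"},
    {"system": "web", "source_id": "cookie-999", "type": "stimulus", "date": "2022-01-15"},
]


def build_journeys(events, crosswalk):
    """Group events by universal ID; return (journeys, unmatched events)."""
    journeys = defaultdict(list)
    unmatched = []
    for event in events:
        person_id = crosswalk.get((event["system"], event["source_id"]))
        if person_id is None:
            unmatched.append(event)  # identity is still siloed in its source system
        else:
            journeys[person_id].append(event)
    for person_events in journeys.values():
        person_events.sort(key=lambda e: e["date"])  # ISO dates sort lexically
    return dict(journeys), unmatched


journeys, unmatched = build_journeys(EVENTS, CROSSWALK)
match_rate = 1 - len(unmatched) / len(EVENTS)
print(f"{len(journeys)} journeys, match rate {match_rate:.0%}")
```

Tracking the unmatched events explicitly, rather than silently dropping them, keeps the size of the identity gap visible to whoever reads the resulting journeys.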

The HOW: Marketing Effectiveness

The CFO’s biggest question is “how did marketing actually drive the business?” This seems like a simple ask, but it is very complex. Last-touch attribution can be hard enough, but marketing works across channels, and upper-funnel impact is difficult (but not impossible) to fully quantify.

There are five key questions that analysts typically need to answer for executives asking questions about marketing’s worth. The first is simply quantifying marketing’s contribution to the business without double-counting, a formidable challenge in and of itself. After that has been accomplished, channel-by-channel attribution (multi-touch attribution) is a common use case. The less quantifiable aspects of marketing (brand and upper-funnel investments) are harder to measure and usually require an intermediate “attitudinal” variable. Finally, marketing optimization seeks to re-mix channels, audiences, and geographies to drive efficiency and effectiveness.

Measuring marketing effectiveness at any level depends on quality data, and on code-based, version-controlled tools through which models can be built, deployed, and tweaked at a high cadence to deliver timely answers in a constantly
changing environment.

Figure 3: The five key questions of marketing effectiveness. Marketing analytics organizations delivering value across these five use cases are not facing the steep budget cuts that Gartner is forecasting, on average, industry wide.

- What is my current marketing MROI of channels?
- What is the true return of my brand investments?
- How should I spend my next best dollar?
- What is my marketing halo?
- How can I quantify the total impact of my marketing efforts?

What’s Upcoming: Predicting the Future

Crystal balls are sadly still not available, but machine learning techniques, combined with vast quantities of data, give marketers the ability to create models of people and events that help to predict behavior. The most common types of machine learning models for marketers are propensity models. Propensity models output the probability that an individual, household, customer, or business will do something, usually respond to marketing or make a purchase. To build and deploy machine learning models at scale, millions of rows of individual-level records, along with complete promotional histories and as many predictive features as possible, are required.

Seven Building Blocks of Great Marketing Analytics

Each of the jobs mentioned in the prior section is critical, and if done well and quickly, can drive huge value for the enterprise. Yet, to do these jobs well, and in a scalable way, it is necessary to build a great marketing analytics organization. Over two decades of work with over 100 Fortune 1000 companies, we have developed a framework to assess maturity in marketing analytics. The seven-part framework covers technology, data, analytical techniques, and talent. In each of the seven areas, proven best practices have been identified that effectively avoid the pitfalls that routinely plague marketing analytics organizations. The seven areas are:

1. Source Systems: Long-term vision/commitment to standardizing marketing tech
2. Data Pipelines: High-quality and reproducible
3. Analytical Data Storage: Exhaustive, organized, high-speed, and accessible
4. Data Science: Open-source, reproducible, and connected to data
5. Business Intelligence: Tools that handle both visualization and tabulation
6. Organization: Flexible, clear project management structures
7. Skills: Team built around technology-capable, marketing-expert doers

Source Systems

Source Systems are marketing technology platforms that organize and execute marketing campaigns. They include Campaign Management Systems, Digital Marketing Platforms, Content Management Systems and eCommerce Systems, Digital Asset Management Systems, and Customer Relationship Management Systems. In addition to these “owned” systems, this area by extension includes data from agencies responsible for marketing execution, as well as third-party data like enrichment files and attitudinal/survey data. In short, source systems organize and generate the data that marketing analytics uses. If source systems are not selected, organized, configured, and integrated properly, marketing analytics will be unable to do its jobs.

The marketing technology landscape is extremely complex. It is a world dominated by software vendors and syndicated analysts. These parties are all trying to make money, selling promises with very large price tags. Ironically, billions of dollars of marketing are spent promoting marketing software, targeting CMOs and Marketing VPs. It is tempting to always chase the next big software promise, but this can be a mistake. Companies that chase the next big thing sometimes end up with multiple marketing technology platforms doing the same job or providing dramatically different data, making the analytics job much harder.

Marketing leaders shouldn’t let themselves be wowed by each new promise; instead, they should define and execute on a long-term marketing technology strategy that simplifies and standardizes approaches, with consideration for the data that these systems will generate. Likewise, the marketing analytics team should have a seat at the table when defining and selecting marketing technology, to force clear requirements on how the data from that platform will be extracted, transformed, and loaded into the marketing data lake and data warehouse. And, when new platforms are selected, the data extraction pipeline should be engineered upon installation; this cannot be a multi-year process that is never completed.

Five Key Types of Marketing Source Systems

Campaign Management Systems

Campaign Management Systems organize marketing execution, particularly for promotion-led, down-funnel efforts like mail and direct response TV. They are, in essence, applications that configure and create relational database structures representing marketing execution. For example, if a direct mail campaign is designed to reach a certain audience with a certain creative, the Campaign Management System selects the prospects for mailing; assigns the creative to them; defines their audience; and most likely defines an inbound telephone number to track them. Examples of Campaign Management Systems, old and new, include Unica, Aprimo, and Adobe Campaign Manager.

Campaign Management Systems are important to Marketing Analytics because they define the metadata for campaigns. Metadata includes things like marketing objective, audience, campaign name, cell, test vs. control, media channel, geography, and timeframe, to name a few. To be effective, Campaign Management Systems must be flexible enough to
add new types of metadata when needed, but strict enough to ensure that campaigns cleave to the same basic organizational structure.

One common problem with campaign design is overcomplexity. Instead of working within the constraints of the database-enforced metadata of systems, campaign managers will design custom campaigns using Excel, and then leave the details in their “My Documents” folder, leaving the analyst tasked with analyzing results confused and sending emails back to find the original Excel file. This seemingly mundane example, repeated over and over, can make performance reporting unscalable. To avoid this, members of the marketing analytics department should be involved in campaign setup, or at the very least informed about structures and strategies. They should know up-front about any “free text field” or “Excel-based modifications” made to a campaign, but ideally, these should be avoided
altogether.

Digital Marketing Platforms

Digital Marketing Platforms organize and execute display, streaming video, social, and paid search advertising to unknown, partially known, and known individuals across mobile, PC, gaming, and streaming video devices. Most marketing technology investment today is going towards digital tactics, and for good reason; more and more user time is spent on digital devices, and even upper-funnel brand activity is shifting primarily to streaming video.

In addition to paid media, earned digital media is becoming a more critical component of the digital go-to-market mix. Modern digital marketing platforms integrate these earned tactics, providing the ability to both promote to and track influencers, and to monitor the engagement that influencers and news media have on the overall conversation on social media.

Digital Marketing Platforms process and store orders of magnitude more data than traditional Campaign Management Systems. For example, a direct mail campaign targeting 1M households will generate approximately 1M promotion records, 5,000 response records, and require another few thousand lookup records, costing about $400,000 in total to execute (about $400 CPM). A retargeting campaign might have 10M impressions, with another 10M-100M log files generated, costing $100,000 to execute (about $10 CPM). In the first case, budget-per-row is about 40 cents; in the second case, the budget-per-row is a tenth of a cent. Put another way, the data generated per sale is about 100X greater in
the digital case.

For marketing analytics, the data volume generated by digital marketing has two implications. First, any downstream data storage needs to scale to hold granular (not aggregated) digital marketing data. It is common to punt on this and not store detailed digital logs. This inevitably leads to analytic frustration, particularly in the “Who” and “How” jobs-to-be-done. Second, the data must be transferred on a timely basis. While it used to be routine to wait 30 or even 90 days for upper-funnel advertising data to be analytically processed, expected load times for digital data should be in the hours, or, at most, days.

One area where Digital Marketing Platforms sometimes fall down is metadata management. It is common for the picklists that define campaigns in Campaign Management Systems to not match those in Digital Platforms, as digital marketing is by its nature more audience-driven (bottom-up) than marketer-driven (top-down). This can make multichannel analysis difficult or impossible. To avoid this, Marketing Analytics should work with the marketing technology team to create a common set of metadata across Digital and Traditional Marketing Automation systems, particularly focusing on audience definitions, campaign and audience codes, and channel definitions.

Content Management Systems and eCommerce

Content Management Systems (each typically called a CMS) run the website(s) of an organization. They make it easy for marketers to change the text, images, and structure of websites, and make it possible to quickly launch “landing pages” for leads. In companies with the ability to take online orders, eCommerce systems work with websites to accept orders and payments, and then handle order, shipping, or licensing details. Essentially, CMS and eCommerce systems are the “catchers” of interest and leads. They also play an important role in a company’s organic search performance.

For Marketing Analytics, CMS and eCommerce provide critical information on overall traffic, and on discrete inbound leads. Overall traffic (in other words, pageviews, unique visitors, or number of clicks) is critical to understand for performance and trending. Generally, these data are accessible by Marketing Analytics, but what is usually more difficult is cutting the data by relevant areas of the site. Once again, metadata are critical. Sites should be tagged according to marketing objective, product taxonomy, and pre- and post-paywall or registration areas, to make downstream trend analysis meaningful.

Because a company’s website is owned, it generates first-party data. Put another way, once a person accepts cookies, he or she is now “known” and can be tracked over time, in theory all the way to becoming a customer. These observations of individuals, sometimes called
log files, should make it into the Marketing Analytics data repository, along with the metadata about where and at what stage the individuals were seen. These eventually can form a key component of the Customer Data Platform (CDP), discussed in more detail in Section 3.

Digital Asset Management Systems

Digital Asset Management systems (DAMs) hold a company’s creative assets, including direct mail, video, audio, and copy. At first glance, they may seem less important for Marketing Analytics than other source systems, and in the past, this has probably been the case. However, with quantum leaps in AI and machine learning techniques applied to text, audio, and video pattern recognition, Marketing Analytics will increasingly get the “What Will Happen” job of optimizing content to drive acquisition and engagement. To make DAMs actionable data sources for Marketing Analytics, the creative used in different promotions needs to
be tagged in both the Campaign Management System/Digital Marketing Platform and the DAM. In addition, metadata about the creative should be stored using a standard taxonomy in the DAM to make cross-tabbing and business intelligence reporting possible.

Customer Relationship Management Systems (CRM)

CRM systems are used to manage the down-funnel sales pipeline (leads to opportunities to orders) as well as existing customers. They are traditionally seen as sales-owned platforms, but they are also owned and used by customer service; it’s right there in the name. Today, the CRM space is dominated by S, who have
101、used their CRM flagship to launch a cloud-based SaaS empire.CRM systems are often the single source of truth for leads,revenue,and account information.Leads and revenue are straightforward.Leads should have campaign and tactic IDs to track their last-touch sources,as well as a linkage to a“universal
102、”contact ID.Leads are then generally converted to opportunities,which progress through the pipeline until they close.In the case of consumer products,this might be instantaneous;for more complex B2B transactions,sales cycles can take months.It is critical for marketing analytics to understand the im
103、pact of delays in sales on campaign ROI.A good account taxonomy in the CRM system goes a long way in creating robust first-party identity resolution.In B2B companies,CRM systems can be plagued with disorganized and over-complicated account hierarchies.A best practice is to avoid this by insisting th
104、at each account corresponds to a validated third-party ID,like a DUNS number(a nine-digit unique identifier for businesses).Exhaustive,Keyed,and LabeledAcross all upstream source systems,the same general rules apply to make Marketing Analytics data useful.The upstream operational data that is to be
loaded into the database must be exhaustive, keyed, labeled, and timely. Exhaustive data means that everything you need for analysis is included, whether it is from agencies, vendors, or tech platforms. All these data should be keyed, no matter what the source, so that each transaction or row can be tied to a unique identity. More than one ID can be used; however, the interplay and any “inner join loss” should be well understood and documented. The data should also be labeled and accurately named according to a universal and well-documented taxonomy. Lastly, the data teams and vendors should be held to a standard of daily updates. Lags of weeks or months between in-market and in-database are typical but unacceptable.

Figure 4: Getting upstream data right means that it is exhaustive, keyed, labeled, and timely

Data Pipelines

Marketing Analytics should not be doing analysis in operational systems. Operational systems are designed to be transactional, not analytical, and burdening them with queries or even live dashboards will lower performance and negatively impact “run the business” use cases. This still happens (a lot) and for understandable reasons. When data pipelines are not built correctly, downstream analytical data sources will be wrong, and when the data are wrong, Marketing Analytics loses credibility.

Data pipelines are the collective set of hardware, software, code, and processes that move and transform data from one system to another. In the case of Marketing Analytics, data pipelines primarily move data from Source Systems (Operational Data Stores) to Analytical Systems (the Data Lake and Data Warehouse). In some organizations, this process falls under the purview of data governance, which is defined as “the process which enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data, and data controls are implemented that support business objectives”. [3] The key focus areas of data governance include data availability, usability, consistency, integrity, security, and compliance. It also includes establishing processes to ensure effective data management throughout the enterprise, such as accountability for the adverse effects of poor data quality, and ensuring that the data an enterprise has can be used by the entire organization.

Reproducibility and transparency are critical for high-quality data pipelines. In Figure 5 below, the orange boxes represent the data engineering flow. On the left side of the diagram, data are extracted from Operational Data Stores (Source Systems) and flat files, and then loaded as raw copies into a data lake (essentially a file system of raw data using cheap, fast storage such as Amazon S3) using a code-based process manager, such as Apache Airflow. Then, these data are transformed into analytical data frames that stay in the data lake (for machine learning, data science, and ad hoc/“Why” use-case analysis) and into a keyed relational structure (the marketing data warehouse). A code-based, reproducible process ensures that as data is updated, it goes through the same process today, tomorrow, and next week. The most sophisticated data warehouse is useless if data pipelines are sending inconsistent, redundant, and error-filled data to analytical data frames and the data warehouse. [4]

There are two alternatives emerging when building data
pipelines: ETL (Extract, Transform, and Load) and ELT (Extract, Load, and Transform). In a way, this is a semantic quibble. In the ELT approach, entire databases are copied into fast, cheap storage, and then transformed into whatever is needed from there. In the ETL approach, “difference queries” (all new records between time a and time b) are read, transformed, and added to a database. The ELT approach is gaining popularity because the copied tables can be used for large machine learning/data science jobs as well as serving as staging files for ingestion into a keyed relational structure (the data warehouse). Marketers looking
to use an ELT approach can use tools like Fivetran to move entire copies of databases or large files between systems in the cloud, a much easier task today than ten years ago. However, there is a danger of losing track of process when using powerful, easy tools to move data; whenever click-and-drag interfaces replace scripts, organizations run the risk of future data governance issues.

Figure 5: The data lifecycle in an “ELT” (Extract, Load, Transform) structure. In this case, data are copied in raw format to a data lake for both staging and usage in machine learning use cases, and then transformed to a keyed relational structure for reporting and structured analytics.

The Importance of Strategic “Data Munging”

“Data munging” is a term used by data scientists to describe the process of getting data into the shape required to do analysis. To a data scientist, any data is fair game, regardless of its hygiene. This is because data scientists are task driven, and they have a complete toolkit to acquire, process, and analyze data in the form of powerful tools like Python and R. There is nothing wrong with one data scientist doing vertically integrated data munging for his or her project. However, this practice can quickly create many isolated data frames of unclear provenance that are difficult to reconcile. A better approach is creating staged views that are largely “pre-munged”, where the data scientist can simply query all records (or a subset, with a simple WHERE clause but without joins to other tables) and then
work on that tidy dataset with a minimum of further manipulation. One rule of thumb is that if more than one query is required for a use case, then data engineering should at least be consulted about the feasibility of creating an analytic view. This won’t always be possible, but it should at least be considered on a cost/benefit basis.

Batch Versus Flow

In too many cases, ETL or ELT code is written, but not productionalized. It is common for analytical data assets to be initially created as proofs of concept. The code is written to create the table or dataframe, and then checked for quality. At that point,
the next step, turning these proofs of concept into evergreen assets, is sometimes forgotten or dropped. The phrase “data pipeline” implies continuous flow. Without continuous flow, a better term might be “data dump trucks.”

Flow can be achieved in multiple ways. Perhaps the most common and easiest is batch processing. In this method, all new records between two points in time are transferred from the source system to the analytical system, using an inserted-record datetime stamp to identify new records. To avoid loading duplicate copies of records that were updated in the source system over the new time period, a record_id (primary key) can be used. This is the common ETL (extract-transform-load) approach.

Another method is omnibus copying, sometimes called the “kill-fill”. In this method, all records in the analytical database are dropped and replaced by whatever is in the source system. This method is obviously drastic, and dependent on very fast compute and data transfer rates, and very cheap storage for both the operational data store and the analytical database. However, it can be used in a pinch, particularly for smaller datasets and lookups.

Finally, API-based approaches are perhaps the most modern and efficient. API stands for “application programming interface”, and in the context of data pipelines, it is a way to ask for and receive a specific row or set of rows from a database. Well-designed APIs allow near-real-time data updates from source applications to analytical databases. For example, an API might take three parameters: insert datetime stamps, channel names, and product types. Every day a scheduled script would connect to this API and request the records updated in the period elapsed since the last request.

In all three of these cases, some scheduling approach is needed to execute the queries and/or API calls. On Linux-based systems, a “cron” job can be written to execute a file at a certain time. Assuming the file runs correctly, a Python script, for example, can query a database, munge the data, and dump the data somewhere else. Today, there are more advanced frameworks that can schedule data movement and transformation, with complex logic built in. Apache Airflow, an open-source library and framework, is commonly used to schedule and add complex conditional logic to data pipelines. By using a framework like Airflow, plain text-based code can be used for scheduling, querying, and transformation, making it possible to reproduce and innovate on the data pipeline codebase.

Analytical Data Storage

Most organizations struggle with multiple data warehouses in various states of construction. For example, an older system might be more reliable but lack newer data sources, while a newer system might have digital data but be of poor data quality. The result is a patchwork of systems, where different reports are fed from different systems, and it is difficult to trace the provenance of data. Instead, marketing organizations should strive for a unified marketing data warehouse, used for all business intelligence, reporting, and advanced analytics tasks. This warehouse should be omnichannel, housing all promotional activity from advertising to direct mail to digital to CRM (customer relationship management). It should also span the entirety of the customer lifecycle, from the upper funnel (when prospects are not yet known to the company) to customer retention.

Without well-organized, accessible, and high-quality marketing performance data, marketing analytics cannot scale to deliver the volume of insight required by the business.

History of Marketing Data

The “marketing database” originated in the 1970s with the advent of mail-order marketing. The first marketing databases were simply lists of names, addresses, and telephone numbers used to mail or call people. These are still sometimes called “files” because originally, that’s what they were: simple lists of names in a single file. The advent of relational databases (RDBMSs) in the early 1980s made it possible to do a lot more with mail files than just use them to send to mail houses. First, managers could know whether the people on a list were customers, and how valuable they were, by linking names to the billing database. This use case was probably the first nascent “customer relationship management” or CRM system.

In the late 1990s, digital technologies (email, display, streaming video, social, and search) amped up the production of data. [5] Companies now had granular data about how prospects and customers were interacting with different content across different publishers. This was also around the time that S took the CRM to the cloud. [6] These two major changes sparked today’s larger Martech explosion, giving us the software to store and compute large amounts of data.

Marketing Performance Data

Yet while companies today generate and store huge amounts of marketing data, it is more difficult than ever to make
sense of it all. Customer, prospect, promotional, and response data exist, but they are spread between source systems, often without the keys that allow analysts to “cross-walk” between them. What is conspicuously missing, in most cases, is a centralized repository of well-organized marketing data that feeds business intelligence, analytics, and data science.

Concisely, marketing performance data includes stimulus, response, customer, and market data, across all channels, up and down the funnel, from broad-reach advertising to customer retention. To be accessible, these data should be categorized using a standard taxonomy that can be queried to answer holistic questions concerning go-to-market effectiveness. Marketing performance data should be stored in one organized place, typically called a marketing performance data warehouse (or lakehouse; more on that later).

The marketing performance data warehouse provides a single source of truth for many different use cases. It is helpful to think of this data warehouse as the “general ledger” for marketing. This general ledger can be used to create downstream data artifacts that serve almost any analytical purpose, including ID longitudinal records, panel/econometric data sets, stimulus-response data to feed predictive models, and enriched customer records to feed segmentation and targeting analysis. These same data can also sit upstream of business intelligence (BI) use-cases such as performance dashboards and tabular reports to give a snapshot view of how the business
is performing.

The marketing data warehouse is composed of a few common tables that exist in almost any enterprise. They may be named differently, but they serve common purposes.

Stimulus or Promotion Data

Stimulus data is data collected on the materials used to prompt a response from consumers: for example, a flyer in your mailbox telling you to buy a new vacuum, or a billboard encouraging you to call a number. [7] These are examples of the two main types of stimulus data: the first is promotional data and the second is advertising data.

Promotional data includes the records of each promotion or stimulus, either at the individual or tactic level (if the individual data is unavailable). The dataset should include the date, quantity, spend, creative, format, and any other necessary metadata keyed to a customer ID.

Advertising data can be a part of promotional data; however, it is generally too aggregated to fit the strict structure of promotional data. Advertising data in this context refers to upper-funnel advertising that is not usually targeted at the customer level and would not be keyed to a customer ID. This dataset should include the date, impressions and/or gross rating points (GRPs), budget, channel, and any
other necessary metadata, including marketing objective, creative approach, length, etc.

Campaign or Cell Data

The idea of the campaign is to organize marketing stimulus into buckets. These buckets can be divided by marketing objective, creative approach, time, or other factors. Campaigns (usually large, multi-quarter efforts) can be further divided into smaller groupings, sometimes called cells. These groupings are important because of the data they include about marketing: the picklists, tags, and taxonomical structures that enable downstream analysis.

Response Data

Response data complements stimulus data and includes leads (a prospect filling out a form), calls (calling in to a call center), orders, and applications. Essentially, response data are the desired outcomes of promotions. There are two main types of response data: interaction data and sales data.

Interaction data includes the records of individual interactions with the company, including opens, clicks, views, leads, or applications. The data set should also include a date, a robust taxonomy, and any other necessary metadata keyed to a customer ID.

A sale is technically an interaction, and sales data could be a part of the interaction data, but it is more commonly separated. Sales data are transactions tied to customers and include the sales channel used.

Customer Data

Customer data refers to any “personal, behavioral, and demographic data collected by marketing companies about their customer base”. [8] This data is usually split into two sets: prospect/customer data and lookup data.

Prospect/customer data refers to records of individual prospects and customers, keyed to one or more “identity resolved” IDs. An “identity resolved” ID connects many different IDs (foreign keys) across devices, locations, and touchpoints to build an omnichannel view of the consumer. [9] Lookup data are
first- and third-party data about individual prospects and customers. These data can be demographic or behavioral, and can include both PII (personally identifiable information) and non-PII. [10] Lookup data are commonly used as features for machine learning models.

In B2B organizations, the concept of a customer is more complex. Individual contacts belong to accounts, and accounts can have multiple hierarchies and locations, as described in the section on CRM above. A robust linkage between contacts and accounts is critical for account-based marketing (ABM) approaches. B2B organizations also think more deeply about
the roles that individual contacts play at companies. A robust taxonomy of roles (think influencer, blocker, engineer, decision-maker, budget approver) is important for designing B2B plays aimed at individuals inside of accounts.

Market Data

Market data set the context for the marketing organization. They are critical controls for media mix modeling, and provide a bird’s-eye view of the industry and general consumer behavior. Market data includes competitive stimulus, customer attitudes, and product data.

Competitive stimulus data provides the denominator in the share-of-voice calculation (your in-market spend divided
by the in-market spend of competitors.) This is a key metric for understanding the effectiveness of more upper-funnel, brand-focused marketing.

Customer attitudinal data is information about how customers see the business. It captures how customers actually view the business, compared with how the business wants to be thought of, in terms of awareness, affinity, comprehension, satisfaction, and purchase intent. This type of data can also be qualitative, collected from interviews and surveys. [11]

Product data is information about new product launches, pricing changes, and competitiveness, for the company itself and, if possible, its competitors.

The Structure of the Warehouse (or Lakehouse)

These five data types fit together in a simple physical, relational structure that can be extended for specific industry and channel use cases. Even though the specifics will differ depending on the use case, the basic idea of five key tables will work for almost any marketing organization.

The core of the warehouse/lakehouse is the person (in B2B contexts, sometimes called the contact, who is then associated with a company). The person table should be comprehensive; that is, every interaction with a human should have a corresponding person_id. In other words, the person table is the data artifact of identity resolution. The person table can be extended to multiple other human-related tables, like customer (where there is a known customer ID) or external, third-party person tables that include demographics or other features that can
be helpful in modeling.

The promotions table houses all marketing stimulus. Many companies track direct marketing promotions in their marketing automation software, which is necessary for simple, last-touch attribution. However, a comprehensive promotions table should include promotions from all siloed systems, as well as traditionally “spreadsheet-tracked” marketing stimulus data like upper-funnel television advertising. A well-designed promotions table can handle both direct-marketing-type and upper-funnel-type data.

Figure 6: A simplified structure for the marketing performance data warehouse

Figure 7: A well-designed promotions table can handle both upper- and lower-funnel stimulus

The leads/orders table is the “y” to the promotion table’s “x”, in linear regression terms. In other words, this table stores the activities that marketing is intended to drive, whether a form fill (lead), an actual e-commerce order, a hand-off to a channel partner, or an inbound call to a call center. This table can be broken out into multiple stages of the funnel, but in our experience, this ends up causing more problems than it solves. The distinction between funnel stages can be solved by creating a “stage” field in the table,
and then self-referencing inside the table with an internal ID.

Figure 8: A lead/order table is the “y” to the promotion table’s “x”, using linear regression terms

The campaign table can be thought of as the stimulus metadata repository for the organization. A note: we are using the term “campaign” broadly here
; it means a cell of marketing stimulus, as differentiated from other “cells” by differences in creative, channel, audience, timing, and other factors. Because “campaign” can mean different things in different organizations and marketing technology platforms, this can be an area of confusion.

Promotions table (the layout described in Figure 7):

                  promotion_id  date_time           time_scale  person_id   campaign_id  budget   impressions
Direct marketing  a             dec2021 00:00:00    NULL        p123456789  c12345       0.31     1
Upper-funnel      b987654321    03mar2021 00:00:00  day         NULL        c23456       6000.00  500000

Audience can be tracked at the campaign level, or in the promotions table if multiple audiences are reached per campaign. A time_scale of NULL is instantaneous, as it should be for all direct and digital (discrete) touches; otherwise, this indicates the duration of the spend. A person_id of NULL indicates upper-funnel (as persons reached are unknown). Impressions will always be “1” for discrete/direct marketing touches.

Leads/orders table (the layout described in Figure 8):

       lead_id  date_time         lead_type  person_id   promotion_id  campaign_id
Lead   a        dec2021 00:00:00  lead       p123456789  a123456789    c12345
Order  a        mar2021 00:00:00  order      p123456789  a123456789    c12345

The promotion_id supports last-touch attribution and is populated if traceable. If not traceable at a discrete level, the campaign_id tracks a broad-reach/upper-funnel effort (can also be NULL).

The campaign table should hold everything we will need to know about marketing’s performance down the analysis funnel. Anything that we will need to group, categorize, or understand should be in this table, or lookup tables connected to it. Put another way, the “picklists” of marketing technology will largely end up in the campaign table. Audience target, channel, line-of-business, marketing objective, vendor, and other metadata
should be tracked here, as well as the in-market and out-of-market dates for the campaign.

campaign_id  in_market_date  out_market_date  audience_target  channel
c12345       01dec2021       31mar2021        Sassy Susans     direct_mail
c12346       01apr2021       30jun2021        Angry Andys      branded_paid_search

Other metadata depends on industry and business-model context, like line-of-business, pipeline stage, marketing objective, etc.

Figure 9: The campaign table acts as the metadata repository for marketing promotions/stimulus

Finally, what about all the market data that is critical for analysts, data scientists, and executives, but never ends up getting stored in an organized fashion? Market research studies, competitive information, macroeconomic data, and channel information, to name a few, are critical. These data can and should be stored in a market table. The market table is similar to the campaign table, but is more diverse in its scope. A fact_type field defines what is being tracked. This scope can be expanded with further lookup tables, but ultimately the table tracks metrics (anything from attitudes to competitive spend to product performance) as numeric facts that are defined by categories and dates.

fact_id (pk)  from_date  to_date    fact_type          sub_type     cell_id
f             dec2021    31mar2021  survey             awareness    c123456789
f             apr2021    30jun2021  competitive_spend  advertising  c123456790

Cell in this case can be used as a lookup to hold additional metadata about a group of metrics, potentially using a star schema approach.

Figure 10: The market table contains all the “messy” data that usually live in email inboxes or spreadsheets

Analytical Data Artifacts Downstream from the Warehouse

Downstream from the data warehouse/lakehouse, data scientists and business
intelligence engineers rely on many different views or analytical artifacts to do their jobs. Two of the most common are the ID longitudinal record and the panel dataset. Both are useful in different ways, particularly for attribution use cases, and it is important to consider the pros and cons of each when choosing the type of dataset in which to store different types of data.

Data organized in an ID Longitudinal Data schema are sometimes called Customer Data Platforms or CDPs. CDPs are “record level”: each row is a single observation of something that happened to a customer. Robust CDPs contain many interactions per individual, starting far “up the funnel”, and continuing through purchase, customer service, and loyalty touches. Each customer ID can then also be linked to demographic information (for example, the Amerilink file) to allow for additional insights. Channel touches (email, direct mail, search, etc.) can likewise be linked to campaign, cell, or offer codes to get a granular understanding of how different kinds of touches drive response. This data structure is necessary to do discrete multi-touch attribution.

Panel data can be constructed from ID-level data; panel data is aggregated by cross-sections (for example, audiences, states, or DMAs) and time (days or weeks). In reality, there are not many CDPs capable of being transformed into robust panel data on their own. A comprehensive panel data set includes both ID-level data and pre-aggregated data, like upper-funnel media budgets or survey-based brand-tracking data. Panel data sets are useful for econometric estimations, like media mix models (MMMs), and are also “safer” from a data security perspective, as they don’t include personally identifiable information (PII) or, even worse, sensitive payment or medical information.

ID Longitudinal Record
Pros: Granular; lead-level apportionment
Cons: Privacy constrained; data engineering difficulty; trouble with long-run effects

Panel
Pros: Handles long-run effects well; no privacy issues
Cons: Collinearity; overreliance on Bayesian inference

Figure 11: An ID longitudinal record in “CDP” format, and a panel dataframe

Choosing an Architecture

For the past two-plus decades, the Data Warehouse architecture has been the preferred method of storing analytical data. In this approach, a single, keyed, relational database holds all relevant analytical data that is then accessed via SQL queries to either pre-built views or specific tables using joins. All of the data in the warehouse have gone through the same ETL process. [12] This approach has many benefits, not the least of which is cleanliness and organization. Because data warehouses are (or should be) meticulously keyed, applications and programs are less likely to fail due to data changes in underlying sources. Also, because tables are designed to be joined via primary and foreign keys, it is more intuitive for analysts to understand the data. The left-most graphic in Figure 12 below illustrates this architecture.

In the past several years, the Data Lake architecture has also become popular. A Data Lake is essentially cloud-connected fast storage, where flat, raw data are stored in an organized fashion. Keys are only informal; in other words, a person ID can certainly exist in multiple tables, but it is not really enforced. Data Lakes serve dual purposes: both as a staging area for the Data Warehouse, and as direct access to unstructured data for ad hoc purposes. Data scientists love Data Lakes because they can easily create their own “working” files there; they can load them up with fast-access file formats; and they don’t have to wait for relational database engineers to certify new fields and/or tables in the Data Warehouse. The middle graphic below illustrates this architecture.

A Data Lakehouse (right-most column in Figure 12 below) attempts to do away with the keyed relational Data Warehouse through an intermediate layer using something called Delta Lake. This essentially replaces the relational database with several big data processes to manage metadata (handled by lookup tables in Data Warehouses); versioning and snapshotting, also called time travel and audit history; and schema enforcement (handled by primary keys in Data Warehouses). [13] The two biggest competitors in the “Data Lakehouse” space are Databricks and Snowflake. Databricks is coming at the problem from the Data Lake side and Snowflake is coming at it from the Data Warehouse side, but in both cases the goal is to merge the Warehouse and Lake use cases into a single analytical platform.

For now, most marketing analytics organizations are sticking with the middle architecture (in other words, not abandoning the data warehouse). The advantages of a well-organized data warehouse for the “What” use case are still clear. It is just too important to have reliable how-much/how-many answers on marketing spend and last-touch return, customer activity, and the lead-sales funnel to move to a Lakehouse approach entirely. However, for organizations just starting out, a Lakehouse approach is probably the way to go. It will be easier for marketers to start with a Data Lake format, load a Data Lake with files from each system as described above, and then apply the metadata layer on top of these tables, rather than trying to start down the long road to building an enterprise-class data warehouse.

Figure 12: Traditional (data warehouse) approach; hybrid Data Lake + Data Warehouse approach; and emerging Data Lakehouse architecture. Adapted from “Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics”, Armbrust, Ghodsi, Xin, and Zaharia, 2021.

Data Science

Data science is a fraught term. In many organizations, data science has become synonymous with predictive modeling and machine learning. The stereotypical data scientist spends countless hours tuning machine learning models to get another point of AUC (area under the curve) to get users to click on an email. This is unfortunate and incorrect. Data science is a broad field that includes all kinds of analysis: from inductive to deductive to predictive, across geographic, image, textual, and quantitative data, with deliverables spanning from slide decks to interactive tools to models to printed papers (see Figure 13 below).

Data science is a way of doing things, not a thing in itself. In other words, it is a method of gaining insight.

Figure 13: The scope of data science techniques that can be applied to marketing analytics problems is vast.

From a marketing analytics perspective, data science is a particularly appropriate toolkit for the Why, Who, and How jobs-to-be-done. (Data science can also be used for the What job, but this will be covered in the Business
Intelligence section). A data science approach will yield faster, more insightful, and more scalable results, when combined with robust source systems, data pipelines, and analytical data stores.

Techniques Across the Jobs-to-be-Done

Why

Data Detectives are energized by looking for answers. They spend a lot of time combining data sets, aggregating them in different ways, and comparing cross-sections and time periods. They use statistical tools, like t-tests and z-tests to compare means and rates, but mostly they are looking for patterns. Why did sales fall during the campaign? Why are my most valuable customers
suddenly defecting?

To do these tasks, Data Detectives need a document-based toolkit where code is embedded in workbooks that can then be quickly knit into reports. At MarketBridge, we use RStudio and the extendable R Markdown/Quarto framework for this use case. Unlike Jupyter Notebooks, Quarto documents are fully text-based, making version control easy via Git/GitHub, and are then convertible into HTML reports, PowerPoint slide decks, or Word documents.

Data Detectives need comprehensive access to the Data Warehouse, but they will always also need access to other ad hoc files. As much as data departments might wish them away, there will always be random .csv and .xlsx files with campaign metadata, test/control information, or public data that must be used to answer questions. The best practice is to partition the Data Lake and allow Data Detectives to add files, albeit with looser metadata definitions than more structured parts of the Lake.

Ultimately, much Why work ends up in PowerPoint decks, as it always has, and probably will for a long time to come. There’s nothing wrong with this, but the analytical processes to get to these decks should be well-documented and reproducible, so that the next time a similar analysis needs to be done, the analyst won’t be starting from square one.

Who

Making segmentation actionable will always be a challenge for data scientists. Every few years, marketers inevitably want to think about their customers in a new way, and commission qualitative, and then quantitative, market research.
The segments that emerge are generally compelling on paper but end up being hard to actually pick out in prospect and customer data. It pays to think about this upfront, selecting the sample from the known universe to make downstream assignment easy. However, the most critical capability for a good Who analysis is identity resolution.

Identity resolution doesn’t have to be complicated. When profiling audiences, it is critical to be able to crosswalk data sources to create comprehensive, reliable views of individuals, households, offices, or companies. These data sources will include sampled data (surveys), third-party data, and first-party data. To make this possible, data scientists need a comprehensive set of individual ID lookup tables, ideally stripping out PII (personally identifiable information) to avoid security risks.

How

In the fall of 2021, we wrote an entire paper about MMM (Media Mix Modeling) and MTA (Multi-Touch Attribution), which treats this topic in far greater detail. For the sake of this paper, we’ll discuss at a high level the different ways to measure go-to-market effectiveness.

Media Mix Modeling (MMM) is really a task for econometricians. Econometrics is the science of inferring causal relationships based on aggregated data, particularly time series data. In many cases, the skills of traditional marketing data scientists aren’t well matched to what is a more traditional statistical discipline. However, vendors like MarketBridge can help by working with a company’s internal data science team to build
an MMM code base, and then collaboratively maintain and extend it to consider new channels, two-stage effects (i.e., upper-funnel brand attitudinal changes leading to downstream impacts), and changes to audience and business unit dimensions. The most important thing a Marketing Analytics team can provide for good MMM is solid panel data, which can also be used to create a marketing income statement.

Multi-Touch Attribution (MTA) is a compelling promise, but is only feasible far down the funnel where first-party data is available. Identity resolution is key for MTA, just like it is for the Who job, and data should be organized in a human longitudinal record format.

What Will Happen (Machine Learning)

Creating predictive models to find likely customers or to classify categories gets all the attention, and it is important for marketers. Predictive models, when used correctly, can make marketing more efficient and effective, and even find new audiences.

Machine learning is much more storage- and computation-intensive than the other four jobs, however (except for MTA under How, which is a bit like machine learning anyway). To run powerful models using hundreds or thousands of features and millions of rows, traditional relational
databases just aren't fast enough. Machine learning frameworks like TensorFlow (for training neural networks), PySpark (the Python API for Spark), and sparklyr (an R interface to Spark) rely on data lake-based file storage. This is the use case where Data Lakehouse environments really shine, allowing very fast compute and machine learning along with the metadata layer that marketers need to keep models relevant.

Beyond just building great models, it's critical to document, version control, and manage model deployment. scikit-learn in Python provides a nice framework for the model building process, and recently R has caught up with comprehensive model management with tidymodels. Once models are built, they can be deployed in a scalable way via RESTful APIs (Flask in Python, plumber in R), allowing applications like S and Adobe to ingest them. Finally, all models should be catalogued and updated using Git/GitHub (or its enterprise equivalents), ensuring that even old, obsolete models can be examined.

Reproducibility and Quality

In most organizations, Marketing Analytics is tasked with churning out a very high volume of extremely high-quality work. In this case, quality means PowerPoints, dashboards, and
models delivered with very few or no errors, and very high levels of precision. Analytics departments that produce high-quality work have better reputations, last longer before being reorganized or eliminated, and produce more insights, because they spend less time fixing errors.

There is a large body of work on quality improvement and management, pioneered by Deming and Juran in the 1950s.[14] Deming and Juran note that in times of ever-increasing demands for productivity, there are usually breakdowns in quality controls as productivity/quantity is emphasized over quality.[15] Therefore, a more logical and structured approach to quality control is necessary to balance productivity with quality. Deming and Juran see this as “involving a shift from simple end-product inspection to the development of quality practices aimed at actively preventing defects by implementing checks and controls earlier in the production process.”[16] This can and should be applied to marketing organizations and the data they use.

Quality problems can creep into a marketing organization at any point. Starting at the beginning, incorrect data in Source Systems, whether raw data or metadata, is the first problem to root out. Changes to
underlying technology are common culprits of upstream problems, but any sudden changes to reported data should be first inspected for data issues.

Even with correct source data, most data go through multiple steps before being used in an analysis, sometimes fed from database to database, with small but
meaningful transformations happening at each step. Source data might go through 5, 10, or 100 transformations before it reaches a report, dashboard, model, or PowerPoint slide. A single datum presented in one chart might have been multiplied, added, or divided 15 times to get where it is. There is little chance that this number will be correct the first time through a set of calculations. Multiply this problem out by hundreds or thousands of discrete data points, and you will have a mess.

Calculation problems, whether due to typos, Excel formulas not dragged far enough, relative vs. absolute references, outer vs. inner joins, or misused variables or intermediate data frames, are a big problem for marketers. To eliminate this problem, it is critical to embrace reproducibility. Reproducibility has been a hot topic in science for the past decade, as high-profile findings have not been able to be replicated in subsequent experiments. For marketers, a reproducible approach has a few concrete implications.

First, minimize black boxes. Black boxes are shorthand for any step in a process that can't be picked apart into steps. These can be “virtual” black boxes (e.g., another department owns the process and it's just impossible
to see into), or “physical” black boxes (proprietary software). In either flavor, black boxes have an input and an output, and it's not clear what happens in the middle. This makes them anathema to reproducible results. The opposite of a black box is an open, code-based algorithm. This might sound complicated, but it simply means using a plain text program to get something done. The language itself isn't what's important; that plain text can be SQL, R, Python, SAS, or any other language. The point is, it can be saved as a .txt file, readable in something as simple as Notepad. This file can then be used by anyone with the same input data to produce the same exact output. The deterministic nature of this approach (input A goes into text file B and produces output C) is its beauty. Think of this approach versus having an analyst do a bunch of Excel drag-and-drop operations; while it might be easier for the first team, it will drive compounding errors in the long run.

Second, analysts and data scientists should always use version control. Version control software tracks changes made to text-based files by different users, line by line. Used correctly, it allows marketing organizations to evolve, building on past assets. When individuals leave, their changes to data pipelines, reports, and analyses live on, allowing new employees to build on their work, instead of reinventing the wheel every few years. Specifically, Git and GitHub are the most used version control tools. Git is the version control system that tracks changes for an individual programmer, and
GitHub is the cloud-based remote repository where multiple users can coordinate their branches and changes.

Finally, analysts should use reproducible artifacts, like Quarto documents and Jupyter notebooks. Reproducible artifacts are documents that have code embedded in them that actually runs. The best “knit” from plain code and can be refreshed from upstream data sources at any time. They mix code, text, and graphics to tell a story dynamically. Commonly used by data scientists for predictive analytics and machine learning, reproducible artifacts are being used more and more by “why”-focused data scientists. Another way to think about this is that source footnotes are eliminated; the source is embedded in the document itself.

Business Intelligence

Business intelligence tools are used for What reporting. What reporting provides information on how the business is performing across many KPIs (measures) and dimensions. In the marketing arena, What reporting counts and sums advertising, direct marketing and digital stimulus, leads, orders, sales, customer logins and activity: basically, anything that can be counted, up and down the funnel. These measures are then grouped into categories including geography, audience, channel, and line-of-business.

5 Quality-Control Tips for Analysts and Data Scientists

Do not underestimate the necessity of building and QA-ing your working dataset. Depending on complexity, a large amount of time should be spent on building and QA-ing the dataset. If the dataset is not correct, all resulting analyses will be wrong. A common benchmark is that 70% of time should be spent on the dataframe build versus the statistics/machine learning work.

When working with relational databases, be extremely cautious of join type. An inner join will strip out any records without a match in the other table, which is fine if that's what you want. But if you want to keep all your records, make sure to use a right (or left) join. To ensure you do not lose any records, a best practice would be to default to right or left joins unless you know exactly what you are dropping.

Use checks at every step and transformation. Keep tabs on how many records you
have at every query. If you're doing a lot of joins and don't know all the tables you are selecting from by heart, it's not uncommon to end up getting duplicate records on joins. This will ultimately cause you to overcount whatever you are building the dataset for. It is important to know exactly how many
records you have remaining or are dropping at each transformation or calculation.

Be wary of NULL and NA values. Different programs, whether Python, R, SQL, or SAS, treat missing or invalid values differently. Assuming that NULL (does not exist) and NA (unable to parse) are the same thing will guarantee incorrect results. Take the time to understand the difference between these types of data, by language. Only convert NULL and/or NA to zeroes when you are certain they are actually zero.

Maximizing the Value of Marketing Dashboards

1. Minimize clutter. Keep it simple. A dashboard is useful because it is supposed to show you exactly how you are doing against certain key performance indicators. Cluttering up the dashboard with unnecessary graphs, metrics, or information takes away from the usefulness. If someone must search, scroll, and switch tabs to see the information they want, the dashboard is no longer useful.

2. Automate recurring calculations to update at a certain cadence. Dashboards become obsolete quickly because they often display old data, making them irrelevant for whoever is checking the dashboard. Automating recurring calculations to occur at a given cadence ensures that dashboards consistently display the newest data available to give the viewer the best possible approximation of the current situation.

Last-touch marketing effectiveness reporting is a good example. By looking at a simple lead report, a Marketing Manager might find that in the first quarter Direct Mail drove leads at a $150 CPA
(cost per acquisition) and drove 15% of all inbound leads. However, these “simple” reports are too often missing or corrupted, forcing analysts and data scientists to spend valuable time getting ad hoc answers to “drop everything” requests. In fact, most marketing analysts spend most of their time doing What reporting; it should be the goal of the Marketing Analytics department to automate at least 80% of What reporting via comprehensive business intelligence tools.

Comprehensive business intelligence tools cover both the tabular reporting case and the visual use case. The tabular reporting case essentially
provides pivot tables that summarize the underlying data for detailed tracking. The visual use case covers dashboarding and summarizing the data in graphs or charts that are easily understood at first glance. Comprehensive means the tools count as much of the underlying data as possible, so the time analysts spend on What reporting is brought down to a minimum.

In many cases, Marketing Analytics teams start with the visual BI tool and are frustrated when accurate, timely, and actionable intelligence remains elusive. It is critical to remember that comprehensive business intelligence is dependent on a
good marketing performance data warehouse/data lakehouse, with robust and organized metadata, as outlined in Section 3.

Even accurate dashboards suffer from low utilization. Modern dashboarding tools like Tableau and PowerBI are popular because they allow executives and leaders to immediately see a snapshot of KPIs in a visually compelling way that is easy to understand. However, just because data can be in a dashboard does not mean it should be. The key value of a dashboard is in its simplicity and ability to show essential information. These are tips and tricks to maximize the value of your dashboards.[17]

3. Make dashboards flexible. Dashboards should never be designed with a single user in mind. Since each person using the dashboard may have a different goal in mind, it is important to enable filtering and use a tool that embraces maximum flexibility and can cater to a variety of use cases.

4. Create dashboards with purpose. Dashboards should be created with a clear purpose in mind. Dashboard creators should have a clear understanding of the business problem or question the dashboard is supposed to answer. From there they can develop the simplest and clearest method of answering that question or problem. Creating dashboards “just because” detracts from the value they can provide if used correctly.

*Adapted from Tableau [17]

Organization

Over the past decade, agile software development has completed its takeover of technology companies. Recently, some Marketing Analytics departments have attempted to follow suit. However, the results have been less than stellar in most cases. Many organizations have fallen prey to the “agile industrial complex”: a set of processes and tools imposed upon teams from above rather than teams deciding the processes that work best for them.[18] This top-down imposition of software such as JIRA and Microsoft DevOps, along with professional scrum-masters, violates the spirit and letter of the agile manifesto:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more.[19]

For Marketing Analytics organizations, this means letting analysts and data scientists talk to marketers and, more than that, letting them be involved in the strategy process. Many problems can be solved with direct communication. Adding extra layers of project management lengthens delivery times and reduces deliverable quality.

Simple Kanban boards like Trello are a good choice to organize and prioritize work. These are extremely loosely organized lists of things (cards) with descriptions, organized in columns. They're just that simple, and adding complexity beyond this simplicity is where they can go wrong. By creating columns relating to the jobs to be done, simply stacked in order of priority, a lot of clarity can be achieved without much overhead.

At the same time, clear long-term strategic goals are necessary; agility doesn't mean letting people work on whatever they want to work on. There should be enough direction from leadership to guide the Marketing Analytics team towards accomplishing its big picture goals; however, when it comes to accomplishing more day-to-day objectives, teams should have the flexibility to complete their tasks in the ways that suit them the best.

Figure 14: Sample Kanban board organized around jobs-to-be-done, with due dates. Individuals can also be assigned, and cards can contain much more detail. Once complete, cards can be moved to a QA or test column.

Skills

Marketing as a discipline is on an inevitable journey from creative to analytical. Creativity and art dominated marketing (in the form of advertising) in the mid- to late-twentieth century. Marketers were tasked with creating the catchiest jingle, most aesthetically pleasing packaging or flyer, or unique commercial.[20] Starting in the 1970s, quantitative discipline began to seep in, both due to advancements in direct marketing and the increasing numbers of MBAs managing marketing. Over the last two decades, analytical and technical skills have become even more necessary, from marketing software to digital marketing strategy to attribution.

However, in many marketing departments, hard skills still take a back seat. To overcome this deficit, marketing software has remained menu-driven and simple. Even in 2022, when pushed, most marketers will admit to doing data transfer, BI, and simple modeling in Microsoft Excel.

Skillset, a framework developed at MarketBridge, divides the necessary technical marketing skills into five buckets:

Necessary Technical Marketing Skills Fit Into 5 Buckets

Source Systems. Marketers should understand the capabilities, technical approaches, and limitations of the marketing and CRM tech stack. While most marketers are familiar with surface-level “GUI” aspects of tools like Salesforce and Adobe, the underlying architecture and approaches tend to be glossed over.

Data Engineering. Moving data from system to system using traceable, code-based methods is increasingly table stakes for analytical marketers. Marketers should understand the different types of data structures (e.g., flat files vs. hierarchical/key-value pair structures like JSON); querying languages like SQL; and scheduling approaches like Airflow.

Data Science. Data science approaches analytics with reproducible methods. Swapping out Excel for Python and R means reaching the same result given the same input data; this translates to more reliability and higher credibility. In addition, the tools provided in publicly available, open-source libraries dwarf those available to analysts using Excel, Tableau, and PowerBI.

Marketing. As marketing has become more technical, the core disciplines of marketing sometimes fall by the wayside. Segmentation, targeting, and positioning are critical tasks that are too often poorly understood by managers. Likewise, leaving the tuning of performance marketing, whether offline or digital, to an agency is a mistake. This goes hand-in-hand with test design and reading.

Communication and Visualization. Marketers are natural storytellers, but they should be equipped with the right tools for the job. PowerPoint will always be table stakes for communicating ideas, but it is increasingly important to
have a mastery of both business intelligence tools like Tableau and PowerBI and of more reproducible, programmatic visualization and narrative tools like Jupyter Notebooks (Python) and RMarkdown and Shiny (R).

The ideal next generation marketing team will be recruited with a baseline of technical skills, and then trained and inculcated in an environment that prizes transparency and scalability.

The Next Gen Marketing

Source Systems
- New tech customized with downstream use cases in mind
- Metadata and taxonomies enforced in campaign design

Data Pipelines
- Streaming, not just batch
- Reproducibility

Analytical Data Storage
- Standardize around a person-promotion-order schema
- Don't fall for archives; keep minimum five years of everything (storage is cheap now)
- Consider a data lakehouse approach to enjoy the best of both worlds
- Think about end-state data frames for specific use cases; econometric time series panel and human longitudinal records are good ones to start with

Data Science
- Leave the What use case to the Business Intelligence team
- Use code-embedded documents (e.g., RMarkdown) to explore Why use cases
- Identity resolution is key for Who and How (MTA) work
- Machine learning should be undertaken in a “factory” approach, with clear versioning and APIs for models

Business Intelligence
- Streaming data always; don't bother otherwise
- One use case per report
- Use tabular and visual reports for their intended purpose

Organization
- Strategy-Analytics collaboration upfront
- Avoid the agile industrial complex
- Use a simple Kanban board to track tasks across the five jobs-to-be-done

Skills
- Keep data, analytics, and data science functions in the marketing team; no centers of excellence
- Build a nerdy team; embrace quantitative marketing

Conclusion

Marketing Analytics has struggled for years with inadequate source systems, opaque data pipelines, inflexible data storage, obsolete tools, and teams that lack the appropriate skills for success. By embracing a reproducible, transparent approach to the five key jobs-to-be-done, Marketing Analytics can begin to operate as an agile, high-quality service for the organization, driving efficient growth.

The lack of a systematic roadmap for Marketing Analytics functions has prevented many teams from starting this journey. Clear definitions of “good” in each of the seven areas (Source Systems, Data Pipelines, Marketing Performance Data, Data Science, Business Intelligence, Organization, and Teams) provide this clarity for leaders. By starting with a clear assessment of current capabilities, it is possible for Marketing Analytics leaders to prioritize change areas and start the journey towards excellence.

CASE STUDY: Modernizing Marketing Analytics for a Health Insurer

As is often the case, the client initially came to MarketBridge frustrated with the inability to apportion value (leads, applications, and revenue) to a broad range of marketing tactics, up and down the funnel. This How use case served as the forcing function for an assessment and retooling of the marketing analytics function, from source systems through data platforms to data science methods and team members.

We began by assessing the client's current state across the five jobs-to-be-done and the seven functional areas. We found basic capabilities existed in Source Systems, but that data pipelines, data storage, and lack of metadata management were hindering the ability of the analytics team to scale. In addition, data science capabilities were still primarily Excel- and
GUI dashboard-based, preventing asset building for the Why, Who, How, and What Will Happen use cases.

The assessment was quickly pivoted into the creation of a 3-year roadmap to methodically add reproducible approaches to the team. Replacement of some upfront Source Systems was already underway, so the first order of business was ensuring that proactive taxonomies were shipped to the integration teams to ensure that relevant campaign, audience, and channel metadata would be operationally locked.

Data pipelines were initially prototyped using batch Python, with the eventual transition to streaming approaches using Apache Airflow. Data landed in a Data Lake built on Amazon S3, with a metadata layer built over top using Delta Lake. This allowed data scientists to quickly use extracted data without the delay and overhead of a formal data warehouse for the Why, Who, How, and What Will Happen use cases.

[Figure 15 chart. Areas, in order: Source Systems, Data Pipelines, Analytical Data Storage, Data Science, Business Intelligence, Organization, Skills. Scores, in the same order: 14/25, 9/19, 6/19, 4/20, 2/8, 1/4, 8/16.]

Figure 15: Initial assessment of capabilities across the seven functional areas showed opportunities for improvements, particularly in reproducible data science.

A keyed marketing data warehouse already existed using an older relational database technology. The basic structure was kept, and old tables were moved to an Amazon Redshift platform. New data were streamed from the new Data Lake using Airflow. This relational data warehouse fed all business intelligence for the What case.

Marketing attribution, the original goal of the work, was used as a forcing function for the data lake(house). A cross-sectional time series data frame, the preferred data artifact for econometric modeling, was created for the purpose of both media mix modeling and the creation of a marketing “profit and loss statement.” This data frame was ingested using R, and the media mix model was developed over the course of three months by a hybrid MarketBridge-client team. All code was synced and tracked using Git and GitHub, which was also used for Python/Airflow data transformation code and the Delta Lake metadata layer. This, along with a rigorous adherence to readmes and package documentation, ensured long-run scalability.

What started as an eight-week marketing effectiveness assessment ended up as
a multi-year transformation. From struggling to thriving, the Marketing Effectiveness team operates effectively across What, Why, Who, How, and What Will Happen use cases, using reproducible approaches and a fast, metadata-enriched, scalable data environment.

1 “Gartner Survey Reveals Marketing Analytics are Only Influencing 53% of Decisions,” Stamford, CT: Gartner, September 15, 2022, https:/
https://en.wikipedia.org/wiki/The_Mythical_Man-Month#:text=The%20Mythical%20Man%2DMonth%3A%20Essays,schedule%20delays%20it%20even%20longer
3 Craig Stedman, “What is data governance and why does it matter?,” TechTarget, May 2022, https:/
Tho Nguyen, “The Value of ETL and Data Quality,” S-AS Institute Inc., accessed November 2022, https:/
Alex Ross, “The History of Marketing Analytics,” Unsupervised, January 19, 2022, https:/
Tom Percival, “Using Stimuli Effectively in B2B Market Research,” B2B International, January 25, 2018, https:/
https://en.wikipedia.org/wiki/Customer_data
9 “What is Identity Resolutions?,” Chicago, IL: TransUnion, August 23, 2021, https:/
Finn Bartram, “4 Types of Customer Data You Should Be Using,” CX Lead, accessed November 2022, https:/
Megan Allinson, “Improving Marketing Outcomes with Four Types of Data,” BKM Marketing, accessed November 2022, https:/
John Kutay, “Data Warehouse vs. Data Lake vs. Data Lakehouse: An Overview of Three Cloud Data Storage Patterns,” striim, accessed November 2022, https:/
Juran, “The History of Quality,” Juran, March 4, 2020, https:/
Dashboards: The Dos & Don'ts,” Seattle, WA: Tableau Software, LLC, accessed November 2022, https:/
Martin Fowler, “The State of Agile Software in 2018,” martinF, August 25, 2018, https:/
for Agile Software Development,” accessed November 2022, https://agilemanifesto.org/
20 Mark Bonchek and Cara France, “What Creativity in Marketing Looks Like Today,” Harvard Business Review, March 22, 2017, https://hbr.org/2017/03/what-creativity-in-marketing-looks-like-today

Innovating the Way Businesses Grow Revenue

How We Solve the Marketing Analytics Problem

We help clients answer the tough questions with scalability and repeatability. By using both discrete (machine learning) and aggregated (econometric) techniques, we solve advanced marketing and sales problems while helping our clients transform into agile analytics organizations.

Learn more about our marketing mix and attribution methods

We are the leading provider of rapidly scalable, innovative go-to-market solutions for Marketing, Sales, and CX leaders. For 25+ years we've delivered customer-centric, data-driven strategies to Fortune 1000 clients. Our unique approach, Go-to-Market Science, uses analytics and insight to drive revenue growth and customer value.

2,500+ Client engagements
75+ Fortune 1000 clients
10+ Stevie Business awards
market-