Causal Science and Its Industrial Applications (因果科学及其工业界落地)
DataFunSummit #2023
Qin Xuan (秦旋), Kuaishou, Growth Algorithm Engineer

Contents
Why Causal
Data Flow Specification for Causal Inference Implementation
Model Selection
Evaluation and Simulation
Optimal Problems with Limited Resources

WHY CAUSAL?

What is correlation?

An example: there is a statistical association between the number of people who drowned by falling into a pool and the number of films Nicolas Cage appeared in in a given year. However, there is obviously no causal relationship.

Another example: we observed that students who wear glasses have better grades.

Risk Factor -> Component Causes -> Effect -> Outcome
Wearing glasses -> Spent more time studying -> Better preparation -> Better grades

Issues when applying correlation methods to find a causal effect

Where W: study duration; X: whether the student wears glasses; Y: grade.
[equation not recovered from the slide; only the subscript "_1" survives]

How do causal tools help us?

Back to the glasses case: [expression not recovered from the slide] = 0
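The glasses example can be illustrated with a small simulation. This is only a sketch under assumed numbers: study duration W is reduced to two levels, the probabilities and grade offsets are invented for illustration, and wearing glasses (X) is given a true effect of exactly zero on the grade (Y). The naive comparison still shows a large gap, while comparing within each level of W removes it.

```python
import random


def simulate(n=20000, seed=0):
    """Generate (w, x, y) rows where W drives both X and Y, and X has no effect on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0            # 1 = studies a lot (assumed)
        x = 1 if rng.random() < (0.8 if w else 0.2) else 0  # heavy studiers wear glasses more often
        y = 60 + 20 * w + rng.gauss(0, 5)             # grade depends on W only
        rows.append((w, x, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_effect(rows):
    """Difference in mean grade between glasses wearers and non-wearers."""
    treated = [y for _, x, y in rows if x == 1]
    control = [y for _, x, y in rows if x == 0]
    return mean(treated) - mean(control)


def adjusted_effect(rows):
    """Same difference computed inside each W stratum, weighted by stratum size."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        treated = [y for _, x, y in stratum if x == 1]
        control = [y for _, x, y in stratum if x == 0]
        total += (mean(treated) - mean(control)) * len(stratum) / len(rows)
    return total


rows = simulate()
print("naive difference:   ", round(naive_effect(rows), 2))
print("adjusted difference:", round(adjusted_effect(rows), 2))
```

The naive number reflects the correlation induced by W; the stratified number is close to the zero effect that was built into the simulation.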
Data Flow for implementation of Causal tools

Magic of RCT

Shortcomings of RCT
Expensive
Lack of generalizability

RCT Design
Nested Design: sample directly at random from the target population and divide the sample into two groups, one serving as the RCT experimental group and the other as the strategy experimental group.
Non-Nested Design: use different sampling mechanisms on the target population to obtain the RCT experimental group and the strategy experimental group.
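A nested design can be sketched in a few lines. The function name `nested_design` and its parameters are hypothetical; the only idea taken from the slide is that both groups come from one random sample of the target population.

```python
import random


def nested_design(target_population, sample_size, rct_share, seed=0):
    """Draw one random sample, then split it into an RCT group and a strategy group.

    target_population must be a sequence (for example a list of user ids).
    """
    rng = random.Random(seed)
    # random.sample returns elements in selection order, so any slice of the
    # result is itself a valid random sample.
    sample = rng.sample(target_population, sample_size)
    n_rct = int(sample_size * rct_share)
    return sample[:n_rct], sample[n_rct:]


population = list(range(100000))  # hypothetical user ids
rct_group, strategy_group = nested_design(population, sample_size=10000, rct_share=0.1)
print(len(rct_group), len(strategy_group))
```

A non-nested design would instead call two different sampling procedures, which is why its two groups may not share a distribution.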
How to Design

1. Define Our Target Population clearly

Having a clear understanding of the target population is crucial in industry. It saves cost and ensures the accuracy of the collected randomized data. When we develop strategies, there are often specific rules for certain users, and these rules are easy to overlook. For example, users with special properties are often given only one or a few of the treatments. If we do not define the target population well and include these groups, the samples will be unevenly distributed across treatments.

Make sure the RCT takes the highest priority in the online strategy code.
Keep good log records. When each sample is recorded, record only the one strategy key that actually took effect, so that the data can be extracted correctly.
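The two rules above (RCT first, one strategy key per log record) might look like the following in serving code. Everything here is an assumption for illustration: the treatment names, the `in_target_population` and `special_rule` fields, and the 5% RCT share are hypothetical and not taken from the talk.

```python
import random

TREATMENTS = ["no_coupon", "small_coupon", "large_coupon"]  # hypothetical treatments


def decide(user, rng, rct_share=0.05):
    """Return a single log record naming the one strategy that actually applied."""
    # 1) The RCT is checked before every other strategy, and only for users
    #    inside the clearly defined target population.
    if user["in_target_population"] and rng.random() < rct_share:
        return {"user_id": user["id"], "strategy_key": "rct",
                "treatment": rng.choice(TREATMENTS)}
    # 2) Users covered by a special rule are outside the target population
    #    and never appear in the RCT data.
    if user.get("special_rule"):
        return {"user_id": user["id"], "strategy_key": "special_rule",
                "treatment": user["special_rule"]}
    # 3) Everyone else gets the regular strategy.
    return {"user_id": user["id"], "strategy_key": "default_strategy",
            "treatment": "small_coupon"}


rng = random.Random(1)
users = [{"id": i,
          "in_target_population": i % 10 != 0,
          "special_rule": "large_coupon" if i % 10 == 0 else None}
         for i in range(1000)]
logs = [decide(u, rng) for u in users]
rct_logs = [r for r in logs if r["strategy_key"] == "rct"]
print(len(rct_logs), "RCT samples out of", len(logs))
```

Filtering the log on `strategy_key == "rct"` then yields randomized samples only, with no special-rule users mixed in.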
2. Shuffle before RCT & shuffle regularly

Shuffling experimental traffic is an important and often overlooked step. Shuffling before the experiment is key to ensuring that the RCT data is consistent with the strategy data. An RCT experiment is itself a strategy in a broad sense and can significantly change the sample distribution of its experimental group. It is therefore important to shuffle traffic regularly, or to sample the RCT data from the whole population every time.
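One common way to shuffle traffic is salted hash bucketing: changing the salt reassigns every user to a new bucket. The sketch below assumes this mechanism; the talk does not say how shuffling is implemented, and the salt values and the 10% share are hypothetical.

```python
import hashlib


def bucket(user_id, salt, n_buckets=10000):
    """Map a user to a stable bucket for a given salt."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def in_rct(user_id, salt, rct_share=0.1, n_buckets=10000):
    """A user is in the RCT group when the bucket falls in the first rct_share of buckets."""
    return bucket(user_id, salt, n_buckets) < rct_share * n_buckets


users = range(10000)
week1 = {u for u in users if in_rct(u, salt="2023-w01")}
week2 = {u for u in users if in_rct(u, salt="2023-w02")}  # new salt = reshuffle
print(len(week1), len(week2), len(week1 & week2))
```

Within one salt the assignment is deterministic, so a user stays in the same group; after the salt changes, only a small fraction of the old RCT group remains in the new one.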
3. Correct usage of features

In industry, there are usually two options for the sample dimension:
User dimension: a user always receives the same treatment until the end of an experimental cycle.
Request dimension: each time a request comes in, the system randomly assigns a treatment.

Features that can be used:
User dimension: only features from before the user's first request can be used.
Request dimension: features from before the current request can be used.

Which RCT to use depends on the characteristics of your business.
Cumulative causal effect: user dimension
Single causal effect: request dimension
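The feature rule for the two dimensions amounts to choosing a time cutoff. A minimal sketch, assuming events carry a numeric timestamp `ts` and that `request_times` lists the user's requests inside the experimental cycle (both are hypothetical data layouts):

```python
def feature_cutoff(request_times, current_request_time, dimension):
    """Return the timestamp before which features are safe to use."""
    if dimension == "user":
        # The treatment is fixed from the first request on, so anything later
        # may already be affected by it.
        return min(request_times)
    if dimension == "request":
        # The treatment is re-randomized per request, so everything before the
        # current request is pre-treatment for this sample.
        return current_request_time
    raise ValueError(f"unknown dimension: {dimension}")


def usable_events(events, cutoff):
    """Keep only events that happened strictly before the cutoff."""
    return [e for e in events if e["ts"] < cutoff]


events = [{"ts": 5, "name": "view"}, {"ts": 12, "name": "click"}, {"ts": 25, "name": "order"}]
request_times = [10, 20, 30]

user_cut = feature_cutoff(request_times, current_request_time=30, dimension="user")
req_cut = feature_cutoff(request_times, current_request_time=30, dimension="request")
print([e["name"] for e in usable_events(events, user_cut)])
print([e["name"] for e in usable_events(events, req_cut)])
```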
4. Online RCT

Compared with conducting a large-scale RCT during a specific time period, a small-scale RCT on continuous traffic is more cost-effective, because:
It provides randomized data that always resembles the current population distribution.
It allows us to change treatments more flexibly and avoid waste.
It facilitates automated updates to the model.

Model Selection

Some Popular Causal Tools
Causal Forest
Generalized Causal Forest
Policy Learning with Causal Forest
The splitting criterion can be: ROI, Qini score, distribution distance.
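As a reference point for the Qini score, here is one common formulation of the Qini curve on RCT data: rank units by predicted uplift and accumulate treated outcomes minus control outcomes rescaled to the treated count. How the talk plugs this into a tree split is not stated in the slides, so the code below is only an assumed, standalone scoring sketch with toy data.

```python
def qini_curve(scores, treated, outcomes):
    """Cumulative incremental outcome when targeting units in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_t = n_c = 0
    y_t = y_c = 0.0
    curve = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            y_t += outcomes[i]
        else:
            n_c += 1
            y_c += outcomes[i]
        # Rescale control outcomes to the treated group size; while no control
        # unit has been seen the correction term is taken as zero.
        scale = n_t / n_c if n_c else 0.0
        curve.append(y_t - y_c * scale)
    return curve


def qini_area(curve):
    """Average gap between the curve and the straight line of random targeting."""
    n = len(curve) - 1
    end = curve[-1]
    return sum(curve[k] - end * k / n for k in range(n + 1)) / n


# Toy RCT: the first four units respond only when treated, the rest never respond.
treated = [1, 0, 1, 0, 1, 0, 1, 0]
outcomes = [1, 0, 1, 0, 0, 0, 0, 0]
good_scores = [0.9] * 4 + [0.1] * 4   # ranks the responsive units first
bad_scores = [0.1] * 4 + [0.9] * 4    # ranks them last

print(qini_area(qini_curve(good_scores, treated, outcomes)))
print(qini_area(qini_curve(bad_scores, treated, outcomes)))
```

A ranking that puts responsive units first gets a positive area, and the reversed ranking gets a negative one.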
CBIV

Imagine a scenario: an e-commerce platform issuing subsidies to customers.
Causal graph nodes (diagram not recovered): coupon, cost, action, feature.
The direct cause of a user's action is the cost, not the coupon.
Under an RCT, the coupon is an IV (instrumental variable).

CBIV: Treatment Regression, Confounder Balancing, Outcome Regression
In an RCT: Treatment Regression, Outcome Regression
[equations not recovered from the slide]

Other de-confounding methods
Feature decomposition
DML (double machine learning)
Front-door Path Adjustment
Backdoor Path Adjustment
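Why a randomized coupon works as an instrument can be shown with a plain IV (Wald ratio) estimate. This is not CBIV itself, since there is no confounder-balancing step; it is a minimal sketch with an invented data-generating process in which an unobserved factor `u` affects both cost and action, and the true effect of cost on action is set to -0.5.

```python
import random


def simulate(n=20000, seed=0):
    """Coupon is randomized; u is an unobserved confounder of cost and action."""
    rng = random.Random(seed)
    z, t, y = [], [], []
    for _ in range(n):
        coupon = 1 if rng.random() < 0.5 else 0
        u = rng.gauss(0, 1)
        cost = 10 - 3 * coupon + 2 * u + rng.gauss(0, 1)   # coupon lowers the cost
        action = 5 - 0.5 * cost + 3 * u + rng.gauss(0, 1)  # true effect of cost: -0.5
        z.append(coupon)
        t.append(cost)
        y.append(action)
    return z, t, y


def cov(a, b):
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / len(a)


z, t, y = simulate()
naive_slope = cov(t, y) / cov(t, t)   # regression of action on cost, biased by u
iv_estimate = cov(z, y) / cov(z, t)   # uses only coupon-driven variation in cost
print("naive slope:", round(naive_slope, 3))
print("IV estimate:", round(iv_estimate, 3))
```

The naive slope even has the wrong sign here, while the IV ratio recovers a value near -0.5 because the coupon is independent of `u`.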
Evaluation & Simulation

Elements of the pipeline diagram (layout not recovered): RCT data with D(RCT) = D(ODB); predict on the RCT dataset; solver: calculate the optimal solution; simulation: simulate economic profit on RCT; performance analysis: analysis of strategy, display the cost-reward curve.

Simulation
Cost-reward curve
Comparison between models
Performance analysis by features

Optimal Problem with Limited Budget

From the primal to the dual
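The slide formulas for the primal and the dual are not recoverable, so the following is an assumed, textbook-style version of the idea: maximize total predicted reward subject to a total cost budget, relax the budget with a multiplier `lam`, let each user pick the treatment maximizing reward minus `lam` times cost, and bisect on `lam` until the budget holds. The reward and cost tables are toy numbers, and treatment 0 is assumed to be "no treatment" with zero cost.

```python
def allocate(rewards, costs, lam):
    """Per-user best response for a fixed multiplier: argmax_j r_ij - lam * c_ij."""
    return [max(range(len(r)), key=lambda j: r[j] - lam * c[j])
            for r, c in zip(rewards, costs)]


def total(values, choice):
    return sum(v[j] for v, j in zip(values, choice))


def solve_budget(rewards, costs, budget, iters=50):
    """Bisect on the multiplier; returns (lam, choice) with total cost <= budget."""
    choice = allocate(rewards, costs, 0.0)
    if total(costs, choice) <= budget:
        return 0.0, choice                      # budget is not binding
    lo, hi = 0.0, 1.0
    for _ in range(60):                         # grow hi until the plan is affordable
        if total(costs, allocate(rewards, costs, hi)) <= budget:
            break
        lo, hi = hi, hi * 2
    else:
        raise ValueError("no feasible multiplier found")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total(costs, allocate(rewards, costs, mid)) > budget:
            lo = mid
        else:
            hi = mid
    return hi, allocate(rewards, costs, hi)


# Toy predictions: rows are users, columns are treatments (0 = none).
rewards = [[0, 4, 5], [0, 2, 6], [0, 1, 1.5]]
costs = [[0, 1, 3], [0, 1, 3], [0, 1, 3]]
lam, choice = solve_budget(rewards, costs, budget=4)
print(lam, choice, total(costs, choice), total(rewards, choice))
```

Because treatments are discrete, the dual solution is feasible but not guaranteed to be the exact primal optimum; sweeping the budget and recording (total cost, total reward) pairs would trace a cost-reward curve of the kind listed under simulation.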