Causality for Decision Making

Mengyue Yang
University College London
Email: mengyue.yang.20@ucl.ac.uk
Website: https://ymy4323460.github.io/

What is causal decision making?
- Causal inference is the process of understanding the cause-and-effect relationships between variables or events.
- A decision-making system is an approach used to make informed choices in various contexts.
- A causal decision-making system applies causal inference techniques to make better decisions that meet requirements such as explanation, generalization, or safety.

Causality and Decision Making
- Reasoning: understanding the factors in the system.
- Making decisions: learning how to take actions.
[Figure: decision-making system made up of a causal model, an agent, and an environment, connected by context/state, reward, and decision/action.]

Advantages of causal decision making
- Clarifying causal relations: identify the key factors and avoid being misled by spurious correlations.
- Enhancing decision accuracy and effectiveness: predict the outcomes and make the wisest choices.
- Reducing decision risks: identify potential bad effects and avoid the risks of generalization.

Outline
- Background: introduction to causality and decision-making systems.
- Current causal decision-making methods: causality in static and dynamic systems, including environment understanding, learning to intervene, and counterfactual reasoning.
- Advanced topic: challenges about causality in LLM agents.
For technical details, please refer to the related papers.

Background: Causal Inference (Pearl's Hierarchy)

Correlation doesn't mean causality
- Causal information, rather than the commonly used association, generalizes better in prediction tasks.
[Figure: weather, thermometer temperature, and ice cream sales, with relations labeled as causality or correlation.]

The philosophy of causality
- Descartes ascribed cause to eternal truth.
[Figure labels: the truth of the world; the truth of the world in our head; imagining what will happen (consciousness). Source: Judea Pearl, Causality.]

Causal Diagram
[Figure: causal diagram over context C, decision D, and reward R, shown with its causal model and the decision effect.]
- Observational: $P(R \mid D = d)$
- Interventional: $P(R \mid do(D = d))$
- Counterfactual: $P(R_{D = d'} \mid D = d, R = r)$

Intervention and correlation probability
[Figure: graphs over smoking and cancer, one of them including genotype, each annotated with the corresponding probability of cancer.]

Structural Causal Model
A structural causal model $M$ is a tuple $\langle V, U, F, P(U) \rangle$.
- Endogenous variables: $V = \{v_1, \dots, v_n\}$, the variables in the system.
- Exogenous variables: $U = \{u_1, \dots, u_m\}$, the variables outside the system that have a causal effect on it.
- Confounder: a variable is a confounder if and only if it influences both the cause and the effect. A confounder biases causal estimation, and a method that ignores it can produce the estimation error known as Simpson's paradox.
- Functions: $F = \{f_1, \dots, f_n\}$, the generative functions that determine the endogenous variables, $v_i = f_i(pa_i, u_i)$, where $pa_i \subseteq V \setminus \{v_i\}$ and $u_i \subseteq U$.
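
The three kinds of query introduced above (observational, interventional, counterfactual) can be sketched on a toy SCM over the context C, decision D, and reward R. This is a minimal sketch, not the author's model: the linear structural equations, their coefficients, and the Gaussian exogenous noise are all assumptions chosen for illustration, and the helper names (`sample`, `counterfactual_reward`) are hypothetical. With additive noise, the exogenous term of R can be recovered exactly from one observed unit, which makes the counterfactual step easy to follow.

```python
import numpy as np

# Hypothetical structural equations for the diagram C -> D, C -> R, D -> R.
# The coefficients are illustrative assumptions, not values from the slides.
def f_C(u_c):
    return u_c

def f_D(c, u_d):
    return 0.8 * c + u_d

def f_R(c, d, u_r):
    return 1.0 * d - 2.0 * c + u_r

def sample(n, rng, do_d=None):
    """Draw n units from the SCM; do_d replaces f_D by a constant (intervention)."""
    u_c, u_d, u_r = rng.normal(size=(3, n))  # exogenous variables U ~ P(U)
    c = f_C(u_c)
    d = f_D(c, u_d) if do_d is None else np.full(n, float(do_d))
    r = f_R(c, d, u_r)
    return c, d, r

def counterfactual_reward(c, d, r, d_new):
    """Counterfactual reward of one observed unit (c, d, r) had D been d_new."""
    u_r = r - f_R(c, d, 0.0)   # recover this unit's exogenous noise on R
    return f_R(c, d_new, u_r)  # replay the equation for R with D set to d_new

rng = np.random.default_rng(0)

# Association: slope of R on D in observational data (confounded by C).
c, d, r = sample(200_000, rng)
obs_slope = np.cov(d, r)[0, 1] / np.var(d, ddof=1)

# Intervention: E[R | do(D=1)] - E[R | do(D=0)].
_, _, r1 = sample(200_000, rng, do_d=1.0)
_, _, r0 = sample(200_000, rng, do_d=0.0)
causal_effect = r1.mean() - r0.mean()

# Counterfactual: a unit observed with c=0.5, d=1.0, r=0.3, had D been 2.0.
cf = counterfactual_reward(c=0.5, d=1.0, r=0.3, d_new=2.0)

print(f"observational slope of R on D: {obs_slope:.3f}")
print(f"interventional effect of D:    {causal_effect:.3f}")
print(f"counterfactual reward:         {cf:.3f}")
```

Under these assumed equations the observational slope is close to zero while the interventional effect is close to 1, because the confounder C pushes D and R in opposite directions. The gap between the two numbers is the estimation error that ignoring a confounder produces.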

Pearl's Causal Hierarchy
- Association (seeing). Question: What is? What does smoking tell us about lung cancer?
- Intervention (doing). Question: What if? What will happen if someone keeps smoking?
- Counterfactual (imagination). Question: Was it? Would the lung cancer have been worse had the person smoked?

Each level corresponds to a family of learning methods and a type of query:
- Association: (un)supervised learning; probabilistic modeling of $P(x, y)$.
- Intervention: reinforcement learning; causal modeling of $P(y \mid do(x))$.
- Counterfactual: retrospective model-based RL; causal modeling of $P(y_{x'} \mid X = x, Y = y)$.

Causal Diagram
- When the underlying causal model is unknown, the interventional probability cannot be directly inferred from the observational data.
- When the underlying causal model is unknown, the counterfactual probability cannot be directly inferred from the interventional probability.
- Understanding the underlying SCM is a prerequisite for inferring interventions and counterfactuals.

Background: Causal Inference from Observation Data
The scenario where the ideal interventional distribution is hard to get.

Observation from the world
- An observation is a mixture of factors.
- The causal relations are unknown.

Understanding the world
- How can the causal structure and the causal effect be identified from pure observations? This is decided by the properties of the data and the form of the model function, not by the way the model is trained.
- Causal disentanglement: from observations to states/representations.
- Causal discovery: from states/representations to a causal graph and a structural causal model.
- Intervention identification: from $P(R \mid D)$ to $P(R \mid do(D = d))$.
[Figure: the C, D, R diagram before and after intervening on the decision D.]
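
Intervention identification can be illustrated on logged data from the C, D, R diagram. The sketch below assumes that the context C is observed and is the only common cause of the decision and the reward, so that adjusting for it recovers the interventional quantity: $\sum_c \big(E[R \mid D = 1, C = c] - E[R \mid D = 0, C = c]\big) P(C = c)$. The synthetic data, its probabilities, and the function name `naive_and_adjusted` are assumptions made for illustration only.

```python
import numpy as np

def naive_and_adjusted(c, d, r):
    """Return (naive, adjusted) estimates of the effect of binary D on R.

    naive    = E[R | D=1] - E[R | D=0]
    adjusted = sum_c (E[R | D=1, C=c] - E[R | D=0, C=c]) * P(C=c)
    """
    naive = r[d == 1].mean() - r[d == 0].mean()
    adjusted = 0.0
    for value in np.unique(c):
        stratum = c == value
        diff = r[stratum & (d == 1)].mean() - r[stratum & (d == 0)].mean()
        adjusted += diff * stratum.mean()  # weight by P(C = value)
    return naive, adjusted

# Synthetic logged data (assumed): the context raises both the chance of
# D = 1 and the reward; the true effect of D on the reward rate is 0.1.
rng = np.random.default_rng(0)
n = 200_000
c = rng.binomial(1, 0.5, size=n)
d = rng.binomial(1, np.where(c == 1, 0.8, 0.2))
r = rng.binomial(1, 0.2 + 0.1 * d + 0.5 * c)

naive, adjusted = naive_and_adjusted(c, d, r)
print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

In this setup the naive difference is about four times the true effect, while the stratified estimate is close to 0.1. If C were unobserved, or if other common causes existed, this adjustment would not be valid.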

- Counterfactual estimation: from $P(R \mid do(D = d))$ to counterfactual distributions conditioned on observed values.

Causal Disentangle
- Causal disentanglement aims at finding the causal factors from observational data (Yang et al., Suter et al., Besserve et al.).
- The causal factors might themselves be causally related.
[Figure: encoder (inference), causal mask layer, and decoder (generation).]

Identifiability in Disentanglement
- Identifiability: uniquely determine the representation of each factor from the observed data (Khemakhem et al. 1, Khemakhem et al. 2).
- The representation contains all the information of the underlying factors.

Causal Direction Discovery
- Independent causal mechanism
- D-separation: a path is blocked by a set $Z$ if it contains a fork or a chain whose middle vertex is in $Z$, or a collider such that neither the middle vertex nor any descendant of it is in $Z$.

Intervention Identification
- Back door criterion
- Front door criterion
[Figures: graphs over smoking, cancer, and genotype illustrating each criterion.]

Causal Diagram
[Figure: graphs over smoking and cancer annotated with the probability of cancer under an intervention on smoking.]
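
For reference, the two criteria named above lead to the standard adjustment formulas below. The symbols are assumptions introduced here: $X$ is the treatment (smoking), $Y$ the outcome (cancer), $Z$ a set satisfying the back-door criterion (for example genotype, if it is observed), and $M$ a mediator satisfying the front-door criterion. A mediator does not appear in the extracted slide text; Pearl's classic choice for the smoking example is tar deposits.

```latex
% Back-door adjustment: Z blocks every back-door path from X to Y
% and contains no descendant of X.
P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z)

% Front-door adjustment: M intercepts every directed path from X to Y,
% there is no unblocked back-door path from X to M,
% and X blocks every back-door path from M to Y.
P(y \mid do(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
```

The back-door formula needs the confounder to be measured. The front-door formula is useful when the confounder is unobserved but a suitable mediator is.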

[Figure, continued: graphs over cancer, smoking, and genotype in which the interventional probability is labeled noncomputable in one case and computable in another.]

Counterfactual Estimation
Abduction-action-prediction:
- Abduction: derive the posterior of the exogenous variables given the evidence, $P(U \mid E = e)$.
- Action: modify the causal graph $G$ by removing the edges going into $X$ and set $X = x'$ (intervention) to derive the modified model.
- Prediction: compute the distribution of the outcome in the modified model, $P(Y_{X = x'} \mid E = e)$.

Conclusion of Causal Inference so far
- Basic concepts: association, intervention, and counterfactual, and their estimation.
- Models: structural causal models.

Background: Decision Making System

Static decision making
- Goal oriented: makes decisions by modeling the environment.
- Does not consider long-term interaction.
- Examples: online advertising, auctions, recommendation, healthcare.

Big Picture
- Decisions are made by maximizing the short-term reward (user feedback) or simply by following rules.
- The method relies on modeling the environment.
- The decision is based on domain knowledge, prior rules, or a model learned from historical data.
- The system learns to make decisions without directly interacting with the environment, as in planning and model-based RL (MBRL).
[Figure: the agent sends a decision/action to the environment; the resulting context/state is collected as a dataset used for modeling the environment.]

An example
- The agent (the system) makes a decision (provides an impression list) based on the context.
- The decision aims to obtain a good user selection.
[Figure: context C, recommendation list R, and user selection S.]

Dynamic decision making
- Goal oriented: makes decisions to maximize the long-term reward.
- The decision is based on interaction with the environment.

The dynamic decision making system
- Makes decisions to maximize the environment reward.
- Makes decisions by interacting with the environment.
- General approach: reinforcement learning (RL) /
  online learning.
[Figure: agent and environment exchanging context/state, reward, and decision/action.]

Factors in RL
- Observation ($o$): the observation from the environment.
- State ($s$): the features describing the current state of the environment and the agent.
- Action ($a$): the agent takes an action to interact with the environment.
- Reward ($r$): the environment's feedback on the action taken in the current state.
- Policy ($\pi$): the probability of taking an action, $a_t = \pi(s_t)$.
- Transition: the probability of the next state, $P(s_{t+1} \mid s_t, a_t)$.

Static & Dynamic
- Static. Interaction: none. Policy: maximizes the potential reward based on a model, or simply follows prior rules.
- Dynamic. Interaction: with the environment. Policy: maximizes the long-term reward.

Causality for Decision Making

Big Picture of Causal Decision Making
- Reasoning about the environment
- Making better decisions
[Figure: decision-making system made up of a causal model, an agent, and an environment, connected by context/state, reward, and decision/action.]

Causal Decision Making
- Understanding the world/environment
- What to intervene on
- What the counterfactual results are
- Better explanation/reasoning ability
- Decisions for generalization, robustness, and sample efficiency

Tasks for Causal Decision Making
- Understanding causal variables. Scenario: POMDP, static DM. Task: causal disentanglement.
- Understanding the causal model. Scenario: RL, static DM. Task: causal discovery.
- Where to intervene. Scenario: RL, static DM. Task: intervention identification.
- Understanding counterfactuals. Scenario: static DM. Task: counterfactual estimation.

Reasoning Decision Making: General process
- Understanding the world: causal disentanglement, environment estimation.
- What to intervene on: find the action worth taking.
- Counterfactual inference: enhance the imagination.

Causal Decision Making Tasks
- Causal disentanglement in RL: Sontakke et al.
- Environment estimation: Li et al. 1, Zholus et al., Ding et al., Liu et al.
- Where to intervene: Wang et al. 1, Huang et al.
- Counterfactual imagination: Li et al., Yang et al. 2, Pitis et al.

Causality for Decision Making: Understanding the world by using causality

Causal Curiosity (Sontakke et al.)
- Understanding the causal world.
- Classical POMDP: defined by the observation space, state space, action space, transition function, emission function, and reward function.
- Causal POMDP: the state is divided into a controllable state and an uncontrollable state.
- Causal POMDP transition function and observation: if a body on the ground (i.e., state $s_t$) is thrown upwards (i.e., action $a_t$), the outcome $s_{t+1}$ is caused by the causal factor gravity.
- The experiment planner: allows the agent to discover action sequences such that the resultant observation trajectory is caused by a single causal factor.
- Causal inference module: infers the related representation from observational data.
- Interventions on beliefs: recursively intervene on the environment with generated actions.

Explain the World (Yu et al.)
- Learning the causal model to explain the world.
- Factorized MDP: each state is factorized into n state variables.
- SCM: the structural causal model. The model formalizes the causal relationships between multiple variables.
- An SCM that explains the world can be converted to an AIM based on specially designed structural equations.
- AIM: the action influence model, a causal model for RL that generates explanations of why
  the agent takes certain actions.
- Causal discovery between the variables of the current step and those of the next step.
- Causal influence network (AIM): the connection between SCMs and AIMs.
- The causal model might reduce the efficiency of RL and planning.

Explanation and Planning
- Goal orientation: consider the causal explanation and the goal of the task simultaneously.
- When and where to intervene/take action for better performance?

Causality for Decision Making: What to intervene

Causal Enhanced Decision (Seitzer et al.)
- The agent is rarely in control of the object of interest.
- Physical contacts are hard to model.
- Objects enable manipulation towards further goals.
- Knowing when and what the agent can influence with its actions.
- Agents can be rewarded with a bonus for visiting states of causal influence. Such a bonus leads the agent to quickly discover useful behavior even in the absence of task-specific rewards.
- Modeling the environment: independent causal mechanism.
- Empirical evaluation of causal influence detection.

Causal Enhanced Decision: Improving Efficiency in Reinforcement Learning
- Better state exploration through an exploration bonus.
- Causal action exploration.
- Prioritizing experiences with causal influence during training.
- Causal action influence as reward bonus: reward of the goal + reward for satisfying causal influence detection.
- Following actions with the most causal influence.
- Causal influence-based experience replay: prioritizing according to causal influence. Influence-based prioritization (CAI-P), hindsight experience replay (HER).

Effectiveness of causality
- Sample efficiency: Seitzer et al.
- Generalization: Ding et al.
- Explanation: Yu et al.

Causality for Decision Making: The Wings of Counterfactual Imagination

Counterfactual estimation in decision making
General process: learning the functions in SCMs.
- Abduction: find the exogenous variables.
- Action: generate new training samples.
- Prediction: use different generation policies (random or learning based).
Producing better samples helps to reach better decisions.

Generating counterfactual data (Yang et al. 2):
- Randomly augmented samples: debias from the historical policy.
- Goal-oriented augmented samples: better rewards.

Counterfactual estimation in world models (Li et al.):
- Counterfactual performance and efficiency.

Advanced Topic: Challenges in Reasoning and Decision Making of Causal LLM

LLM basics
- Training and fine-tuning: supervised fine-tuning (SFT); reward model (RM), with reinforcement learning via proximal policy optimization (PPO) on this reward model.
- In-context learning: direct inference by providing prompts.

Can an LLM tell causation rather than association?
- Problem 1: Unstable.
- It fails to determine implicit causal relationships but can tell explicit ones (Gao et al.).
- It can only find causal relationships under specific prompts (Zečević et al., Hobbhahn et al.).
- It fails to find causality in very complex sentences that contain many factors (Gao et al.).

- Problem 2: AI hallucinations.
- From the bias between factual and counterfactual observations (data level).
- From the training and fine-tuning policy, such as RLHF (training level).
- From advanced techniques such as CoT and in-context learning (inference level).

Example:
Question: "What is heavier: a kilogram of metal or a kilogram of feathers?"
Answer: A kilogram of metal is heavier than a kilogram of feathers.
Question: "A kilogram of metal is heavier than a kilogram of feathers."
Answer: They weigh the same.

The boundary of LLMs' causal ability (Zhang et al.)
Type 1: Identifying causal relationships using domain knowledge
- Example 1. Patient: Will my minor spine injury cause numbness in my shoulder?
- Example 2. Person: I am balancing a glass of water on my head. Suppose I take a quick step to the right. What will happen to the glass?

Type 2: Discovering new knowledge from data
- Example 1. Scientist: In a new scientific experiment, I observe two variables, A and B. Does A cause B, or does B cause A?
- Example 2. Marketing specialist: I plan to launch a new membership program different from those of our competitors X and Y. There are two ways to design the member benefit. The first is buy four and get a fifth one for free, and the other is a 20 dollar cash return for every 100 dollars spent. Which one should I choose?

Type 3: Quantitative estimation of the consequences of actions
- Example 1. Sales manager: I have 1000 dealers with the following information about them. I can only give membership to 100 of them next year. I want the membership program to provide the highest revenue growth. Which 100 dealers should I choose?
- Example 2. Medical doctor: This is the third time that this patient has returned with lumbago. The epidural steroid injections helped him before, but not for long. I injected 12 mg of betamethasone the last two times. What dose should I use this time?

Why can't LLMs tell causality stably?
- Bias in training/inference data: lack of counterfactual data.
- Lack of explainable, explicit, identifiable causal relationships/representations in model design.
- Lack of causal/counterfactual forms of learning, such as learning strategies or objectives, which produces bias.
- The inference process does not include causal restrictions.

Future work: what could we do?
Give LLMs the ability to understand the causal mechanism.
- Data level: counterfactual data collection.
- Model level: explicit and implicit causal models.
- Method level: causal constraints, in-context learning, better instructions.

Thank You!
For more questions, feel free to reach me at mengyue.yang.20@ucl.ac.uk

References
- Suter et al. Robustly disentangled causal mechanisms: Validating deep representations for interventional robustness.
- Besserve et al. Counterfactuals uncover the modular structure of deep generative models.
- Yang et al. 1. CausalVAE: Disentangled representation learning via neural structural causal models.
- Khemakhem et al. 1. Variational Autoencoders and Nonlinear ICA: A Unifying Framework.
- Khemakhem et al. 2. ICE-BeeM: Identifiable conditional energy-based deep models based on nonlinear ICA.
- Sontakke et al. Causal Curiosity: RL Agents Discovering Self-supervised Experiments for Causal Representation Learning.
- Yang et al. 2. Top-N Recommendation with Counterfactual User Preference Simulation.
- Li et al. Causal World Models by Unsupervised Deconfounding of Physical Dynamics.
- Zholus et al. Factorized World Models for Learning Causal Relationships.
- Ding et al. Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning.
- Wang et al. 1. Causal Dynamics Learning for Task-Independent State Abstraction.
- Yu et al. 2. Explainable Reinforcement Learning via a Causal World Model.
- Huang et al. Action-Sufficient State Representation Learning for Control with Structural Constraints.
- Liu et al. Learning World Models with Identifiable Factorization.
- Pitis et al. MOCODA: Model-based Counterfactual Data Augmentation.
- Lee et al. Characterizing Optimal Mixed Policies: Where to Intervene, What to Observe.
- Seitzer et al. Causal Influence Detection for Improving Efficiency in Reinforcement Learning.
- Gao et al. Is ChatGPT a Good Causal Reasoner? A Comprehensive Evaluation.
- Zečević et al. Causal Parrots: Large Language Models May Talk Causality But Are Not Causal.
- Hobbhahn et al. Investigating causal understanding in LLMs.
- Zhang et al. Understanding Causality with Large Language Models: Feasibility and Opportunities.
- Hammond et al. Reasoning about Causality in Games.
- Pearl. Causality.
