Causal Inference-Based Recommender Systems
Chen Gao, Tsinghua University, National Research Center for Information Science and Technology
https:/
2023 Causal Inference Online Summit, Recommendation and Causal Inference Forum

Background
Why is causal inference needed in recommender systems?
Chen Gao et al. Causal inference in recommender systems: A survey and future directions. arXiv preprint arXiv:2208.12397, 2022.

Outline
- Disentangled learning for user interest and conformity
- Disentangled learning for long-term and short-term interests
- Debiasing in short-video recommendation

Disentangling User Interest and Conformity for Recommendation with Causal Embedding
Y. Zheng, Chen Gao, et al. Disentangling user interest and conformity for recommendation with causal embedding. In Proceedings of the Web Conference 2021: 2980-2991.

Background
What are the causes behind each user-item interaction? There are two main causes:
- Interest: e.g., buying an item for its attributes (tire, speed, ...)
- Conformity: how users tend to follow other people, e.g., buying a best-seller because of its high sales
Goal: learn disentangled representations for interest and conformity.
Motivation
Why learn disentangled representations? Causal recommendation under non-IID situations (IID: independent and identically distributed).
- Robustness: recommenders are trained and updated in real time, so training data and test data are not IID.
- Interpretability: improves user-friendliness and facilitates algorithm development.

Causal Recommendation
Inverse Propensity Scoring (IPS) [1]
- The propensity score is estimated from item popularity.
- Intuition: impose lower weights on popular items and boost unpopular items.
- However, interest and popularity are bundled into one unified representation: the two factors remain entangled.

[1] Yang, L., Cui, Y., Xuan, Y., Wang, C., Belongie, S., & Estrin, D. (2018, September). Unbiased offline recommender evaluation for missing-not-at-random implicit feedback. In Proceedings of the 12th ACM Conference on Recommender Systems (pp. 279-287).
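To make the reweighting concrete, below is a minimal sketch of popularity-based IPS for implicit feedback. The propensity model (normalized popularity raised to a smoothing exponent), the weight clipping, and the function names are illustrative assumptions, not details from the slides.

```python
import numpy as np

def ips_weights(item_popularity, eta=0.5, clip=10.0):
    """Estimate inverse-propensity weights from item popularity.

    item_popularity: interaction counts per item.
    eta: smoothing exponent of the propensity model (illustrative).
    clip: cap on the weights to control variance (illustrative).
    """
    popularity = np.asarray(item_popularity, dtype=np.float64)
    # Propensity of observing an interaction with item i, estimated from
    # its popularity (more popular -> more likely to be exposed).
    propensity = (popularity / popularity.max()) ** eta
    # IPS: down-weight popular items, boost unpopular ones.
    return np.minimum(1.0 / np.maximum(propensity, 1e-6), clip)

def ips_weighted_loss(pointwise_losses, item_ids, weights):
    """Reweight per-interaction losses by the propensity of their items."""
    w = weights[item_ids]
    return float(np.sum(w * pointwise_losses) / np.sum(w))
```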
Causal Recommendation
Causal Embeddings (CausE) [1]
- Requires a large fraction of biased data and a small fraction of unbiased data.
- Performs two MF models on the biased and the unbiased data, respectively.
- Imposes L1/L2 regularization between the two MF models (MF on the small unbiased data vs. MF on the large biased data).
- The representations are still entangled.

[1] Bonner, S., & Vasile, F. (2018, September). Causal embeddings for recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems (pp. 104-112).
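As a rough sketch of the CausE idea described above (two matrix factorizations tied together by a regularizer), the snippet below fits one set of item factors on the large biased data and another on the small unbiased data, with an L2 penalty pulling the two item-factor sets together. Sharing the user factors, the embedding size, and the class name are illustrative assumptions rather than details from the slides.

```python
import torch
import torch.nn as nn

class CausEMF(nn.Module):
    """Two matrix factorizations (biased / unbiased data) tied by an L2
    regularizer on the item factors. A minimal sketch, not the reference
    implementation."""

    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user = nn.Embedding(n_users, dim)           # shared user factors (assumption)
        self.item_biased = nn.Embedding(n_items, dim)    # item factors fit on biased data
        self.item_unbiased = nn.Embedding(n_items, dim)  # item factors fit on unbiased data

    def score(self, u, i, biased=True):
        item = self.item_biased if biased else self.item_unbiased
        return (self.user(u) * item(i)).sum(-1)

    def loss(self, batch_biased, batch_unbiased, reg=0.1):
        # batch_*: (users, items, labels) with float implicit-feedback labels in {0, 1}.
        bce = nn.functional.binary_cross_entropy_with_logits
        ub, ib, yb = batch_biased
        uu, iu, yu = batch_unbiased
        l_biased = bce(self.score(ub, ib, biased=True), yb)
        l_unbiased = bce(self.score(uu, iu, biased=False), yu)
        # Discrepancy regularizer tying the two factorizations together.
        l_reg = (self.item_biased.weight - self.item_unbiased.weight).pow(2).mean()
        return l_biased + l_unbiased + reg * l_reg
```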
Disentangling interest and conformity: challenges
- Variety of conformity: conformity depends on both users and items; one user's conformity varies across different items, and conformity towards one item varies across different users.
- Learning disentangled representations is intrinsically hard: only observational data is accessible, there is no ground truth for user interest, and an interaction can come from one or both factors.
- Careful designs are needed for combining the two factors to make recommendations.

Methodology: Our DICE Model
Disentangling Interest and Conformity with Causal Embedding (DICE)
Challenge 1: variety of conformity.
Our proposal: adopt separate interest and conformity embeddings for users and items.
- Benefit 1: embedding proximity in a high-dimensional space can express the variety of conformity (challenge 1 addressed).
- Benefit 2: independent modeling of interest and conformity.

Methodology: Our DICE Model
Challenge 2: learning disentangled representations is intrinsically hard.
Our proposal: utilize the colliding effect from causal inference to obtain cause-specific data.
Intuition: train the interest/conformity embeddings with interactions that are caused by interest/conformity.

Methodology: Our DICE Model
Challenge 3: aggregation of the two factors is complicated.
Our proposal: leverage multi-task curriculum learning to combine the two causes.
Methodology: Our DICE Model
Disentangling Interest and Conformity with Causal Embedding (DICE):
- Causal embedding
- Disentangled representation learning
- Multi-task curriculum learning
[Figure: (a) causal graph; (b) causal embedding, where user/item interest and conformity embeddings are concatenated and trained with click, interest, conformity, and discrepancy losses.]

Methodology: Our DICE Model
Causal graph and Structural Causal Model (SCM): interest and conformity are the two causes of a click, and the SCM formalizes this causal graph.

Methodology: Our DICE Model
Causal embedding
- Separate embeddings for interest and conformity: user u^(int), u^(con); item i^(int), i^(con).
- Use the inner product to compute matching scores.
- Predict clicks by combining the two causes.
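A minimal sketch of this embedding layout: each user and item gets an interest embedding and a conformity embedding, matching scores are inner products, and the click score combines the two causes. The class name and dimensions are illustrative; summing the two inner products is one way to "combine" them, and it equals the inner product of the concatenated embeddings.

```python
import torch
import torch.nn as nn

class DICEEmbeddings(nn.Module):
    """Separate interest / conformity embeddings for users and items."""

    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_int = nn.Embedding(n_users, dim)  # u^(int)
        self.user_con = nn.Embedding(n_users, dim)  # u^(con)
        self.item_int = nn.Embedding(n_items, dim)  # i^(int)
        self.item_con = nn.Embedding(n_items, dim)  # i^(con)

    def interest_score(self, u, i):
        return (self.user_int(u) * self.item_int(i)).sum(-1)

    def conformity_score(self, u, i):
        return (self.user_con(u) * self.item_con(i)).sum(-1)

    def click_score(self, u, i):
        # Combining the two causes: the inner product of the concatenated
        # embeddings equals the sum of the two cause-specific scores.
        return self.interest_score(u, i) + self.conformity_score(u, i)
```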
Methodology: Our DICE Model
Mining cause-specific data with causal inference: immorality and collider
- Structure A → C ← B: C is a collider, and the structure is an immorality.
- Colliding effect: A and B are independent, but A and B are NOT independent when conditioned on C.

Methodology: Our DICE Model
Mining cause-specific data with causal inference, e.g.:
- A: whether a student is talented; B: whether a student is hard-working; C: whether the student passes an exam.
- Bob passes the exam and Bob is not talented, so he is hard-working with high probability.
- Alice does not pass the exam and Alice is talented, so she is most likely not hard-working.
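The colliding effect is easy to verify numerically. The toy simulation below (hypothetical numbers) draws talent and effort independently, lets either one be enough to pass the exam, and then checks that the correlation between talent and effort is near zero overall but clearly negative among students who passed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

talented = rng.random(n) < 0.3       # A: talent, drawn independently of effort
hard_working = rng.random(n) < 0.5   # B: effort
# C: pass the exam if talented or hard-working (with a little noise)
passed = (talented | hard_working) & (rng.random(n) < 0.95)

corr_all = np.corrcoef(talented, hard_working)[0, 1]
corr_passed = np.corrcoef(talented[passed], hard_working[passed])[0, 1]
print(f"corr(A, B) overall:      {corr_all:+.3f}")    # ~ 0: independent
print(f"corr(A, B) given C=pass: {corr_passed:+.3f}") # negative: collider opened
```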
Methodology: Our DICE Model
Mining cause-specific data with causal inference
- The colliding effect can come to help: the click is the collider of interest and conformity!
- Use popularity as a proxy for conformity.
- A clicked item with low popularity implies high interest; an unclicked item with high popularity implies low interest.

Methodology: Our DICE Model
Notation: M^(interest) is the interest matching probability matrix, and M^(conformity) is the conformity matching probability matrix.
- Case 1: the user clicks a popular item and does not click an unpopular item. Only the conformity ordering (and the click ordering) is identified: the clicked item wins on conformity, while the interest ordering stays uncertain.
- Case 2: the user clicks an unpopular item and does not click a popular item. The clicked item wins on interest and on the click score, but loses on conformity.

Methodology: Our DICE Model
- O: the whole training set of triplets (u, i, j), with user u, positive item i, and negative item j.
- O1: triplets whose negative sample is more popular than the positive sample.
- O2: triplets whose negative sample is less popular than the positive sample.
- O = O1 ∪ O2.
Solution: train different embeddings with different cause-specific data.
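A sketch of how the cause-specific training sets could be built from BPR-style triplets, assuming item popularity is measured by interaction counts: O1 keeps triplets whose negative item is more popular than the positive one (Case 2 above, where the interest ordering and a reversed conformity ordering are identified), and O2 keeps those whose negative is less popular (Case 1, where only the conformity ordering is identified). The tie handling is an illustrative choice.

```python
def split_cause_specific(triplets, item_popularity):
    """Split (user, pos_item, neg_item) triplets into O1 and O2.

    triplets: iterable of (u, i, j) with i clicked and j not clicked.
    item_popularity: dict or array mapping item id -> popularity count.
    """
    o1, o2 = [], []
    for u, i, j in triplets:
        if item_popularity[j] > item_popularity[i]:
            o1.append((u, i, j))   # negative more popular than positive
        elif item_popularity[j] < item_popularity[i]:
            o2.append((u, i, j))   # negative less popular than positive
        # ties carry no popularity signal; here they are simply skipped
    return o1, o2
```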
Methodology: Our DICE Model
Main task: estimating clicks, using the concatenation of the interest and conformity embeddings.

Methodology: Our DICE Model
Interest modeling: only use the interest embeddings.

Methodology: Our DICE Model
Conformity modeling: only use the conformity embeddings.
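Putting the three tasks together, here is a sketch of cause-specific BPR losses, assuming the case analysis above: the click loss uses the combined score on all triplets, the interest loss uses only the interest score on O1 (where the interest ordering is identified), and the conformity loss uses the conformity score in the normal direction on O2 and with the preference flipped on O1. The `model` object is assumed to expose the scoring functions from the earlier embedding sketch, and how the terms are weighted and summed with the discrepancy loss is not shown here.

```python
import torch
import torch.nn.functional as F

def bpr(pos_scores, neg_scores):
    """Bayesian Personalized Ranking loss: prefer pos over neg."""
    return -F.logsigmoid(pos_scores - neg_scores).mean()

def dice_losses(model, o1, o2):
    """o1, o2: LongTensors of shape (n, 3) holding (user, pos, neg) triplets."""
    (u1, i1, j1), (u2, i2, j2) = o1.unbind(-1), o2.unbind(-1)
    u, i, j = torch.cat([u1, u2]), torch.cat([i1, i2]), torch.cat([j1, j2])

    # Main task: click ordering holds on every triplet.
    l_click = bpr(model.click_score(u, i), model.click_score(u, j))
    # Interest: identified only on O1 (positive less popular than negative).
    l_interest = bpr(model.interest_score(u1, i1), model.interest_score(u1, j1))
    # Conformity: normal direction on O2, reversed direction on O1.
    l_conformity = (
        bpr(model.conformity_score(u2, i2), model.conformity_score(u2, j2))
        + bpr(model.conformity_score(u1, j1), model.conformity_score(u1, i1))
    )
    return l_click, l_interest, l_conformity
```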
Methodology: Our DICE Model
Discrepancy task: direct supervision on disentanglement, pushing the interest embeddings E^(int) and the conformity embeddings E^(con) apart. Three candidate discrepancy measures:
- L1-inv: -||E^(int) - E^(con)||_1
- L2-inv: -||E^(int) - E^(con)||_2
- Distance correlation: dCor(E^(int), E^(con)) = dCov(E^(int), E^(con)) / sqrt(dVar(E^(int)) · dVar(E^(con)))
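Distance correlation is the least standard of the three terms, so a sketch is given below. It follows the usual definition from doubly-centered pairwise-distance matrices; applying it to a batch of interest vs. conformity embeddings and minimizing the result is one way to realize the discrepancy supervision described above.

```python
import torch

def distance_correlation(x, y, eps=1e-9):
    """Distance correlation between two batches of embeddings.

    x, y: (n, d) tensors, e.g. interest vs. conformity embeddings of the
    same n items. Returns a scalar in [0, 1]; 0 means no dependence.
    """
    def centered_distance(z):
        d = torch.cdist(z, z)  # pairwise Euclidean distances
        return d - d.mean(0, keepdim=True) - d.mean(1, keepdim=True) + d.mean()

    a, b = centered_distance(x), centered_distance(y)
    dcov_xy = (a * b).mean().clamp_min(0).sqrt()
    dvar_x = (a * a).mean().clamp_min(0).sqrt()
    dvar_y = (b * b).mean().clamp_min(0).sqrt()
    return dcov_xy / (dvar_x * dvar_y + eps).sqrt()
```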
Methodology: Our DICE Model
Multi-task learning with Popularity-based Negative Sampling with Margin (PNSM)
- Let the popularity of the positive item be p. Sample negative items whose popularity is either larger than p plus a margin or lower than p minus a margin.
- Large margins: high confidence in the popularity-based inequalities (easy samples); small margins: low confidence (hard samples).
- Curriculum learning: an easy-to-hard strategy; decay the margins by a factor of 0.9 after each epoch.
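A sketch of PNSM with the easy-to-hard curriculum, assuming additive margins on raw popularity counts and the per-epoch decay by 0.9 stated above; the initial margin values, the class name, and the fallback to uniform sampling when no item satisfies the margins are illustrative choices.

```python
import numpy as np

class PNSMSampler:
    """Popularity-based Negative Sampling with Margin, with curriculum decay."""

    def __init__(self, item_popularity, margin_up=40.0, margin_down=40.0,
                 decay=0.9, seed=0):
        self.pop = np.asarray(item_popularity, dtype=np.float64)
        self.m_up, self.m_down, self.decay = margin_up, margin_down, decay
        self.rng = np.random.default_rng(seed)
        self.all_items = np.arange(len(self.pop))

    def sample_negative(self, pos_item):
        p = self.pop[pos_item]
        # Candidates clearly more popular or clearly less popular than the positive.
        mask = (self.pop > p + self.m_up) | (self.pop < p - self.m_down)
        candidates = self.all_items[mask]
        if len(candidates) == 0:       # fall back to uniform sampling
            candidates = self.all_items
        return int(self.rng.choice(candidates))

    def end_epoch(self):
        # Curriculum: shrink the margins so the inequalities get harder over time.
        self.m_up *= self.decay
        self.m_down *= self.decay
```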
Experiments
- Datasets: Movielens-10M, Netflix.
- Evaluation: non-IID protocol (same as CausE [1]). Train: 60% normal + 10% intervened; validation: 10% intervened; test: 20% intervened.
- Metrics: Recall, Hit Ratio, NDCG.
- Recommendation models: MF [2], LightGCN [3].

[1] Bonner, S., & Vasile, F. (2018, September). Causal embeddings for recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems (pp. 104-112).
[2] Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2012). BPR: Bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618.
[3] He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., & Wang, M. (2020, July). LightGCN: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.
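A sketch of the non-IID split described above, assuming two interaction pools, "normal" (collected under the usual exposure policy) and "intervened" (collected under uniform exposure), and assuming the 60/10/10/20 figures are fractions of the whole dataset, so the intervened pool is divided 10:10:20 (25% / 25% / 50%) into train, validation, and test.

```python
import numpy as np

def non_iid_split(normal, intervened, seed=0):
    """Train on mostly-normal data; validate and test only on intervened data."""
    rng = np.random.default_rng(seed)
    iv = rng.permutation(intervened)
    n = len(iv)
    k1, k2 = n // 4, n // 2                      # 25% and 50% cut points
    train = np.concatenate([normal, iv[:k1]])    # 60% normal + 10% intervened
    valid = iv[k1:k2]                            # 10% intervened
    test = iv[k2:]                               # 20% intervened
    return train, valid, test
```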
Conclusion and Future Work
- We conduct a large-scale analysis to show that duration bias leads to inaccurate and unfair recommendation of micro-videos.
- A new measurement of watch time
89、 on micro-videos,WTG,is proposed which eliminates duration bias and can evaluate recommendation performance without favoring either long or short videos.A general framework DVR is further designed to help recommendation models learn unbiased user preferences.Future work Apply WTG and DVR in online systems.Codes can be found at:https:/ Disentangled learning for user interest and conformity Disentangled learning for long-term and short-term interests Debiasing in short-video https:/