1-1 大规模游戏社交网络节点相似性算法及其应用.pdf

编号：102485

PDF 33页 4.24MB 下载积分：VIP专享

下载报告请您先登录！

1-1 大规模游戏社交网络节点相似性算法及其应用.pdf

1、规模游戏社交络节点相似性算法及其应林清Large-scale game social network node similarity algorithm and its application腾讯游戏社交络算法负责CONTENTS01Introduction02Previous Work03Experiments and Deployment04Our Solution05Optimizations06Conclusions 01IntroductionGraph is EverywhereMost of data can be naturally modeled as graphs.Soci

2、al networkGraph of webProtein-protein interaction(PPI)networkGraphs can easily depict the relations or interactions of data.Some homogeneous graphs:friendships,interactions between players,etc.Some heterogenous graphs:club memberships,item interactions,etc.playersItems/clubsGraph in GamesFriendships

3、 in gamesClub memberships or item purchasing in gamesGraphs could be massive!Billions of nodesHundred-billions of edgesRecommendation on GraphsFriend recommendation on the billion-scale game social network.News recommendation on the content heterogenous graphClub recommendation for game players.Appl

4、icationsTwo kinds of problemsOrdering existing edgesPredicting non-existing edges,Link Analysis and PredictionThe problems can be solved by learning the node proximity functions.?Common NeighborsFriends friends could be friends.Personalized PageRank(PPR)The PPR of with respect to,denoted by ,is defi

5、ned as the probability of a random walk starting with and ending at.Some Classical Node Proximity PPRs are asymmetric.Its able to answer high order proximity of two nodes by considering all the possible paths between them.$+(1 )$restart/termination probabilitycontinue probabilitystarting matrixnorma

6、lized adjacent matrixPPR matrixA Formal Definition of PPR Distributed algorithms for fully PPR,instead of single machine solutions or pair-wise/single-source solutions.DriverWorkerWorkerWorkerGraph is stored as the adjacent list,such as.Our FocusMapReduce computing framework:Map phase Take as input

7、a pair Apply map function on the pair Output a set of new pairsReduce phase All pairs with the same key are sent to one reducer Apply reduce function on the values Output new pairs02Previous WorkThe PPR ,of node with respect to can be computed as!,=,+(1 ).(,%)(!*+,(,)The algorithm terminates when!*+

8、is less than a user-defined threshold.Previous Work:Power Iteration ApproachPower Iteration approach requires a space complexity of (2),where is the number of nodes.Simulate random walks starting from each interested node with the termination probability.If random walks stop at node,then we estimate

9、 ,as/.The choice of if Previous Work:Monte-Carlo ApproachChernoff Bounds.Let =!#$!,where!=1 with probability!and!=0 with probability 1 !,and all!are independent.Let =!#$!.Then,Pr(|)2%&!/)for all 0 1When =(*+,(#/.)0!1),we can obtain an-approximate PPR for and ,with a probability 1-2.Preprocessing sta

10、geSample/short segments of length for each nodeOnline computation stageMerge the segments to form the longer walks.Previous Work:The Doubling AlgorithmDifficult to decide No guarantee on the accuracyload imbalance for large node03Our SolutionParallel pipeline frameworkEach random is generated in mul

11、tiple steps.Each pipeline has at most random walks.Subsequent pipelines has one step in gap.Terminated paths are stored in!,while active paths continue expanding until terminated.Solution OverviewParallel pipeline framework Each random is generated in multiple steps.Each pipeline has at most random

12、walks.Subsequent pipelines has one step in gap.Terminated paths are stored in!,while active paths continue expanding until terminated.Solution Overview Reduce the memory overhead.Remedy the under-utilization at long walks.Expected number of rounds is/+#3 1.AdvantagesParallel pipeline framework Each

13、random is generated in multiple steps.Each pipeline has at most random walks.Subsequent pipelines has one step in gap.Terminated paths are stored in 4,while active paths continue expanding until terminated.Solution OverviewReduce the memory overhead.Remedy the under-utilization at long walks.Expecte

14、d number of rounds is/+1/1.Random neighbor selection Deficiency on small nodesHow to choose?AdvantagesChallenges04OptimizationsPre-processing phaseConvert the weight into an alias and switching probability.Sampling phaseFirst,identify an element with probability 1/,where is the number of elements.Th

15、en,choose the element or its alias according to the switching probability.aliasswitching probabilityweightTime complexity:pre-processing phase has(),and sampling phase has(1).Space complexity:().Optimization 1:The Alias Method1/21/31/121/121/312/32/32/31/31/3The Alias Tree Divide the neighbor set of

16、 a large node into several subsets,each of which can fit in memory.Organize the subsets in a hierarchical manner.Sampling is performed by traversing the tree from the root.Optimization 2:Handle Large NodesCorrectness analysis Let be the path from the root of the Alias Tree to the leaf%,where the ele

17、ment resides.The probability of choosing equalsS1S2S3SiAnalysis of the Alias Tree L+M L !=+Complexity analysis Time complexity is(log).Although total space complexity remains(),each node of the Alias Tree can fit in memory.Big Move A length-big move is a walk of steps.length-1 big moveslength-2 big

18、movesAn example graphOptimization 3:Handle Small NodesThe Set of Big Moves Given a number and a starting node,enumerate all length-big move starting from,where 1 .A random walk with several steps can be generated immediately from the big move set.The probability of a length-big move =#,5,4is product

19、 of probabilities of edges in.A Heuristic ApproachThe number of active random walks in the 1stpipeline at the 1ststep/round is#=.The number of active random walks at the 2ndround is 5=+1#.The number of active random walks at the(+1)-th round is Optimization 4:Optimizing%&=+1%=(%&(1 )()*+,.Let be the

20、 size of available memory,and be the size of a path,then we have c!#,i.e.,#$!%.05Experiments and DeploymentDatasets13 public datasetsCompetitors Doubling Bahmani et al.SIGMOD11 GraphX spark.apache.org/graphxExperimental Settings An in-house cluster with up to 201 machines with 16 GB memory and 12 co

21、res.By default,we exploit 51 machines with 50 as workers and 1 as master,and set-=1/,=0.5,=0.5,=0.5.Experiment SetupRunning timeComparisons with Previous WorkAccuracy Use the results of GXPPR as the ground-truth.Effects of OptimizationsOther ExperimentsVarying parameters,38404244464850525411/22/2017

22、11/23/201711/24/201711/25/201711/26/201711/27/201711/28/2017点击率规则PPR(stable)师徒推荐3638404244464843034310443105累计当点击率（%）PPRTAP规则随机好友排序Deployment in Tencent Gamesupdated in few daysFriendships in gamesSort all according to(,)06Conclusions 1.Distributed algorithms based on the parallel pipelin

23、e framework for fully personalized PageRank computation.2.Hierarchical sampling algorithm with the Alias Tree structure that avoids the skewness of large nodes.3.Big moves to accelerate the computation on small nodes.4.Optimizing the size of pipeline to improve the throughput of the distributed algo

24、rithms.5.Demonstrate the performance of our approach on both efficiency and accuracy on various datasets.Conclusions1 Wenqing Lin:Distributed Algorithms for Fully Personalized PageRank on Large Graphs.WWW 2019:1084-10942 Siqiang Luo,Xiaokui Xiao,Wenqing Lin,Ben Kao:BATON:Batch One-Hop Personalized PageRanks with Efficiency and Accuracy.IEEE Trans.Knowl.Data Eng.32(10):1897-1908(2020)ReferencesQ&A环节

友情提示

1、下载报告失败解决办法
2、PDF文件下载后，可能会被浏览器默认打开，此种情况可以点击浏览器菜单，保存网页到桌面，就可以正常下载了。
3、本站不支持迅雷下载，请使用电脑自带的IE浏览器，或者360浏览器、谷歌浏览器下载即可。
4、本站报告下载后的文档和图纸-无水印,预览文档经过压缩，下载后原文更清晰。

本文（1-1 大规模游戏社交网络节点相似性算法及其应用.pdf）为本站（云闲）主动上传，三个皮匠报告文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若此文所含内容侵犯了您的版权或隐私，请立即通知三个皮匠报告文库（点击联系客服），我们立即给予删除！

温馨提示：如果因为网速或其他原因下载失败请重新下载，重复下载不扣分。