1-3 When Reinforcement Learning Meets High-Freedom Action Games: Problem Research and Application Practice.pdf


PDF, 36 pages, 4.78 MB


When RL Meets Highly Free Action Game: Research and Case Study
2022/09/24
胡裕靖 (Yujing Hu)

Agenda
1. Overview: Intro of Fuxi & Naraka: Bladepoint
2. Navigation: How we solve the navigation problem in Naraka: Bladepoint
3. Melee Combat: How we solve the melee combat problem in Naraka: Bladepoint
4. Future: What we want to do next


Overview: Intro of Fuxi & Naraka: Bladepoint

NetEase Fuxi: Business and Research Interests
Fuxi is founded on the principle of bridging artificial intelligence and video games.
- Reinforcement Learning
- Computer Vision
- Natural Language Processing
- User Persona
- Virtual Human
- Robotics

NetEase Fuxi RL Group: Business and Research Interests
Typical applications of RL in games (game AI bots), in slide order:
- Card Game
- Revelation Mobile
- MMORPG
- Justice, 6-vs-6
- Sports Game
- Fever Basketball, 3-vs-3
- ACT Game
- Naraka: Bladepoint

Naraka: Bladepoint (永劫无间)
Action-adventure battle royale game developed by 24 Entertainment and published by NetEase Games Montreal.
- 60-player PVP mythical action combat
- Melee combat
- Gravity-defying mobility
- Vast arsenals of melee & ranged weapons
- Legendary customizable heroes with epic abilities

Two major problems in Naraka (player-vs-bot mode) we want to solve
1. Navigation: navigation in very complex terrains
2. Melee combat: melee combat bots with high skill level

Reinforcement Learning Applications in Naraka: Navigation and Melee Combat
[Figure: Navigation Task; Melee Combat Task]


Navigation

Problems for AI in Naraka: Bladepoint: problems for pathfinding
- Complex three-dimensional terrains: mountains, trees, rivers, temples, tall buildings (too many disconnected areas)
  [Figure: NavMesh; typical terrains in Naraka: Bladepoint]
- Dynamic environment (i.e., poison circle, bombing zone, traps)
  [Figure: Bombing Zone; Poison Circle; Traps]
- Multiple game mechanisms for moving (i.e., grappling hooks, scale rush, sliding jump, charge-to-dodge)
  [Figure: Grappling Hook; Scale Rush; Sliding Jump & Charge-to-dodge]
- Demand for human-likeness

Navigation: 3D perception with DRL: problems and methods
Problems:
- Complex three-dimensional terrains
- Dynamic environment
- Disconnected areas
- Multiple game mechanisms for moving
- Human-like moving operations
Methods:
- Three-dimensional real-time perception
- Deep Reinforcement Learning
- Human-like policy output design
- Techniques such as Automated Reward Shaping and Curriculum Learning

Navigation: 3D perception with DRL: 3D real-time perception in the game
[Figure: Radar; Depth Map]

Navigation: 3D perception with DRL: Neural Network Structure
[Figure labels: 3D Features; Scalar Features; Time-Series Features; W/A/S/D (Forward, Back, Left, Right); Hook/Crouch/Dodge/Jump/]

Navigation: 3D perception with DRL: the agent can get stuck and lacks human-likeness
- Agent gets stuck in corners
- Agent keeps jumping

Navigation: 3D perception with DRL: Automated Reward Shaping
Reward shaping needs tedious tuning work to get appropriate weight hyperparameters.
[Figure labels: Optimal Policy; Suboptimal Policy; True Reward; Shaping Reward; IRAT]
Li Wang, Yupeng Zhang, Yujing Hu, et al. Individual Reward Assisted Multi-Agent Reinforcement Learning. ICML 2022.

Automated Reward Shaping: updating the shaping policy
For each shaping policy and the target policy:
- When the two policies are consistent, the shaping policy should learn quickly.
- When the two policies conflict too much, the shaping policy should update carefully.
- A new objective is: [equation not recoverable from the extracted text; surviving fragment: "= clip , 1 , 1+"]
- The similarity between the two policies is defined as: [equation not recoverable; surviving fragment: "= | |"]
- Combine with its original optimization objective: [equation not recoverable; surviving fragment: "= 1 max , + 1 min ,"]
- An increasing-effect KL regularizer is introduced to distill target policy knowledge: [equation not recoverable]

Automated Reward Shaping: updating the target policy
- The target policy uses a learning objective corrected by importance sampling: [equation not recoverable]
- A decreasing-effect KL regularizer ensures effective updates.
- The total learning objective of the team policy is: [equation not recoverable; surviving fragment: "= min , clip , 1 , 1+ ,"], where the coefficient on the regularizer is a decreasing coefficient.

Navigation: 3D perception with DRL: Curriculum Learning
Curriculum learning: choose start points in specific areas, then randomly choose from the full map, and lastly choose stuck points.

Comparison of the arrival rate between NavMesh and our method in different areas

Area name          NavMesh arrival rate   Our method arrival rate   Increase ratio
Full Map           63.40%                 81.50%                    28.54%
Celestra           32.70%                 88.00%                    169.11%
Stilltide Temple   27.90%                 74.70%                    167.74%
Wreckage Plains    35.90%                 85.50%                    138.16%
Shadow Jade Mine   24.80%                 81.50%                    228.63%
Sun Wings Rest     41.40%                 73.30%                    77.05%
Average            37.70%                 80.75%                    114.19%

Navigation in complex terrains: high arrival rate in complex terrains
[Demo: Shadow Jade Mine, RL Navigation Agent vs Rule-based Agent]


Melee Combat

Problems for AI in Naraka: Bladepoint: problems for melee combat
- Rock-paper-scissors combat system: Focus Strikes, Common Attack, Counterstrikes
- Thirteen heroes (more in the future) with different hero skills
  [Figure: Skills of Different Heroes in Naraka: Bladepoint]
- Various melee weapons with different mechanisms
  [Figure: Spear; Nunchuk]
[Demo: gameplay of Naraka, showing rich attack modes]

Combat Bot with High Skill Level: problems and methods
Problems:
- Rock-paper-scissors combat system: requires players to guess/predict and counteract the other's strategies
- Thirty heroes (more in the future) with different hero skills
- Various melee weapons with different mechanisms
Methods:
- Policy distillation: knowledge transfer
- Opponent modeling: observing opponents' historical behaviors to predict opponents' next moves

Combat Bot with High Skill Level: an all-rounder AI to master every kind of weapon (knowledge transfer)
Policy distillation can effectively improve a student agent's performance by transferring knowledge from multiple teachers.
- Stage One: train all teachers, each proficient in one weapon
- Stage Two: distill their knowledge into one student
Since weapon combos are more complicated than hero skills, we only use distillation to handle weapons.

Combat Bot with High Skill Level: how to predict opponents' next moves
Observe and encode their historical behaviors.

Different historical behaviors, encoded features
Common Attack (White)   Force Strikes (Blue)   Counter Strikes (Red)
0.4                     0.05                   0.45
0.4                     0.55                   0.05
0.8                     0.15                   0.05

[Demos: Predict opponents' next moves; An all-rounder AI to master every kind of weapon; Predict opponents' next moves (PVE)]


Future work: what we want to do next
Remaining problems:
- Navigation inside rooms and in high buildings with big height differences
- The timing for switching between melee and ranged weapons
- Other sub-goals in battle royale games, e.g., resource collection and team cooperation

Q&A
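The three-stage curriculum from the navigation part of the talk (start points in specific areas, then random points on the full map, then stuck points) could be sketched as below. This is a minimal illustration only: the function names, the stage labels, and the iteration thresholds `area_until` and `full_map_until` are hypothetical, and the talk does not say how or when stages are switched.

```python
import random

# Hypothetical stage labels; the talk only names the three phases.
STAGES = ("specific_areas", "full_map", "stuck_points")


def stage_for_iteration(iteration, area_until=1000, full_map_until=3000):
    """Map a training iteration to a curriculum stage.

    The thresholds are placeholders, not values from the talk.
    """
    if iteration < area_until:
        return "specific_areas"
    if iteration < full_map_until:
        return "full_map"
    return "stuck_points"


def sample_start_point(stage, area_points, full_map_points, stuck_points, rng=random):
    """Pick an episode start point from the pool that matches the stage."""
    pools = {
        "specific_areas": area_points,
        "full_map": full_map_points,
        "stuck_points": stuck_points,
    }
    if stage not in pools:
        raise ValueError(f"unknown curriculum stage: {stage!r}")
    pool = pools[stage]
    if not pool:
        raise ValueError(f"no start points available for stage {stage!r}")
    return rng.choice(pool)
```

A training loop would call `stage_for_iteration` once per episode and pass the result to `sample_start_point`. A performance-based switch (for example, on arrival rate) would be an equally plausible design.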
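The "Increase ratio" column in the arrival-rate comparison is consistent with a relative increase over the NavMesh baseline, (ours - baseline) / baseline. The short check below recomputes it from the two arrival-rate columns. The formula is inferred from the numbers, not stated in the talk, and `increase_ratio` is a name chosen here for illustration.

```python
def increase_ratio(baseline_pct, new_pct):
    """Relative increase of new_pct over baseline_pct, in percent."""
    return (new_pct - baseline_pct) / baseline_pct * 100.0


# (NavMesh arrival rate, our method arrival rate), both in percent,
# copied from the comparison table in the talk.
arrival_rates = {
    "Full Map": (63.40, 81.50),
    "Celestra": (32.70, 88.00),
    "Stilltide Temple": (27.90, 74.70),
    "Wreckage Plains": (35.90, 85.50),
    "Shadow Jade Mine": (24.80, 81.50),
    "Sun Wings Rest": (41.40, 73.30),
}

for area, (navmesh, ours) in arrival_rates.items():
    print(f"{area}: {increase_ratio(navmesh, ours):.2f}%")
```

For Full Map the formula gives roughly 28.55%, against 28.54% in the table, which would be consistent with the slide truncating rather than rounding.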
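The equations on the reward-shaping slides did not survive extraction, but the remaining fragments (`min`, `clip`, `1`, `1+`) and the mention of a KL regularizer with a decreasing coefficient resemble a PPO-style clipped surrogate with a KL penalty. The sketch below shows that generic form only. It is not the IRAT objective from the cited paper; `eps`, `kl_coef`, and the exponential decay schedule are assumptions made for illustration.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Generic PPO-style clipped surrogate for a single sample.

    ratio is pi_new(a|s) / pi_old(a|s); eps is the clip range (assumed value).
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def regularized_objective(ratio, advantage, kl_to_other_policy, kl_coef, eps=0.2):
    """Clipped surrogate minus a KL penalty toward another policy.

    kl_coef weights the penalty; a caller can grow or shrink it over time.
    """
    return clipped_surrogate(ratio, advantage, eps) - kl_coef * kl_to_other_policy


def decayed_coef(initial, step, decay=0.999):
    """Hypothetical exponentially decreasing coefficient schedule."""
    return initial * decay ** step
```

With `decayed_coef` feeding `kl_coef`, the KL term matters most early in training and fades later, which is one way to read "decreasing-effect". An increasing schedule for the shaping policies would mirror it.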
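The two-stage knowledge transfer in the melee combat part (one teacher per weapon, then one student for all weapons) is commonly implemented by minimizing the KL divergence from each teacher's action distribution to the student's. The sketch below assumes that formulation. The talk does not give its loss, and the callables `teachers[weapon](obs)` and `student(obs, weapon)` are hypothetical stand-ins for policy networks that return action logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(batch, teachers, student):
    """Mean KL from the weapon-specific teacher to the shared student.

    batch: list of (weapon, observation) pairs.
    teachers: dict mapping weapon name to a callable obs -> logits (assumed).
    student: callable (obs, weapon) -> logits (assumed).
    """
    if not batch:
        raise ValueError("batch must not be empty")
    total = 0.0
    for weapon, obs in batch:
        teacher_probs = softmax(teachers[weapon](obs))
        student_probs = softmax(student(obs, weapon))
        total += kl_divergence(teacher_probs, student_probs)
    return total / len(batch)
```

Routing each sample to the teacher for its weapon keeps the teachers independent, matching "train all teachers, each proficient in one weapon" before distilling.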
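For the opponent-modeling slide, one simple way to "observe and encode historical behaviors" is a sliding-window frequency vector over the three attack types. This is an assumed encoding for illustration: the talk does not describe its encoder, and the slide's example vectors may come from a different scheme. The class name, move labels, and `window` size are hypothetical.

```python
from collections import Counter, deque

# Hypothetical labels for the three attack types in the combat system.
MOVES = ("common_attack", "focus_strike", "counterstrike")


class OpponentHistory:
    """Keeps the opponent's most recent moves and encodes them as frequencies."""

    def __init__(self, window=20):
        # deque(maxlen=...) silently drops the oldest move once full.
        self._moves = deque(maxlen=window)

    def observe(self, move):
        if move not in MOVES:
            raise ValueError(f"unknown move: {move!r}")
        self._moves.append(move)

    def features(self):
        """Return [p_common, p_focus, p_counter] over the current window."""
        if not self._moves:
            # No evidence yet: fall back to a uniform prior.
            return [1.0 / len(MOVES)] * len(MOVES)
        counts = Counter(self._moves)
        n = len(self._moves)
        return [counts[m] / n for m in MOVES]
```

The resulting vector can be concatenated with the agent's other observations, so the policy can condition on whether the opponent has recently favored one attack type.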
