AIGC Frontier Technology: Development and Applications of Virtual Human Motion Generation

Speaker: Chen Xin (陈欣), Researcher, QQ Imaging Center, Tencent Technology
- Joined Tencent's QQ Imaging Center in 2021.
- Responsible for AIGC research on virtual-human clothing and animation for QQ Show (QQ秀).
- Works on the research and deployment of AIGC and multimodal large models.
- PhD from the University of Chinese Academy of Sciences. Researches generative AI, including virtual human generation, motion generation, and 3D object generation, with more than 20 papers at top international venues such as CVPR, ICCV, and SIGGRAPH.
- Homepage: https://chenxin.tech/

Contents
01 Development of motion generation technology (Motion Background)
02 Diffusion models and motion generation (Motion-Latent-Diffusion)
03 Language models and motion generation (MotionGPT)

Resources: data downloads and open-source code
- Motion-Latent-Diffusion: https://github.com/ChenFengYe/motion-latent-diffusion
- MotionGPT: https://github.com/OpenMotionLab/MotionGPT

01 Motion Synthesis: Development of Motion Generation Technology

What is Human Motion?
- Skeleton
- Skeleton and skin
- SMPL / SMPL-X
- Skinning
- Motion parameters
References on the slide:
1. OSSO: Obtaining Skeletal Shape from Outside
2. A Skinned Multi-Person Linear Model
3. Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

Why Industry Needs Motion Synthesis?
Application areas: Metaverse, Film, Game, AR

Animation: producing virtual human motion by hand
- Hand-crafted (manual production)
- Time-consuming (the time cost is enormous)
- Unnatural (the process is unnatural)
Credit on slide: Polyfjord

MoCap: motion capture technology
- Equipment required (external devices are needed)
- High cost
- Human-driven (professional performers are needed)

Why Academia Needs Motion Synthesis?
[Figure: three modalities (Image, Motion, Language) connected by tasks: Gesture Understanding, Animation Generation, Motion Capture, Action/Behavior Recognition, Speech Generation, Text-driven Generation]

Timeline: development of motion generation technology
Research on motion generation and motion capture; years shown: 2018, 2020, 2021, 2022, 2023.

3D Motion Capture
- HMR (Human Mesh Recovery): 1.4k GitHub stars, 1k citations
- VIBE (Video Inference for Body Pose and Shape Estimation): 2.6k GitHub stars, 636 citations
- 4DHumans: recent work

3D Motion Synthesis
- MDM (Motion Diffusion Models): 2.2k GitHub stars, 100 citations
- MLD (Motion-latent-diffusion): 300+ GitHub stars
- MotionGPT: recent work

02 Motion Latent Diffusion Models: Diffusion Models and Motion Generation

Diffusion in Image Synthesis (civitai.co)
- DALLE2, Disco-Diffusion, GLIDE
- Imagen
  - Larger text encoder rather than CLIP
  - Text encoder: T5-XXL (4.6B)
  - Diffusion decoder: 64 (2B)
  - Diffusion upsampler: 64 -> 256
  - Efficient U-Net, 2-3x faster
- Latent Diffusion
  - Diffusion in the VAE latent space
  - Image 512x512 -> VAE latent 64x64x4
  - Lower computational cost
  - Supported inputs include segmentations and images

Generative Models
Source: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/

Diffusion Models: understanding image generation
- A generative model is essentially a sampling process.
- Challenge: images are very high-dimensional, so constructing the distribution directly is difficult.
- Diffusion Probabilistic Models (2020): simplify the complex data sampling process into gradual denoising that starts from a pure two-dimensional Gaussian noise distribution.

Diffusion Models: the two processes
- Fixed forward diffusion process: data -> noise
- Generative reverse denoising process: noise -> data

Prior Work: related work on motion generation

MDM (Text-to-Motion)
Example prompt: "A person is crouched down and walking around sneakily."
- Pro: realistic human motion
- Con: poor condition matching
Reference: Guy Tevet, Sigal Raab, Brian Gordon, Yonatan Shafir, Amit H Bermano, and Daniel Cohen-Or. Human motion diffusion model. arXiv preprint arXiv:2209.14916.

MLD: Motion Latent Diffusion Models (Text-to-Motion), CVPR23
- Pro: faster inference time
- Con: limited task capacity
Reference: Chen Xin, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, and Gang Yu. Executing your commands via motion diffusion in latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023.

03 MotionGPT: Human Motion as a Foreign Language (Language Models and Motion Generation)

Large Language Model

What Makes Motion Synthesis Challenging?
1. Text and motion modality
   - Distributions vary greatly
   - Limited paired data
   - Example label on the slide, shown three times: "Drink"
2. Motion diversity
   - Diverse motions
   - Diverse descriptions
   - Example: "Someone who is waving to a person"
Small datasets

Challenges of motion generation
- Uniform multi-task framework
- Modeling the language-motion relation, for example "A person jumps forwards and turns right." versus "A person jumps forwards and turns left."

Prior Work: T2M-GPT (Text-to-Motion)
Example prompt: "A person is crouched down and walking around sneakily."
- Pro: realistic human motion
- Con: limited to a single task
Reference: Jianrong Zhang, Yangsong Zhang, Xiaodong Cun, Shaoli Huang, Yong Zhang, Hongwei Zhao, Hongtao Lu, and Xi Shen. T2m-gpt: Generating human motion from textual descriptions with discrete representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

A Unified Framework: MotionGPT
Demo: MotionGPT

Pipeline: the MotionGPT framework
- Motion tokenizer (Sec. 3.1)
- Motion vocabulary
- Motion-aware language model (Sec. 3.2)

Instructed Dataset: the MotionGPT dataset
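The pipeline items above (motion tokenizer, motion vocabulary, motion-aware language model) suggest treating motion as extra tokens in a language model's vocabulary. The following is a minimal Python sketch of that idea under stated assumptions, not the MotionGPT implementation: a nearest-neighbour codebook lookup stands in for the learned tokenizer, and the function names (quantize, motion_tokens, build_vocabulary, instruction_sample) and token strings (<motion_k>, <som>, <eom>) are hypothetical.

```python
# Sketch only: every name below is hypothetical and is not taken from the MotionGPT code.

def quantize(frames, codebook):
    """Map each motion feature vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(frame, codebook[k]))
            for frame in frames]

def motion_tokens(indices):
    """Render codebook indices as special tokens that can sit next to ordinary words."""
    return ["<motion_%d>" % i for i in indices]

def build_vocabulary(text_vocab, codebook_size):
    """Extend a text vocabulary with one token per codebook entry plus two boundary tokens."""
    extra = ["<motion_%d>" % i for i in range(codebook_size)]
    return list(text_vocab) + extra + ["<som>", "<eom>"]

def instruction_sample(prompt, indices):
    """Pair a text prompt with a motion answer wrapped in boundary tokens."""
    answer = " ".join(["<som>"] + motion_tokens(indices) + ["<eom>"])
    return {"input": prompt, "output": answer}

# Toy 2-D "motion features" and a 3-entry codebook (assumed values for illustration;
# real motion features are much higher-dimensional and the codebook is learned).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (1.1, 0.1)]

indices = quantize(frames, codebook)
vocab = build_vocabulary(["a", "person", "walks"], len(codebook))
sample = instruction_sample("Generate a motion: a person walks.", indices)
print(indices)           # [0, 1, 2, 1]
print(sample["output"])  # <som> <motion_0> <motion_1> <motion_2> <motion_1> <eom>
```

Because motion becomes a sequence of ordinary token strings, the same text-in, text-out model interface could in principle serve text-to-motion, motion-to-text, and text-to-text samples, which is the unification the framework slide points at.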
Training: MotionGPT training

Step 1: Training of the motion tokenizer.
A motion sequence is sampled from a 3D motion dataset. The motion tokenizer learns a motion representation. The motion codebook is used to represent human motion as discrete tokens.

Step 2: Motion-language pre-training.
A motion and a language description are sampled. The motion is mapped to discrete motion indices and mixed with words. This data is used to pre-train our motion-language model.

Step 3: Instruction tuning.
The QAs are sampled from our prompt templates. The prompts are used to fine-tune our model on diverse motion tasks.

MotionGPT results: Text-to-Motion comparison
- "A person is crouched down and walking around sneakily." (columns: GT, T2M-GPT, MDM, Ours)
- "A man starts to walk straight then walks to the right." (columns: GT, Ours)
- "A person slowly walk in a counterclockwise circle."
- "A person walks in a semi-circular pattern, tip-toeing."

MotionGPT results: Motion-to-Text comparison
Compared method: TM2T
Instruction prompts:
- "Depict a motion as like you have seen it."
- "Random say something about describing a human motion."
- "Describe the motion of someone as you will."
Motion descriptions:
- "A man is standing still swaying and then walks slowly towards the 1 o clock."
- "A person is standing upleft while putting their hands together in a praying motion."
- "A standing person holds their hands in front of their chest and claps three times."

MotionGPT results: Text-to-Text results

04 Future Works: Future Research Directions

Next steps
- Multi-modal large model: diverse modal generation, and understanding motions at different levels
- Image and motion
- Large human motion datasets

Projects: Motion-Latent-Diffusion, MotionGPT

Thanks
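As a worked illustration of two items from section 02 (the fixed forward diffusion process and the 512x512 to 64x64x4 latent size), here is a small Python sketch. It assumes the linear noise schedule of the 2020 DDPM formulation (1000 steps, beta from 1e-4 to 0.02) and a 3-channel RGB image. Neither value is stated on the slides, and the function names alpha_bar and q_sample are hypothetical.

```python
import math
import random

def alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed values)."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    scale = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x0]

abar = alpha_bar(1000)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25, 2.0]            # stand-in for a data or latent vector
x_early = q_sample(x0, 10, abar, rng)  # early step: still dominated by the data
x_late = q_sample(x0, 999, abar, rng)  # last step: almost pure Gaussian noise

# Number of values in a 512x512 RGB image versus the 64x64x4 latent from the slide.
pixel_values = 512 * 512 * 3   # the 3 channels are an assumption
latent_values = 64 * 64 * 4
print(pixel_values // latent_values)  # 48
```

The forward process has no learned parameters, which is why the slide calls it fixed; only the reverse denoising direction is trained. Under the RGB assumption the latent holds 48 times fewer values than the image, which is one way to read the "lower computational cost" point.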