DOI: 10.1145/3208159.3208161

Pixel-Level Character Motion Style Transfer using Conditional Adversarial Networks

Published: 11 June 2018

Abstract

In this paper, we describe a novel method, based on a conditional GAN and a Gram loss, for synthesizing realistic human movement in videos according to different body motion inputs. We present a character motion style transfer model with a two-branch network to characterize natural video sequences. The first branch is built on convolutional LSTMs to capture spatio-temporal representations of the style video, and the second branch uses convolutional networks to extract spatial features from the content frame. The entire network follows an encoder-decoder architecture that learns representations of both spatial content and temporal correlations in videos, and can transfer a motion style to that of a given style video. The main benefits of our approach lie in jointly modeling the spatio-temporal correlations of motion video and imposing a Gram constraint to achieve real-world character motion style transfer. Experiments demonstrate the effectiveness of the proposed motion style transfer approach on real-world video, and the generated motions with pixel-level style transfer are of high visual quality.
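
The abstract names three concrete ingredients: a Gram-matrix style loss, a ConvLSTM style branch, and a conditional GAN objective. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of the first two ingredients, with all shapes, channel widths, and the toy usage chosen purely for illustration (the adversarial term is omitted for brevity).

```python
# Minimal sketch, assuming a PyTorch setup: a Gram-matrix style loss and a
# single ConvLSTM cell for the style branch. Not the paper's released code;
# all sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a feature map: channel-wise correlations, normalized."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def gram_loss(feat_generated, feat_style):
    """Style (Gram) loss between generated and style feature maps."""
    return F.mse_loss(gram_matrix(feat_generated), gram_matrix(feat_style))

class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell (Shi et al. 2015): LSTM gates computed by convolution."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # A single conv produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

# Toy usage: unroll a style clip through the ConvLSTM branch, then compare
# Gram statistics of stand-in generator features against the style state.
if __name__ == "__main__":
    clip = torch.randn(2, 8, 3, 64, 64)          # (batch, time, C, H, W) style video
    cell = ConvLSTMCell(in_ch=3, hid_ch=32)
    h = torch.zeros(2, 32, 64, 64)
    c = torch.zeros_like(h)
    for t in range(clip.size(1)):                # step over the clip in time
        h, c = cell(clip[:, t], (h, c))
    generated_feat = torch.randn(2, 32, 64, 64)  # stand-in for generator features
    print(gram_loss(generated_feat, h).item())
```

In the paper's pipeline the Gram constraint would tie features of the generated frames to those of the style video; here random tensors stand in for both branches, so the snippet only demonstrates the mechanics of the two components.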




Published In

CGI 2018: Proceedings of Computer Graphics International 2018
June 2018
284 pages
ISBN: 9781450364010
DOI: 10.1145/3208159
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Convolutional networks
  2. Deep learning
  3. Motion style transfer
  4. Spatio-temporal feature

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CGI 2018: Computer Graphics International 2018
June 11-14, 2018
Bintan Island, Indonesia

Acceptance Rates

CGI 2018 paper acceptance rate: 35 of 159 submissions (22%); overall acceptance rate: 35 of 159 submissions (22%).

