DOI: 10.1145/3208159.3208161

Pixel-Level Character Motion Style Transfer using Conditional Adversarial Networks

Published: 11 June 2018

Abstract

In this paper, we describe a novel method, based on a conditional GAN and a Gram loss, for synthesizing realistic human movement in videos according to different body motion inputs. We present a character motion style transfer model with a two-branch network to characterize natural video sequences. The first branch is built on convolutional LSTMs to capture spatio-temporal representations of the style video, and the second branch uses convolutional networks to extract spatial features from the content frame. The entire network follows an encoder-decoder architecture that learns representations of both spatial content and temporal correlations in videos, and can transfer a motion style to that of a given style video. The main benefits of our approach lie in jointly modeling the spatio-temporal correlations of motion video and imposing a Gram constraint to achieve real-world character motion style transfer. Experiments demonstrate the effectiveness of the proposed motion style transfer approach on real-world video, and the generated motions with pixel-level style transfer are of high visual quality.
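
The abstract names three concrete ingredients: a Gram-matrix style loss, a ConvLSTM style branch, and a conditional GAN objective. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of the first two ingredients, with all shapes, channel widths, and the toy usage chosen purely for illustration (the adversarial term is omitted for brevity).

```python
# Minimal sketch, assuming a PyTorch setup: a Gram-matrix style loss and a
# single ConvLSTM cell for the style branch. Not the paper's released code;
# all sizes below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat):
    """Gram matrix of a feature map: channel-wise correlations, normalized."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def gram_loss(feat_generated, feat_style):
    """Style (Gram) loss between generated and style feature maps."""
    return F.mse_loss(gram_matrix(feat_generated), gram_matrix(feat_style))

class ConvLSTMCell(nn.Module):
    """One ConvLSTM cell (Shi et al. 2015): LSTM gates computed by convolution."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # A single conv produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

# Toy usage: unroll a style clip through the ConvLSTM branch, then compare
# Gram statistics of stand-in generator features against the style state.
if __name__ == "__main__":
    clip = torch.randn(2, 8, 3, 64, 64)          # (batch, time, C, H, W) style video
    cell = ConvLSTMCell(in_ch=3, hid_ch=32)
    h = torch.zeros(2, 32, 64, 64)
    c = torch.zeros_like(h)
    for t in range(clip.size(1)):                # step over the clip in time
        h, c = cell(clip[:, t], (h, c))
    generated_feat = torch.randn(2, 32, 64, 64)  # stand-in for generator features
    print(gram_loss(generated_feat, h).item())
```

In the paper's pipeline the Gram constraint would tie features of the generated frames to those of the style video; here random tensors stand in for both branches, so the snippet only demonstrates the mechanics of the two components.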




Published In

CGI 2018: Proceedings of Computer Graphics International 2018
June 2018
284 pages
ISBN: 9781450364010
DOI: 10.1145/3208159
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Convolutional networks
  2. Deep learning
  3. Motion style transfer
  4. Spatio-temporal feature

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CGI 2018: Computer Graphics International 2018
June 11-14, 2018
Bintan Island, Indonesia

Acceptance Rates

CGI 2018 paper acceptance rate: 35 of 159 submissions (22%); overall acceptance rate: 35 of 159 submissions (22%).

