Abstract
Humans possess a remarkable capacity for imagination: within the mind, virtual simulations of visual, auditory, and other sensory scenarios are carried out. These imaginations are grounded in experiences gathered through interaction with the real world, where the human senses help the mind understand its surroundings. No current algorithm achieves this level of imagination, but a recent family of deep learning architectures known as Generative Adversarial Networks (GANs) has proven capable of generating novel and interesting images or videos based on training data. In this sense, GANs can be used to mimic human imagination, since the visuals they generate are grounded in the data seen during training. In this paper, we use a combination of Long Short-Term Memory (LSTM) networks and 3D GANs to generate videos. A 3D convolutional GAN generates new human action videos from trained data. These generated action videos are then assembled into longer videos in which a sequence of short actions combines into a longer, more complex activity. To obtain the required sequence of actions, an LSTM network translates a simple input text description into a sequence of action labels. The generated chunks are then concatenated using a motion interpolation scheme to form a single video consisting of many generated actions. The result is a visualization of the input text description as a video of a subject performing the described activity.
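The pipeline described above can be sketched end to end. The following is a minimal illustrative stand-in, not the authors' implementation: the function names, the toy action vocabulary, and the use of a random tensor in place of the trained 3D GAN generator are all assumptions made for clarity. A text description is mapped to an ordered action sequence (the role of the LSTM), each action yields a short video chunk (the role of the 3D convolutional GAN), and adjacent chunks are joined by a simple linear cross-fade standing in for the motion interpolation scheme.

```python
import numpy as np

# Toy action vocabulary; the real system learns this mapping with an LSTM.
ACTIONS = {"walk": 0, "wave": 1, "jump": 2}

def text_to_actions(description):
    """Stand-in for the LSTM translator: keep known action words in order."""
    return [w for w in description.lower().split() if w in ACTIONS]

def generate_chunk(action, frames=16, h=64, w=64, seed=0):
    """Stand-in for the 3D convolutional GAN generator: here, a
    deterministic random video tensor conditioned on the action label."""
    rng = np.random.default_rng(seed + ACTIONS[action])
    return rng.random((frames, h, w, 3))

def interpolate(prev_chunk, next_chunk, n=4):
    """Linear cross-fade between the last frame of one chunk and the
    first frame of the next, as a simple motion-interpolation stand-in."""
    alphas = np.linspace(0.0, 1.0, n + 2)[1:-1]
    return np.stack([(1 - t) * prev_chunk[-1] + t * next_chunk[0]
                     for t in alphas])

def render(description):
    """Full pipeline: text -> action sequence -> chunks -> joined video."""
    chunks = [generate_chunk(a) for a in text_to_actions(description)]
    video = [chunks[0]]
    for prev, nxt in zip(chunks, chunks[1:]):
        video.append(interpolate(prev, nxt))
        video.append(nxt)
    return np.concatenate(video)

video = render("the subject will walk then wave and jump")
print(video.shape)  # 3 chunks of 16 frames + 2 transitions of 4 frames
```

Here the final video has 3 × 16 action frames plus 2 × 4 interpolated transition frames, i.e. shape (56, 64, 64, 3); in the paper the chunks come from a trained generator rather than a random tensor.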
Acknowledgements
This work is supported by the Center of Excellence for Higher Education Research Grant funded by the Indonesian Ministry of Research and Higher Education, Contract No. 2626/UN2.R3.1/HKP05.00/2017. This paper is also supported by a GPU grant from NVIDIA.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Arinaldi, A., Fanany, M.I. (2017). Generating Single Subject Activity Videos as a Sequence of Actions Using 3D Convolutional Generative Adversarial Networks. In: Everitt, T., Goertzel, B., Potapov, A. (eds) Artificial General Intelligence. AGI 2017. Lecture Notes in Computer Science(), vol 10414. Springer, Cham. https://doi.org/10.1007/978-3-319-63703-7_13
Print ISBN: 978-3-319-63702-0
Online ISBN: 978-3-319-63703-7
eBook Packages: Computer Science (R0)