Abstract
This research focuses on an algorithm for generating successive future images of a walking pedestrian from successive past images. The proposed algorithm is based on generative adversarial networks (GANs), whose generator and discriminator networks are defined by convolutional neural networks (CNNs). The algorithm takes successive images as input and generates their future images as output. It is compared with other algorithms, namely optical flow and long short-term memory (LSTM), for different numbers of input and output images. The results show that the proposed algorithm is more accurate than LSTM in all cases, and more accurate than optical flow when the numbers of input and output images are large.
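The accuracy comparison above requires scoring each generated future frame against its ground-truth frame; a standard metric for this in image and video prediction is frame-wise peak signal-to-noise ratio (PSNR). Below is a minimal sketch of such scoring, assuming 8-bit frames stored as NumPy arrays; the function names `psnr` and `sequence_psnr` are illustrative, not taken from the paper.

```python
import numpy as np

def psnr(pred: np.ndarray, target: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between a predicted and a true frame."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames: PSNR is unbounded
    return 10.0 * np.log10(max_val ** 2 / mse)

def sequence_psnr(preds, targets, max_val: float = 255.0):
    """Score a sequence of predicted future frames frame by frame."""
    return [psnr(p, t, max_val) for p, t in zip(preds, targets)]
```

Averaging `sequence_psnr` over a test set gives a single number per method, which is how algorithms such as the proposed GAN, optical flow, and LSTM can be ranked for a given input/output configuration.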
He, B., Kita, E. Successive Future Image Generation of a Walking Pedestrian Using Generative Adversarial Networks. Rev Socionetwork Strat 15, 309–325 (2021). https://doi.org/10.1007/s12626-021-00085-6