
SFSN: smart frame selection network for multi-task human synthesis on mobile devices


Abstract

Most human synthesis schemes rely on high-performance servers, which leads to an unsatisfactory interactive experience on mobile devices. Viewing human synthesis results directly on smartphones improves interactivity and enhances the user experience. This paper proposes a smart frame selection network (SFSN) for mobile devices that reduces the traffic between smartphones and the cloud. We leverage an attention and relationship model that focuses on the relationship between a single frame and the entire video, selecting important frames more effectively and thus reducing traffic and computation. In addition, we build a multi-task human synthesis system based on SFSN that handles generation tasks such as background changing, pose transfer, and virtual try-on in a unified framework. Evaluation results indicate that the proposed approach reduces the number of frames to be processed by more than 42.2%.
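The abstract does not give the SFSN architecture, so the following is a minimal PyTorch sketch of the idea it describes: score each frame by relating its feature vector to a video-level representation, then keep only the top-k frames for upload. The feature dimension, the two-layer scoring head, and the mean-pooled video feature are illustrative assumptions, not the published SFSN design.

```python
import torch
import torch.nn as nn

class FrameSelector(nn.Module):
    """Minimal sketch of attention-based frame selection.

    Scores each frame by relating its feature vector to a global
    (video-level) representation, then keeps the top-k frames.
    All layer sizes and the scoring head are illustrative
    assumptions, not the published SFSN architecture.
    """

    def __init__(self, feat_dim: int = 1280, hidden: int = 256):
        super().__init__()
        # Relation head: compares a frame feature with the video-level
        # mean feature and produces a scalar importance score.
        self.score = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, frame_feats: torch.Tensor, k: int):
        # frame_feats: (T, feat_dim) features from a lightweight
        # mobile backbone (e.g. MobileNetV2), one row per frame.
        video_feat = frame_feats.mean(dim=0, keepdim=True)        # (1, D)
        paired = torch.cat(
            [frame_feats, video_feat.expand_as(frame_feats)], dim=1
        )                                                         # (T, 2D)
        scores = self.score(paired).squeeze(-1)                   # (T,)
        weights = torch.softmax(scores, dim=0)                    # attention over frames
        topk = torch.topk(weights, k=min(k, frame_feats.size(0)))
        return topk.indices, weights

# Usage: keep the 10 most informative frames of a 64-frame clip.
selector = FrameSelector()
feats = torch.randn(64, 1280)   # stand-in for backbone features
keep, attn = selector(feats, k=10)
```

As a point of reference, keeping roughly the top 58% of frames would correspond to the reported reduction of more than 42.2% in the number of frames processed.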




Acknowledgements

This work was partially supported by the National Science Fund for Distinguished Young Scholars (62025205) and the Natural Science Basic Research Program of Shaanxi (Program No. 2022JQ-623).

Author information


Corresponding author

Correspondence to Chen Qiu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, B., Feng, X., Qiu, C. et al. SFSN: smart frame selection network for multi-task human synthesis on mobile devices. Wireless Netw 30, 4655–4668 (2024). https://doi.org/10.1007/s11276-022-03112-8
