Abstract
The deformability and high degrees of freedom of soft-bodied animals pose great challenges for the mathematical modeling and synthesis of their motions. Constrained by rigid-skeleton assumptions or by limited model capacity, traditional analytical and statistical models have difficulty generating realistic, multi-pattern soft-body motions. In this work, we present a large-scale dynamic pose dataset of Drosophila larvae and propose a motion synthesis model, Path2Pose, which generates a pose sequence given a set of initial poses and a subsequent guiding path. Path2Pose is further used to synthesize long pose sequences covering various motion patterns through recursive generation. Evaluation results demonstrate that our model synthesizes highly realistic soft-body motions and achieves state-of-the-art performance. Our work demonstrates both the strong performance of deep neural networks on soft-body motion synthesis and the feasibility of synthesizing long pose sequences from a customized body shape and guiding path.
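The recursive generation scheme mentioned above lends itself to a compact illustration. The following is a minimal Python sketch of the idea, assuming a hypothetical trained-model interface (the function name, argument names, and tensor shapes are ours for illustration, not the repository's actual API): each step conditions on the most recently generated frames and the next stretch of the guiding path, so arbitrarily long sequences can be joined from short generated segments.

```python
import numpy as np

def synthesize_long_sequence(model, init_poses, guiding_path, seg_len):
    """Recursive long-sequence synthesis (hypothetical Path2Pose interface).

    model        -- generator mapping (seed_poses, path_segment) -> new pose frames
    init_poses   -- (T0, J, 2) array: T0 seed frames of J two-dimensional landmarks
    guiding_path -- (N, 2) array: the full 2D guiding path
    seg_len      -- number of path points consumed per generation step
    """
    poses = list(init_poses)                # growing list of (J, 2) pose frames
    for start in range(0, len(guiding_path), seg_len):
        segment = guiding_path[start:start + seg_len]
        # Re-seed each step with the most recently generated frames so that
        # consecutive segments join smoothly into one long sequence.
        seed = np.stack(poses[-len(init_poses):])
        poses.extend(model(seed, segment))  # model returns (T, J, 2) frames
    return np.stack(poses)
```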
Data availability
The source code and data used in this study are openly available on GitHub at https://github.com/chenjj0702/Path2Pose.git.
Author information
Contributions
Nenggan ZHENG and Junjun CHEN conceived the idea and designed the research. Zhefeng GONG and Yixuan SUN conducted the experiments and recorded the videos of Drosophila larval motions. Yifei YU and Zi’ao LIU preprocessed the video data. Junjun CHEN drafted the paper. Zhefeng GONG and Nenggan ZHENG helped organize the paper. Junjun CHEN and Yijun WANG revised and finalized the paper.
Ethics declarations
Nenggan ZHENG is a corresponding expert of Frontiers of Information Technology & Electronic Engineering, and he was not involved with the peer review process of this paper. Junjun CHEN, Yijun WANG, Yixuan SUN, Yifei YU, Zi’ao LIU, Zhefeng GONG, and Nenggan ZHENG declare that they have no conflict of interest.
Additional information
Project supported by the Zhejiang Lab, China (No. 2020KB0AC02), the Zhejiang Provincial Key R&D Program, China (Nos. 2022C01022, 2022C01119, and 2021C03003), the National Natural Science Foundation of China (Nos. T2293723 and 61972347), the Zhejiang Provincial Natural Science Foundation, China (No. LR19F020005), and the Fundamental Research Funds for the Central Universities, China (No. 226-2022-00051)
List of supplementary materials
Fig. S1 Estimated pose sequence in the DLPose dataset depicting Drosophila larval turning motion
Fig. S2 Pose sequences depicting Drosophila larval head sweeps: (a) real pose sequence from the DLPose dataset; (b) synthesized pose sequence with the same initial poses and guiding path. The guiding and synthesized movement paths are represented by the blue and red lines, respectively
Fig. S3 Cumulative variance of the principal components (PCs) for eigenwaves (a) and eigenbodies (b)
Fig. S4 Morphological analysis for eigenwaves and eigenbodies: (a) typical pose frames (top panel) and top four eigenwaves (bottom panel) of a pose sequence (peristaltic wave position is labeled by the red arrow); (b) typical pose frames (top panel) and top four eigenbodies (bottom panel) of a pose sequence
Fig. S5 Pose sequence synthesized by RNN (a), MANN (b), and Path2Pose (c) models
Fig. S6 Synthesized long pose sequence joined with four segments depicting Drosophila larval forward locomotion
Fig. S7 Synthesized long pose sequence joined with four segments depicting Drosophila larval head sweeps and turning
About this article
Cite this article
Chen, J., Wang, Y., Sun, Y. et al. Path guided motion synthesis for Drosophila larvae. Front Inform Technol Electron Eng 24, 1482–1496 (2023). https://doi.org/10.1631/FITEE.2200529