Pipa Performance Generation Based on Pre-trained Temporally Guided Network

ABSTRACT
In this paper, we propose a method that takes the audio of a solo pipa performance as input and leverages models pre-trained on other instruments to generate the pipa player's 3D skeleton motion, addressing the limited availability of pipa training data. The key insight is that all instruments exhibit similar motion trends in the large limbs during performance. We can therefore pre-train on videos of other instruments being played to capture these gross-limb motion trends, and then fine-tune the model on pipa solo videos, mitigating the scarcity of pipa-specific data. We combine an LSTM network, a U-Net architecture, and a self-attention mechanism, and validate the effectiveness of cross-instrument pre-training on pipa solo videos collected from the Internet.
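The two-stage scheme described above can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the feature dimensions, joint count, and the simple linear regressor standing in for the LSTM + U-Net + self-attention network are all assumptions, and the training data is synthetic. It only demonstrates the structure of pre-training on abundant "other instrument" data and then fine-tuning the same weights on a small pipa set.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AUDIO_FEATS = 28   # per-frame audio features (e.g. MFCC-like; assumed)
N_JOINTS = 15        # 3D skeleton joints (assumed)
OUT_DIM = N_JOINTS * 3

def init_model():
    """Toy linear audio-to-pose regressor standing in for the full network."""
    return {"W": rng.normal(0, 0.1, (N_AUDIO_FEATS, OUT_DIM)),
            "b": np.zeros(OUT_DIM)}

def train(model, audio, poses, lr=1e-2, steps=200):
    """Least-squares gradient descent; pre-training and fine-tuning
    both reuse this routine, just on different datasets."""
    for _ in range(steps):
        pred = audio @ model["W"] + model["b"]
        err = pred - poses
        model["W"] -= lr * audio.T @ err / len(audio)
        model["b"] -= lr * err.mean(axis=0)
    return model

# Stage 1: pre-train on abundant data from other instruments,
# capturing shared gross-limb motion trends.
other_audio = rng.normal(size=(512, N_AUDIO_FEATS))
other_poses = other_audio @ rng.normal(0, 0.2, (N_AUDIO_FEATS, OUT_DIM))
model = train(init_model(), other_audio, other_poses)

# Stage 2: fine-tune on a small pipa solo set
# (fewer samples, smaller learning rate, fewer steps).
pipa_audio = rng.normal(size=(32, N_AUDIO_FEATS))
pipa_poses = pipa_audio @ rng.normal(0, 0.2, (N_AUDIO_FEATS, OUT_DIM))
model = train(model, pipa_audio, pipa_poses, lr=1e-3, steps=50)

# Per-frame 3D skeleton coordinates predicted from pipa audio.
pred = pipa_audio @ model["W"] + model["b"]
print(pred.shape)  # → (32, 45)
```

In the actual system the regressor would be replaced by the temporally guided network (LSTM + U-Net + self-attention), but the pre-train/fine-tune control flow is the same.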