ABSTRACT
In recent years, style transfer has been widely studied in the field of AI. It extracts a style from one data set and applies it to another. The technique is studied most actively in computer vision, and it is also being studied in music and speech processing with many deep learning methods. However, no research has focused on changes in the performance expression of music. Performance expression of music refers to the player's artistic interpretation of a piece of music, the changes in performance, and the techniques used to achieve these changes. Changes in performance are represented by variations in tempo (slower and faster beats) and in dynamics (louder and softer playing). In this study, as part of a broader study on the analysis and creation of performance expression models of music, the image style transfer method is applied to musical expression. In addition, in order to analyze the relationship between musical structure and musical expression, the implication-realization (I-R) model from music theory is used to extract the structure of music and thereby improve the performance of style transfer. The analysis results show that the I-R model captures the features of musical expression well and that the style transfer method is useful for analyzing the performance expression of music. This work will lead to the study of style extraction for music performance expression and will contribute to the analysis and creation of performance expression models of music.
Index Terms
- Style Transfer of Musical Performance Expression Using Note Classification Based on the Implication-Realization Model