DOI: 10.1145/3025453.3025505

MuEns: A Multimodal Human-Machine Music Ensemble for Live Concert Performance

Published: 02 May 2017

Abstract

Musical ensemble between human musicians and computers is a challenging task. We achieve concert-quality synchronization using machine learning. Our system recognizes the current position in a given song from a human performance using microphone and camera inputs, and responds in real time with audio and visual feedback as an ensemble partner. We address three crucial requirements of a musical ensemble system. First, the system interacts with human players through both audio and visual cues, the conventional modes of coordination among musicians. Second, it synchronizes with human performances while retaining its intended musical expression. Third, it prevents failures during a concert due to bad tracking by displaying an internal confidence measure and allowing a backstage human operator to "intervene" when the system is unconfident. We show the feasibility of the system through several experiments, including a professional concert.
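The confidence-gated tracking idea in the abstract can be sketched as a forward filter over score positions: the system maintains a belief over where the performer is, reports the peak posterior as a confidence value, and flags low-confidence frames for the backstage operator. This is an illustrative assumption, not the paper's actual model; the function name, the HMM-style formulation, and the threshold are hypothetical.

```python
import numpy as np

def follow_score(obs_likelihoods, transition, prior, conf_threshold=0.5):
    """Hypothetical sketch of confidence-gated score following.

    obs_likelihoods: (T, S) per-frame likelihood of each score position
    transition: (S, S) row-stochastic position transition matrix
    prior: (S,) initial belief over positions
    Returns per-frame MAP positions, confidences, and operator alerts.
    """
    belief = prior / prior.sum()
    positions, confidences, alerts = [], [], []
    for like in obs_likelihoods:
        # Predict (propagate belief forward) then update with the observation.
        belief = (transition.T @ belief) * like
        belief /= belief.sum()
        conf = float(belief.max())          # peak posterior as confidence
        positions.append(int(belief.argmax()))
        confidences.append(conf)
        alerts.append(conf < conf_threshold)  # flag for backstage operator
    return positions, confidences, alerts
```

With a left-to-right transition model and observations that move cleanly through the song, the tracker advances position by position and never raises an alert; a noisy or ambiguous observation flattens the posterior and trips the operator flag instead.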

Supplementary Material

  • suppl.mov (pn1297-file3.mp4): supplemental video
  • suppl.mov (pn1297p.mp4): supplemental video





Published In

CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
May 2017
7138 pages
ISBN:9781450346559
DOI:10.1145/3025453
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. human-machine music ensemble
  2. live concert system
  3. machine learning
  4. multimodal interaction
  5. score following

Qualifiers

  • Research-article

Conference

CHI '17

Acceptance Rates

CHI '17 Paper Acceptance Rate: 600 of 2,400 submissions, 25%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%


Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 5
Reflects downloads up to 13 Feb 2025


Cited By

  • (2024) Enhanced Detection of Musical Performance Timings Using MediaPipe and Multilayer Perceptron Classifier. AI, Computer Science and Robotics Technology. 10.5772/acrt.202400023. Online publication date: 30-Aug-2024.
  • (2023) Role of Multimodal Learning Systems in Technology-Enhanced Learning (TEL): A Scoping Review. Responsive and Sustainable Educational Futures. 10.1007/978-3-031-42682-7_12, 164-182. Online publication date: 28-Aug-2023.
  • (2021) The Cyborg Philharmonic: Synchronizing Interactive Musical Performances between Humans and Machines. Humanities and Social Sciences Communications. 10.1057/s41599-021-00751-8, 8:1. Online publication date: 17-Mar-2021.
  • (2020) Vivaldi's "The Four Seasons" Live Animation Concert Using a Video-Music Synchronization System. The Journal of The Institute of Image Information and Television Engineers. 10.3169/itej.74.620, 74:4, 620-621. Online publication date: 2020.
  • (2019) Music Interfaces Based on Automatic Music Signal Analysis: New Ways to Create and Listen to Music. IEEE Signal Processing Magazine. 10.1109/MSP.2018.2874360, 36:1, 74-81. Online publication date: Jan-2019.
