DOI: 10.1145/3025453.3025505

MuEns: A Multimodal Human-Machine Music Ensemble for Live Concert Performance

Published: 02 May 2017

Abstract

Musical ensemble between human musicians and computers is a challenging task. We achieve concert-quality synchronization using machine learning. Our system recognizes the current position in a given song from a human performance using microphone and camera inputs, and responds in real time with audio and visual feedback as an ensemble partner. We address three crucial requirements of a musical ensemble system. First, the system interacts with human players through both audio and visual cues, the conventional modes of coordination among musicians. Second, it synchronizes with human performances while retaining its intended musical expression. Third, it prevents failures during a concert due to bad tracking by displaying an internal confidence measure and allowing a backstage human operator to "intervene" when the system is unconfident. We show the feasibility of the system through several experiments, including a professional concert.
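The confidence-gated tracking idea in the abstract can be sketched as a forward filter over score positions: the system maintains a belief over where the performer is, reports the peak posterior as a confidence value, and flags low-confidence frames for the backstage operator. This is an illustrative assumption, not the paper's actual model; the function name, the HMM-style formulation, and the threshold are hypothetical.

```python
import numpy as np

def follow_score(obs_likelihoods, transition, prior, conf_threshold=0.5):
    """Hypothetical sketch of confidence-gated score following.

    obs_likelihoods: (T, S) per-frame likelihood of each score position
    transition: (S, S) row-stochastic position transition matrix
    prior: (S,) initial belief over positions
    Returns per-frame MAP positions, confidences, and operator alerts.
    """
    belief = prior / prior.sum()
    positions, confidences, alerts = [], [], []
    for like in obs_likelihoods:
        # Predict (propagate belief forward) then update with the observation.
        belief = (transition.T @ belief) * like
        belief /= belief.sum()
        conf = float(belief.max())          # peak posterior as confidence
        positions.append(int(belief.argmax()))
        confidences.append(conf)
        alerts.append(conf < conf_threshold)  # flag for backstage operator
    return positions, confidences, alerts
```

With a left-to-right transition model and observations that move cleanly through the song, the tracker advances position by position and never raises an alert; a noisy or ambiguous observation flattens the posterior and trips the operator flag instead.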

Supplementary Material

  • suppl.mov (pn1297-file3.mp4): supplemental video
  • suppl.mov (pn1297p.mp4): supplemental video





Published In

CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
May 2017
7138 pages
ISBN:9781450346559
DOI:10.1145/3025453
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. human-machine music ensemble
  2. live concert system
  3. machine learning
  4. multimodal interaction
  5. score following

Qualifiers

  • Research-article

Conference

CHI '17

Acceptance Rates

CHI '17 Paper Acceptance Rate: 600 of 2,400 submissions, 25%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%


Article Metrics

  • Downloads (last 12 months): 40
  • Downloads (last 6 weeks): 5
Reflects downloads up to 13 Feb 2025


Cited By

  • (2024) Enhanced Detection of Musical Performance Timings Using MediaPipe and Multilayer Perceptron Classifier. AI, Computer Science and Robotics Technology. 10.5772/acrt.202400023. Online publication date: 30-Aug-2024.
  • (2023) Role of Multimodal Learning Systems in Technology-Enhanced Learning (TEL): A Scoping Review. Responsive and Sustainable Educational Futures. 10.1007/978-3-031-42682-7_12, 164-182. Online publication date: 28-Aug-2023.
  • (2021) The Cyborg Philharmonic: Synchronizing Interactive Musical Performances between Humans and Machines. Humanities and Social Sciences Communications. 10.1057/s41599-021-00751-8, 8:1. Online publication date: 17-Mar-2021.
  • (2020) Vivaldi's "The Four Seasons" Live Animation Concert Using a Video-Music Synchronization System. The Journal of The Institute of Image Information and Television Engineers. 10.3169/itej.74.620, 74:4, 620-621. Online publication date: 2020.
  • (2019) Music Interfaces Based on Automatic Music Signal Analysis: New Ways to Create and Listen to Music. IEEE Signal Processing Magazine. 10.1109/MSP.2018.2874360, 36:1, 74-81. Online publication date: Jan-2019.
