ABSTRACT
It is widely accepted that auditory and visual information can be quite similar in how they express emotion and meaning. To explore this relationship with machine learning, this paper proposes a feasible system for generating drum beats from images. Specifically, the model converts the input image to an embedding vector, computes a corresponding music embedding representing a 4-bar drum set performance for that image embedding, and converts it to a playable MIDI file. The model is trained by categorising the source datasets into a shared set of genres and training on different pairings of images and drum beats within each genre. This paper also includes an evaluation of the system's performance under different configurations.
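The pipeline described above (image → image embedding → music embedding → drum pattern) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the encoder, the mapping, and the decoder are stand-in random linear maps, and the embedding size, grid resolution, and instrument count are assumed values chosen for the example.

```python
import numpy as np

EMB_DIM = 64        # shared embedding size (assumed)
STEPS_PER_BAR = 16  # 16th-note grid (assumed)
N_BARS = 4          # the paper generates 4-bar performances
N_DRUMS = 9         # e.g. kick, snare, hats, toms, cymbals (assumed)

rng = np.random.default_rng(0)

def encode_image(image: np.ndarray) -> np.ndarray:
    """Stand-in for a pretrained CNN encoder: mean-pool the image
    over its spatial axes and project to a fixed-size embedding."""
    pooled = image.mean(axis=(0, 1))            # (channels,)
    W = rng.standard_normal((pooled.size, EMB_DIM))
    return pooled @ W                           # (EMB_DIM,)

def image_to_music_embedding(img_emb: np.ndarray) -> np.ndarray:
    """Stand-in for the learned image-to-music mapping;
    here a fixed linear map with a tanh squashing."""
    M = rng.standard_normal((EMB_DIM, EMB_DIM))
    return np.tanh(img_emb @ M)

def decode_drum_grid(music_emb: np.ndarray) -> np.ndarray:
    """Decode the music embedding into a binary 4-bar drum grid
    (time steps x instruments) by thresholding decoder logits."""
    steps = N_BARS * STEPS_PER_BAR
    D = rng.standard_normal((EMB_DIM, steps * N_DRUMS))
    logits = music_emb @ D
    return (logits.reshape(steps, N_DRUMS) > 0).astype(np.int8)

image = rng.random((224, 224, 3))               # dummy RGB input
grid = decode_drum_grid(image_to_music_embedding(encode_image(image)))
print(grid.shape)                               # (64, 9): 4 bars of 16 steps
```

Rendering such a grid as a playable MIDI file, as the system does, could be done with a MIDI library such as pretty_midi by emitting one percussion note per active cell on MIDI channel 10.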
Groovy Pixels: Generating Drum Set Rhythms from Images