DOI: 10.1145/3503047.3503119

Groovy Pixels: Generating Drum Set Rhythms from Images

Published: 19 January 2022

ABSTRACT

It is widely accepted that auditory and visual information can express similar emotions and ideas. To explore this relationship with machine learning, this paper proposes a practical system for generating drum beats from images. Specifically, the model converts the input image into an embedding vector, computes from it a corresponding music embedding of a 4-bar drum set performance, and decodes that embedding into a playable MIDI file. The model is trained by categorising the source datasets into a shared set of genres and training on different pairings of images and drum beats within each genre. The paper also evaluates the system's performance under different configurations.
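To make the described pipeline concrete, the following is a minimal, hypothetical sketch in Python. Everything in it is an assumption rather than the paper's actual implementation: the image encoder (a pretrained InceptionV3 from torchvision with its classifier head removed), the single linear layer standing in for the learned image-to-music mapping, the filename photo.jpg, and the toy thresholding that turns the latent into a one-bar kick/snare grid in place of a trained drum decoder. Only the final step, writing the pattern to a playable MIDI file with pretty_midi, mirrors the output format stated in the abstract.

# Hypothetical sketch of an image-to-drum-beat pipeline; the encoder,
# the image->music mapping, and the decoding step are all stand-ins.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image
import pretty_midi

# 1. Image -> embedding vector (pretrained CNN assumed here).
encoder = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()   # expose the 2048-d pooled features
encoder.eval()

preprocess = T.Compose([
    T.Resize(342), T.CenterCrop(299), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# 2. Image embedding -> music latent. A single untrained linear layer
#    stands in for whatever per-genre mapping the paper learns.
to_music = torch.nn.Linear(2048, 256)   # hypothetical mapping

image = Image.open("photo.jpg").convert("RGB")
with torch.no_grad():
    img_emb = encoder(preprocess(image).unsqueeze(0))   # shape (1, 2048)
    z = to_music(img_emb)                               # shape (1, 256)

# 3. Music latent -> drum pattern. A trained sequence decoder would emit
#    the full 4-bar performance; here we fake a binary kick/snare grid of
#    16th-note steps for one bar so the sketch stays runnable end to end.
grid = torch.sigmoid(z[:, :32]).reshape(2, 16) > 0.5

# 4. Grid -> playable MIDI via pretty_midi (General MIDI drum channel).
pm = pretty_midi.PrettyMIDI(initial_tempo=120)
drums = pretty_midi.Instrument(program=0, is_drum=True)
pitches = [36, 38]            # GM percussion: bass drum, acoustic snare
step = 60.0 / 120 / 4         # duration of one 16th note at 120 BPM
for row, pitch in enumerate(pitches):
    for col in range(16):
        if grid[row, col]:
            t = col * step
            drums.notes.append(pretty_midi.Note(100, pitch, t, t + step))
pm.instruments.append(drums)
pm.write("drum_beat.mid")

In a real system, step 3 would be a decoder trained on paired image/drum-beat data, producing velocities and timing for the full 4-bar pattern rather than a thresholded grid.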


Published in

AISS '21: Proceedings of the 3rd International Conference on Advanced Information Science and System
November 2021
526 pages
ISBN: 978-1-4503-8586-2
DOI: 10.1145/3503047

            Copyright © 2021 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed limited

            Acceptance Rates

Overall Acceptance Rate: 41 of 95 submissions, 43%
Article Metrics

• Downloads (last 12 months): 18
• Downloads (last 6 weeks): 2
