ABSTRACT
Mosquitoes are a major global health problem. They transmit diseases and can have a large impact on local economies, so monitoring them helps prevent outbreaks of mosquito-borne diseases. In this paper, we propose a novel data-driven approach that leverages Transformer-based models to identify mosquitoes in audio recordings. The task is to detect the time intervals that correspond to acoustic mosquito events in an audio signal. We formulate the problem as a sequence tagging task and train a Transformer-based model on a real-world dataset of mosquito recordings. By leveraging the sequential nature of mosquito recordings, we formulate the training objective so that the input recordings do not require fine-grained annotations. We show that our approach outperforms baseline methods on standard evaluation metrics, albeit with an unexpectedly high false-negative rate. In view of these results, we propose future directions for the design of more effective mosquito detection models.
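To illustrate how a sequence tagging output relates to the time intervals the task asks for, the following minimal sketch merges consecutive positive frame tags into intervals. It is not the authors' implementation: the function name `tags_to_intervals`, the binary tag encoding (1 for mosquito, 0 otherwise), and the fixed frame duration `frame_seconds` are all assumptions made for illustration.

```python
from typing import List, Tuple


def tags_to_intervals(tags: List[int], frame_seconds: float) -> List[Tuple[float, float]]:
    """Merge runs of positive frame tags into (start, end) intervals in seconds.

    Assumes (hypothetically) that each tag covers one fixed-length frame and
    that 1 marks a mosquito frame while any other value marks background.
    """
    intervals: List[Tuple[float, float]] = []
    start = None  # index of the first frame of the current run, if any
    for i, tag in enumerate(tags):
        if tag == 1 and start is None:
            start = i  # a new mosquito event begins
        elif tag != 1 and start is not None:
            # the event ended at the start of this frame
            intervals.append((start * frame_seconds, i * frame_seconds))
            start = None
    if start is not None:
        # the recording ends while an event is still active
        intervals.append((start * frame_seconds, len(tags) * frame_seconds))
    return intervals


if __name__ == "__main__":
    # Six frames of 0.5 s each, with two separate mosquito events.
    print(tags_to_intervals([0, 1, 1, 0, 0, 1], 0.5))
```

Under these assumptions, a missed frame inside an event splits one interval into two, which is one way a high false-negative rate at the frame level can affect interval-level results.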
Index Terms
- How Much Attention Should we Pay to Mosquitoes?