DOI: 10.1145/3477314.3507019
research-article

Automatic disease vector mosquitoes identification via hierarchical data stream classification

Published: 06 May 2022

Abstract

Vector-borne diseases (VBDs), such as Dengue or Malaria, are among the main concerns of public health agencies and governments. These diseases are mainly spread by mosquitoes, which act as vectors by transmitting infected blood between humans. Machine learning can be used to design and improve VBD control strategies by providing models that recognize disease vector mosquitoes and automatically capture or kill harmful species. The automatic identification of disease vector mosquitoes has not yet been addressed as a hierarchical data stream classification problem. As a result, reliable information, such as the mosquitoes' hierarchical taxonomy, has not been used to improve learning models. In this study, we propose a framework for the automatic identification of disease vector mosquitoes in the context of hierarchical data stream classification. To this end, we propose a hierarchical adaptation of a disease vector mosquito dataset that includes the mosquitoes' taxonomy, and we introduce kNC and Dribble, two novel classification methods fitted to hierarchical data streams representing the mosquitoes. Results show that our framework, using summarization techniques, achieves significantly better predictive performance and processing speed than existing state-of-the-art models.
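The two ideas named in the abstract, attaching a taxonomy to flat species labels and summarizing the stream instead of storing every instance, can be illustrated with a small sketch. This is not the paper's kNC or Dribble, whose details are not given here. The taxonomy fragment, the `label_path` helper, the `CentroidSummary` class, and the toy feature vectors are all hypothetical and chosen only for illustration. The summary used is a plain running-mean centroid per class.

```python
import math

# Hypothetical taxonomy fragment (child -> parent); not the paper's dataset.
PARENT = {
    "Aedes aegypti": "Aedes",
    "Aedes albopictus": "Aedes",
    "Culex quinquefasciatus": "Culex",
    "Aedes": "Culicidae",
    "Culex": "Culicidae",
}


def label_path(leaf):
    """Expand a flat species label into its root-to-leaf taxonomy path."""
    path = [leaf]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return tuple(reversed(path))


class CentroidSummary:
    """Keeps one running-mean centroid per leaf class (an assumed summary)."""

    def __init__(self):
        self.count = {}
        self.mean = {}

    def learn_one(self, x, leaf):
        # Incremental mean update: no past instances are stored.
        n = self.count.get(leaf, 0) + 1
        mean = self.mean.get(leaf, [0.0] * len(x))
        self.mean[leaf] = [m + (xi - m) / n for m, xi in zip(mean, x)]
        self.count[leaf] = n

    def predict_one(self, x):
        # Return the taxonomy path of the nearest centroid, or None if untrained.
        if not self.mean:
            return None
        leaf = min(self.mean, key=lambda c: math.dist(x, self.mean[c]))
        return label_path(leaf)


def prequential_accuracy(stream):
    """Test-then-train evaluation: predict each instance before learning it."""
    model = CentroidSummary()
    correct = total = 0
    for x, leaf in stream:
        pred = model.predict_one(x)
        if pred is not None:
            total += 1
            correct += pred == label_path(leaf)
        model.learn_one(x, leaf)
    return correct / total if total else 0.0
```

Because each class is reduced to a single centroid, memory and prediction cost grow with the number of classes rather than with the length of the stream, which is the general motivation for summarization in stream settings.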


Cited By

  • (2024) Adaptive learning on hierarchical data streams using window-weighted Gaussian probabilities. Applied Soft Computing 152 (111271). DOI: 10.1016/j.asoc.2024.111271. Online publication date: Feb-2024


Published In

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
April 2022
2099 pages
ISBN: 9781450387132
DOI: 10.1145/3477314

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. data stream classification
  2. data summarization
  3. hierarchical classification
  4. vector-borne diseases

Qualifiers

  • Research-article


Conference

SAC '22

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

