Augmented Query Strategies for Active Learning in Stream Data Mining

  • Conference paper
Neural Information Processing (ICONIP 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8836)

Abstract

Active learning is used when unlabeled data is abundant but manual labeling is costly. Given a limited labeling budget, only a subset of the unlabeled instances can be sent to the oracle for manual labeling. The query strategy, i.e., how the instances sent to the oracle are selected, therefore plays an important role in active learning. Although active learning is a well-established research area, only a few works have studied it in the context of stream data mining. Active learning for stream data is more challenging than for static data because queries cannot be repeated: revisiting the data is almost impossible. In this paper, we propose two augmented query strategies for active learning in stream data mining, namely Margin Sampling with Variable Uncertainty (MSVU) and Entropy Sampling with Uncertainty using Randomization (ESUR). These two strategies are derived from, and improve on, the existing Variable Uncertainty (VU) and Uncertainty using Randomization (UR) methods, respectively. We evaluate the effectiveness of the proposed MSVU and ESUR strategies by comparing them against the original VU and UR on six datasets using two base classifiers: Leveraging Bagging (LB) and Single Classifier Drift (SCD). Experimental results show that the proposed strategies offer promising outcomes on various datasets and in detecting concept drift in the data.
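To make the terms in the abstract concrete, the following is a minimal sketch of the two standard uncertainty measures the strategy names refer to (margin and entropy), combined with a budget-limited, adaptive-threshold query rule. It is not the authors' implementation: the class name `VariableThresholdQuery`, its parameters (`budget`, `theta`, `step`), and the threshold update rule are assumptions chosen for illustration, loosely modelled on the general idea of a variable-uncertainty threshold.

```python
import math


def margin(probs):
    """Gap between the two largest class posteriors (small gap = uncertain)."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def entropy(probs):
    """Shannon entropy of the posterior in bits (large value = uncertain)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


class VariableThresholdQuery:
    """Hypothetical stream query rule: ask for a label when the margin falls
    below an adaptive threshold, without exceeding a labeling budget.

    All names and the update rule are illustrative assumptions, not the
    paper's MSVU algorithm.
    """

    def __init__(self, budget=0.1, theta=1.0, step=0.01):
        self.budget = budget  # maximum fraction of instances to label
        self.theta = theta    # current margin threshold
        self.step = step      # relative threshold adjustment per instance
        self.seen = 0
        self.queried = 0

    def should_query(self, probs):
        self.seen += 1
        # Budget check: fraction of instances labeled so far.
        if self.queried / self.seen >= self.budget:
            return False
        if margin(probs) < self.theta:
            # Uncertain instance: query it and tighten the threshold.
            self.queried += 1
            self.theta *= 1 - self.step
            return True
        # Confident instance: relax the threshold so that queries
        # do not stop entirely when the classifier becomes confident.
        self.theta *= 1 + self.step
        return False
```

Each instance is examined exactly once as it arrives, which reflects the stream constraint described above: the decision to query must be made immediately because the instance cannot be revisited. An entropy-based variant would compare `entropy(probs)` against a threshold in the opposite direction, since higher entropy means higher uncertainty.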

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Faisal, M.A., Aung, Z., Woon, W.L., Svetinovic, D. (2014). Augmented Query Strategies for Active Learning in Stream Data Mining. In: Loo, C.K., Yap, K.S., Wong, K.W., Beng Jin, A.T., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8836. Springer, Cham. https://doi.org/10.1007/978-3-319-12643-2_53

  • DOI: https://doi.org/10.1007/978-3-319-12643-2_53

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12642-5

  • Online ISBN: 978-3-319-12643-2

  • eBook Packages: Computer Science, Computer Science (R0)
