Active Learning with Automatic Soft Labeling for Induction of Decision Trees

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5549)

Abstract

Decision trees are widely used in data mining applications because their representation is interpretable. However, learning an accurate decision tree often requires a large amount of labeled training data, and labeling data is costly and time-consuming. In this paper, we study learning decision trees at lower labeling cost from two perspectives: data quality and data quantity. At each step of the active learning process, we learn a random forest and then use it to label a large quantity of unlabeled data. To counter the large tree size caused by machine labeling, we generate weighted (soft) labeled data using the prediction confidence of the labeling classifier. Empirical studies show that our method can significantly reduce the labeling cost of active learning for decision trees, and that this improvement does not come at the expense of tree size.
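The loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a committee of one-split trees trained on bootstrap samples stands in for the random forest, a weighted one-split tree stands in for the final decision tree, labels are assumed to be binary (0/1), and the query rule (ask the oracle about the least confident pool instance) is an assumption, since the abstract does not state one. All function names (`fit_stump`, `fit_committee`, `soft_label`, `active_soft_labeling`) are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, w=None):
    """Weighted one-split tree for labels 0/1 (stand-in for a full tree learner)."""
    w = [1.0] * len(X) if w is None else w
    best = None
    for f in range(len(X[0])):
        vals = sorted({x[f] for x in X})
        # Candidate thresholds: midpoints between distinct values (or the lone value).
        cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
        for t in cuts:
            for left, right in ((0, 1), (1, 0)):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (left if x[f] <= t else right) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, left, right)
    _, f, t, left, right = best
    return lambda x: left if x[f] <= t else right

def fit_committee(X, y, n_members=25, rng=None):
    """Bagged stumps: a simplified stand-in for the random forest labeler."""
    rng = rng or random.Random(0)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        members.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return members

def soft_label(members, x):
    """Return (majority label, vote fraction); the fraction is the soft weight."""
    label, count = Counter(m(x) for m in members).most_common(1)[0]
    return label, count / len(members)

def active_soft_labeling(X_lab, y_lab, pool, oracle, steps=3, seed=0):
    rng = random.Random(seed)
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(pool)
    for _ in range(steps):
        if not pool:
            break
        committee = fit_committee(X_lab, y_lab, rng=rng)
        # Assumed query rule: send the least confident pool instance to the oracle.
        i = min(range(len(pool)), key=lambda j: soft_label(committee, pool[j])[1])
        x = pool.pop(i)
        X_lab.append(x)
        y_lab.append(oracle(x))
    # Machine-label the remaining pool, weighting each label by its confidence.
    committee = fit_committee(X_lab, y_lab, rng=rng)
    machine = [(x, *soft_label(committee, x)) for x in pool]
    data = [(x, y, 1.0) for x, y in zip(X_lab, y_lab)] + machine
    X, y, w = zip(*data)
    return fit_stump(X, y, w), data

# Usage on a toy 1-D problem whose true concept is x > 0.5.
model, data = active_soft_labeling(
    [(0.1,), (0.9,)], [0, 1],
    [(0.2,), (0.4,), (0.6,), (0.8,)],
    oracle=lambda x: int(x[0] > 0.5), steps=1)
for x, label, weight in data:
    print(x, label, round(weight, 2))
```

Human-labeled instances carry weight 1.0, while machine-labeled ones carry the committee's vote fraction, so low-confidence machine labels have less influence on the final tree's splits. With scikit-learn, the same idea would likely use a forest's predicted class probabilities as the `sample_weight` passed to a tree's `fit`; that pairing is again an assumption about tooling, not something the abstract specifies.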


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Su, J., Jelber, S.S., Matwin, S., Huang, J. (2009). Active Learning with Automatic Soft Labeling for Induction of Decision Trees. In: Gao, Y., Japkowicz, N. (eds) Advances in Artificial Intelligence. Canadian AI 2009. Lecture Notes in Computer Science, vol 5549. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01818-3_33

  • DOI: https://doi.org/10.1007/978-3-642-01818-3_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01817-6

  • Online ISBN: 978-3-642-01818-3

  • eBook Packages: Computer Science, Computer Science (R0)
