Enhancing Labeled Data Using Unlabeled Data for Topic Tracking

  • Conference paper
Human Language Technology Challenges for Computer Science and Linguistics (LTC 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Abstract

In this paper, we address the problem of handling the collection of negative training examples used in the topic tracking task, and propose a method for enhancing positive training examples with negative training data, based on supervised machine learning techniques. We present an algorithm that combines positive example based learning (PEBL) and boosting to learn a set of negative data for training classifiers. On a Japanese corpus, our method attained a \(MIN\) of 0.161, compared with 0.332 for PEBL. Similarly, on TDT3, our method improved \(MIN\) by 0.225 compared with PEBL.
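
The abstract names PEBL but does not describe it. As a rough illustration only, the following sketch shows one way a PEBL-style loop can be written: a mapping stage picks "strong negatives" from unlabeled documents that contain no strongly positive word, and a convergence stage repeatedly moves further unlabeled documents into the negative set. This is not the authors' algorithm. The boosting component is omitted, a simple nearest-centroid comparison is swapped in for the SVM used in the original PEBL work, and every function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tokens(doc):
    """Lowercased whitespace tokens (a stand-in for real preprocessing)."""
    return doc.lower().split()


def doc_freq(docs):
    """Fraction of documents in which each word occurs."""
    df = Counter()
    for d in docs:
        df.update(set(tokens(d)))
    return {w: c / len(docs) for w, c in df.items()}


def centroid(docs):
    """Mean term-count vector of a document set."""
    total = Counter()
    for d in docs:
        total.update(tokens(d))
    return {w: c / len(docs) for w, c in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mapping_convergence(positives, unlabeled, max_iter=10):
    """Return (negatives, remaining) extracted from the unlabeled documents."""
    pos_df, unl_df = doc_freq(positives), doc_freq(unlabeled)
    # Assumption: a word is "strongly positive" if it is relatively more
    # frequent in the positive documents than in the unlabeled ones.
    strong = {w for w, f in pos_df.items() if f > unl_df.get(w, 0.0)}

    # Mapping stage: documents without any strong positive word are negatives.
    negatives = [d for d in unlabeled if not strong & set(tokens(d))]
    remaining = [d for d in unlabeled if strong & set(tokens(d))]

    # Convergence stage: grow the negative set until nothing else moves.
    pos_c = centroid(positives)
    for _ in range(max_iter):
        if not negatives or not remaining:
            break
        neg_c = centroid(negatives)
        moved = [
            d for d in remaining
            if cosine(Counter(tokens(d)), neg_c) > cosine(Counter(tokens(d)), pos_c)
        ]
        if not moved:
            break
        negatives.extend(moved)
        remaining = [d for d in remaining if d not in moved]
    return negatives, remaining


# Toy data, invented purely for illustration.
positives = [
    "earthquake hits city rescue teams",
    "earthquake rescue continues in city",
]
unlabeled = [
    "earthquake aftershock rescue teams",
    "stock market falls sharply",
    "market rally lifts stock prices",
    "city council debates market rules",
]

negatives, remaining = mapping_convergence(positives, unlabeled)
print("negatives:", negatives)
print("remaining:", remaining)
```

In this toy run the document about the city council shares one word with the positive set, so the mapping stage leaves it undecided and only the convergence stage moves it to the negatives. A classifier would then be trained on the positive documents against the collected negatives.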

Notes

  1. http://www.itl.nist.gov/iad/mig/tests/tdt/2000/

Author information

Corresponding author

Correspondence to Fumiyo Fukumoto.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Fukumoto, F., Suzuki, Y., Yamamoto, T. (2014). Enhancing Labeled Data Using Unlabeled Data for Topic Tracking. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_17

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science, Computer Science (R0)
