Abstract
In this paper, we address the problem of handling the collection of negative training examples used in the topic tracking task, and propose a method that enhances positive training examples by using negative training data, based on supervised machine learning techniques. We present an algorithm that combines positive example based learning (PEBL) and boosting to learn a set of negative data for training classifiers. On a Japanese corpus, our method attained a \(MIN\) of 0.161, compared with 0.332 for PEBL. Similarly, on TDT3, our method improved on PEBL by 0.225 in \(MIN\).
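The abstract does not say how PEBL and boosting are combined, so the sketch below illustrates only the PEBL half. It follows the mapping-convergence procedure of PEBL (Yu, Han and Chang, 2002). In the mapping stage, words that are relatively more frequent in the positive set than in the unlabeled set are treated as positive features, and unlabeled stories containing none of them become "strong negatives". In the convergence stage, a classifier is retrained repeatedly, and every remaining story it labels negative is added to the negative set. This is a minimal sketch under loud assumptions: a nearest-centroid classifier stands in for PEBL's SVM so that it runs with the standard library alone, the boosting step is not reproduced, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper names; a centroid classifier stands in for PEBL's SVM,
# and the paper's boosting step is NOT reproduced here.


def positive_features(pos_docs, unlabeled_docs):
    """Mapping stage (1-DNF): words relatively more frequent in P than in U."""
    p_df = Counter(w for d in pos_docs for w in set(d))
    u_df = Counter(w for d in unlabeled_docs for w in set(d))
    return {
        w for w in p_df
        if p_df[w] / len(pos_docs) > u_df[w] / len(unlabeled_docs)
    }


def centroid(docs):
    """L2-normalised term-frequency centroid of a list of token lists."""
    c = Counter()
    for d in docs:
        c.update(d)
    norm = math.sqrt(sum(v * v for v in c.values())) or 1.0
    return {w: v / norm for w, v in c.items()}


def score(doc, cent):
    """Dot product between a document's term counts and a centroid."""
    return sum(tf * cent.get(w, 0.0) for w, tf in Counter(doc).items())


def mapping_convergence(pos_docs, unlabeled_docs, max_iter=10):
    """Grow a negative set from unlabeled data, PEBL-style."""
    pf = positive_features(pos_docs, unlabeled_docs)
    # Strong negatives: unlabeled stories sharing no positive feature.
    negatives = [d for d in unlabeled_docs if not pf & set(d)]
    remaining = [d for d in unlabeled_docs if pf & set(d)]
    p_cent = centroid(pos_docs)
    for _ in range(max_iter):
        if not negatives or not remaining:
            break
        n_cent = centroid(negatives)
        # Convergence stage: stories closer to the negative centroid join N.
        is_neg = [score(d, n_cent) > score(d, p_cent) for d in remaining]
        if not any(is_neg):
            break  # no new negatives: the negative set has converged
        negatives += [d for d, neg in zip(remaining, is_neg) if neg]
        remaining = [d for d, neg in zip(remaining, is_neg) if not neg]
    return negatives, remaining


# Toy tokenized stories (hypothetical data, not from the paper's corpora).
pos = [["earthquake", "tokyo", "damage"], ["earthquake", "tsunami", "damage"]]
unl = [
    ["election", "vote", "party"],
    ["earthquake", "rescue"],
    ["vote", "party", "damage"],
    ["football", "match"],
]
negatives, remaining = mapping_convergence(pos, unl)
print(len(negatives), remaining)  # prints: 3 [['earthquake', 'rescue']]
```

Here "vote party damage" is not a strong negative, because it shares "damage" with the positive stories, but the first convergence pass moves it into the negative set. The centroid classifier only keeps the example short. PEBL itself trains an SVM at each iteration.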
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Fukumoto, F., Suzuki, Y., Yamamoto, T. (2014). Enhancing Labeled Data Using Unlabeled Data for Topic Tracking. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_17
DOI: https://doi.org/10.1007/978-3-319-08958-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)