Abstract
New event detection (NED) involves monitoring one or more news streams to detect stories that report on new events. With the overwhelming volume of news available today, NED has become a challenging task. In this paper, we propose a new NED model based on incremental PLSA (IPLSA) that can handle new documents arriving in a stream and update its parameters with lower time complexity. Moreover, to avoid the limitations of the TF-IDF method, a new term reweighting approach is proposed. By dynamically exploiting the importance of documents in discriminating terms, together with the documents' topic information, this approach is more accurate. Experimental results on the Linguistic Data Consortium (LDC) TDT4 dataset show that the proposed model significantly improves both recall and precision on the NED task compared with the baseline system and other existing systems.
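To make the NED task itself concrete, the following is a minimal sketch of a generic first-story detector: each incoming story is compared with all earlier stories and flagged as a new event when its best similarity falls below a threshold. It is not the IPLSA model or the term reweighting scheme described in the abstract. The raw term counts, the cosine measure, the threshold value of 0.2, and the names `cosine` and `detect_new_events` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_new_events(stream, threshold=0.2):
    """Return one boolean per story: True if it is flagged as a new event.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    seen = []   # term-count vectors of all stories processed so far
    flags = []
    for text in stream:
        vec = Counter(text.lower().split())
        # Similarity to the closest earlier story (0.0 if there is none).
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags


stories = [
    "earthquake strikes coastal city",
    "rescue teams reach city after earthquake",
    "central bank raises interest rates",
]
flags = detect_new_events(stories)
```

On this toy stream the first and third stories are flagged as new events, while the second is matched to the first. A model such as the one proposed in the paper would replace the raw term-count vectors with reweighted terms and topic information.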
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, X., Li, Z. (2009). Online New Event Detection Based on IPLSA. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_38
DOI: https://doi.org/10.1007/978-3-642-03348-3_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03347-6
Online ISBN: 978-3-642-03348-3
eBook Packages: Computer Science, Computer Science (R0)