Abstract
To improve the efficiency and accuracy of on-line news event detection (ONED), we build the vector space model of each news document from only those words whose term frequency (TF) exceeds a threshold, and propose a two-stage clustering method for ONED. The method divides the detection process into two stages. In the first stage, similar documents collected within a given period of time are clustered into micro-clusters. In the second stage, the micro-clusters are compared with previous event clusters. The experimental results show that the proposed method has a lower computational load and a higher processing speed, and loses less accuracy.
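The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, the centroid update rule, or any threshold values, so cosine similarity, summed-vector centroids, and the default thresholds below are assumptions, and every function and parameter name is hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens, tf_threshold):
    """Keep only the terms whose term frequency is greater than the threshold."""
    counts = Counter(tokens)
    return Counter({t: c for t, c in counts.items() if c > tf_threshold})


def cosine(a, b):
    """Cosine similarity between two sparse term vectors (assumed measure)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(vector, clusters):
    """Return the index and similarity of the most similar cluster, if any."""
    best_index, best_sim = None, 0.0
    for i, cluster in enumerate(clusters):
        sim = cosine(vector, cluster["centroid"])
        if sim > best_sim:
            best_index, best_sim = i, sim
    return best_index, best_sim


def absorb(clusters, vector, doc_ids, threshold):
    """Merge the vector into its best cluster, or start a new cluster."""
    index, sim = best_match(vector, clusters)
    if index is not None and sim >= threshold:
        # Assumed centroid update: add the term weights together.
        clusters[index]["centroid"].update(vector)
        clusters[index]["docs"].extend(doc_ids)
    else:
        clusters.append({"centroid": Counter(vector), "docs": list(doc_ids)})


def process_batch(batch, events, tf_threshold=1, micro_sim=0.5, event_sim=0.5):
    """Run both stages on one time window of (doc_id, tokens) pairs."""
    # Stage 1: cluster the documents of this window into micro-clusters.
    micro = []
    for doc_id, tokens in batch:
        absorb(micro, tf_vector(tokens, tf_threshold), [doc_id], micro_sim)
    # Stage 2: compare each micro-cluster with the previous event clusters.
    for m in micro:
        absorb(events, m["centroid"], m["docs"], event_sim)
    return events
```

In this sketch, stage 2 performs one comparison per micro-cluster rather than one per document, which is where a saving in computation would come from when a window contains many similar documents.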
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, H., Li, Gh., Xu, Xw. (2012). A On-Line News Documents Clustering Method. In: Huang, R., Ghorbani, A.A., Pasi, G., Yamaguchi, T., Yen, N.Y., Jin, B. (eds) Active Media Technology. AMT 2012. Lecture Notes in Computer Science, vol 7669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35236-2_9
DOI: https://doi.org/10.1007/978-3-642-35236-2_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35235-5
Online ISBN: 978-3-642-35236-2
eBook Packages: Computer Science, Computer Science (R0)