Dynamic Topic Mining from News Stream Data

Chung, Seokkyung; McLeod, Dennis

doi:10.1007/978-3-540-39964-3_42


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2888)

Abstract

Given the popularity of Web news services, we propose a topic mining framework that supports the identification of meaningful topics (themes) from news stream data. News articles are retrieved from Web news services and processed by data mining tools to produce useful higher-level knowledge, which is stored in a content description database. By exploiting the knowledge in this database instead of interacting with a Web news service directly, an information delivery agent can answer a user request. A key challenge in news repository management is the high rate of document updates: because a single Web news service publishes several hundred news articles every day, incremental data mining tools are essential for coping with such a dynamic environment. To this end, we present an incremental hierarchical document clustering algorithm that uses a neighborhood search. The novelty of the proposed algorithm lies in exploiting locality information to reduce the amount of computation while still producing high-quality clusters. Other components of topic mining (e.g., learning topic ontologies) can then be performed on the resulting document hierarchy. Experimental results show that the proposed incremental clustering produces high-quality clusters, and that the topic ontology provides an interpretation of the data at different levels of abstraction.
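
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of incremental clustering with a restricted neighborhood search, not the authors' method. It assumes unit-length term-frequency vectors, cosine similarity, a fixed similarity threshold, and an inverted index as a stand-in for the neighborhood search; it also produces flat clusters rather than a hierarchy. All names (`vectorize`, `cosine`, `IncrementalClusterer`, `threshold`) are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Return a unit-length term-frequency vector as a dict of term -> weight."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    return sum(w * v.get(t, 0.0) for t, w in u.items())


class IncrementalClusterer:
    """Assigns each arriving document to a cluster without reclustering.

    Assumption: the neighborhood of a new document is the set of stored
    documents that share at least one term with it (found via an inverted
    index), so unrelated documents are never compared.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # minimum similarity to join a cluster
        self.docs = []              # stored document vectors
        self.labels = []            # cluster label of each stored document
        self.index = {}             # term -> ids of documents containing it
        self.next_label = 0

    def add(self, text):
        vec = vectorize(text)
        # Neighborhood search: collect only documents sharing a term.
        candidates = set()
        for term in vec:
            candidates.update(self.index.get(term, ()))
        # Pick the most similar neighbor at or above the threshold.
        best_id, best_sim = None, self.threshold
        for doc_id in sorted(candidates):
            sim = cosine(vec, self.docs[doc_id])
            if sim >= best_sim:
                best_id, best_sim = doc_id, sim
        if best_id is None:
            label = self.next_label  # no close neighbor: start a new cluster
            self.next_label += 1
        else:
            label = self.labels[best_id]
        # Store the document and update the inverted index.
        new_id = len(self.docs)
        self.docs.append(vec)
        self.labels.append(label)
        for term in vec:
            self.index.setdefault(term, []).append(new_id)
        return label


clusterer = IncrementalClusterer(threshold=0.3)
finance_1 = clusterer.add("stock market rises on earnings")
finance_2 = clusterer.add("stock market falls on earnings report")
sports_1 = clusterer.add("team wins championship game")
```

In this sketch the two finance headlines share four terms and land in the same cluster, while the sports headline shares none and starts a new one. Each insertion costs work proportional to the size of the new document's neighborhood rather than to the whole repository, which is the kind of saving the abstract attributes to locality information.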



Author information

Authors and Affiliations

Department of Computer Science, and Integrated Media System Center, University of Southern California, Los Angeles, California, 90089–0781
Seokkyung Chung & Dennis McLeod


Editor information

Editors and Affiliations

STARLab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
School of Computer Science and Information Technology, RMIT University, Bld 10.10, 376-392 Swanston Street, VIC 3001, Melbourne, Australia
Zahir Tari
Department of Electrical Engineering and Computer Science, Vanderbilt University, TN 37203, Nashville, USA
Douglas C. Schmidt


About this paper

Cite this paper

Chung, S., McLeod, D. (2003). Dynamic Topic Mining from News Stream Data. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture Notes in Computer Science, vol 2888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39964-3_42


DOI: https://doi.org/10.1007/978-3-540-39964-3_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20498-5
Online ISBN: 978-3-540-39964-3
eBook Packages: Springer Book Archive
