Dynamic Topic Mining from News Stream Data

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2888)

Abstract

Given the popularity of Web news services, we propose a topic mining framework that supports the identification of meaningful topics (themes) from news stream data. News articles are retrieved from Web news services and processed by data mining tools to produce useful higher-level knowledge, which is stored in a content description database. By exploiting the knowledge in this database, an information delivery agent can answer a user request without interacting with a Web news service directly. A key challenge in news repository management is the high rate of document updates. Because a single Web news service publishes several hundred news articles every day, it is essential to develop incremental data mining tools that cope with such dynamic environments. To this end, we present a sophisticated incremental hierarchical document clustering algorithm that uses a neighborhood search. The novelty of the proposed algorithm lies in exploiting locality information to reduce the amount of computation while still producing high-quality clusters. Other components of topic mining (e.g., learning topic ontologies) can then be performed on the resulting document hierarchy. Experimental results show that the proposed incremental clustering produces high-quality clusters and that the topic ontology provides an interpretation of the data at different levels of abstraction.
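The idea of incremental clustering with a restricted neighborhood can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given in the abstract: it is a flat (non-hierarchical) single-pass clusterer, and the class name `IncrementalClusterer`, the `threshold` parameter, the whitespace tokenizer, and the use of an inverted index to define the "neighborhood" are all assumptions made for illustration. Each arriving document is compared only with clusters that share at least one term with it, which is one simple way to use locality to avoid comparing against every existing cluster.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalClusterer:
    """Hypothetical single-pass clusterer; an illustration, not the paper's method."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed minimum similarity to join a cluster
        self.centroids = []         # summed term counts, one Counter per cluster
        self.members = []           # document ids, one list per cluster
        self.index = {}             # term -> set of cluster ids containing it

    def add(self, doc_id, text):
        vec = Counter(text.lower().split())

        # Neighborhood: only clusters that share at least one term with the document.
        candidates = set()
        for term in vec:
            candidates |= self.index.get(term, set())

        # Pick the most similar candidate that reaches the threshold, if any.
        best, best_sim = None, 0.0
        for cid in sorted(candidates):
            sim = cosine(vec, self.centroids[cid])
            if sim >= self.threshold and sim > best_sim:
                best, best_sim = cid, sim

        # No sufficiently similar neighbor: start a new cluster.
        if best is None:
            best = len(self.centroids)
            self.centroids.append(Counter())
            self.members.append([])

        # Update the cluster summary and the inverted index incrementally.
        self.centroids[best].update(vec)
        self.members[best].append(doc_id)
        for term in vec:
            self.index.setdefault(term, set()).add(best)
        return best


clusterer = IncrementalClusterer(threshold=0.3)
for doc_id, text in [
    ("a", "stock market falls sharply"),
    ("b", "stock market rallies"),
    ("c", "football team wins final"),
]:
    clusterer.add(doc_id, text)
print(clusterer.members)
```

Because each new article updates only the cluster it joins, nothing is re-clustered from scratch when the stream grows. A hierarchical variant would additionally need to maintain parent and child clusters, which this sketch does not attempt.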

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Chung, S., McLeod, D. (2003). Dynamic Topic Mining from News Stream Data. In: Meersman, R., Tari, Z., Schmidt, D.C. (eds) On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture Notes in Computer Science, vol 2888. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39964-3_42

  • DOI: https://doi.org/10.1007/978-3-540-39964-3_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20498-5

  • Online ISBN: 978-3-540-39964-3

  • eBook Packages: Springer Book Archive