Extracting Positive and Negative Keywords for Web Communities

  • Conference paper
Discovery Science (DS 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1967)

Abstract

Linkage information has been shown to be useful for finding good Web pages with a search engine [5, 10]. In general, however, a search result covers several topics, and clustering the Web pages makes them easier for a user to browse. There are several works on clustering Web pages [7, 10-12]. In [9], we visualized Web graphs using a spring model. But clustering alone is not enough to understand the topics of the clusters, so extracting meta-data that explains communities is an important subject. Chakrabarti et al. [6] used the terms in the small neighborhood around a document. Our approach is to combine clustering and keyword extraction to interpret the communities.

To find communities, we solve the eigensystem of a matrix built from the link structure of Web pages. To get characteristic keywords from the communities found, we use the algorithm developed in [1, 2, 8]. The input to the algorithm is two sets of documents, positive and negative. The algorithm outputs a pattern that classifies them well. This algorithm is robust to errors and noise, which makes it suitable for Web pages. The novelty of the keyword extraction algorithm is that the keywords not only characterize one community but also distinguish it from others. Thus, even for a fixed community, the characteristic keywords differ depending on the counterpart.
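As a rough illustration of the community-finding step, the sketch below assumes the matrix is the co-citation matrix AᵀA used in Kleinberg's analysis [10], where A[i][j] = 1 if page i links to page j. The abstract does not say which matrix is actually used, so this choice, the helper names (`top_eigenpairs`, `link_communities`), the `cutoff` parameter, and the toy link list are all assumptions made for illustration. Eigenvectors are found by plain power iteration with deflation, and each community is read off as the pages with large coordinates in one eigenvector.

```python
import math

def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def top_eigenpairs(m, k, iters=200):
    """Power iteration with deflation on a symmetric positive semidefinite matrix."""
    n = len(m)
    m = [row[:] for row in m]  # work on a copy so the caller's matrix is untouched
    pairs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(m, v)))
        pairs.append((lam, v))
        # Deflate: subtract the component just found before looking for the next one.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def link_communities(links, n_pages, k=2, cutoff=0.5):
    """Group pages by the leading eigenvectors of the co-citation matrix (assumed choice)."""
    # cocite[a][b] = number of pages that link to both a and b, i.e. (A^T A)[a][b].
    cocite = [[0.0] * n_pages for _ in range(n_pages)]
    by_source = {}
    for src, dst in links:
        by_source.setdefault(src, set()).add(dst)
    for targets in by_source.values():
        for a in targets:
            for b in targets:
                cocite[a][b] += 1.0
    communities = []
    for _, v in top_eigenpairs(cocite, k):
        peak = max(abs(x) for x in v)
        communities.append([i for i, x in enumerate(v) if abs(x) >= cutoff * peak])
    return communities

# Hypothetical toy Web graph: pages 0-1 link to pages 2-3, pages 4-6 link to pages 7-8.
links = [(0, 2), (0, 3), (1, 2), (1, 3),
         (4, 7), (4, 8), (5, 7), (5, 8), (6, 7), (6, 8)]
communities = link_communities(links, n_pages=9)
print(communities)
```

On this toy graph the two groups of co-cited pages come out as separate communities, the more heavily cited pair first. Real link graphs are not block-structured like this, so a practical version would need a sparse eigensolver and a more careful rule for reading communities off the eigenvectors.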

We found good characteristic keywords for two communities without looking at the Web pages in them. We also show an experimental result in which different keywords are extracted depending on the counterpart.
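The dependence on the counterpart can be seen in a much-simplified sketch. The code below is not the pattern discovery algorithm of [1, 2, 8]; it is a single-word stand-in that picks the word whose presence agrees best with the positive/negative labelling. The function name `best_keyword` and the three small document sets are hypothetical.

```python
def best_keyword(positive, negative):
    """Return the word whose presence best separates positive from negative documents."""
    pos_sets = [set(doc.lower().split()) for doc in positive]
    neg_sets = [set(doc.lower().split()) for doc in negative]
    vocab = set().union(*pos_sets, *neg_sets)

    def agreement(word):
        # Count positive documents that contain the word
        # plus negative documents that do not contain it.
        hits = sum(word in s for s in pos_sets)
        rejects = sum(word not in s for s in neg_sets)
        return hits + rejects

    # Sorting the vocabulary makes tie-breaking deterministic (alphabetical).
    return max(sorted(vocab), key=agreement)

# Hypothetical communities, each given as a list of page texts.
programming = ["python code tutorial", "python code library"]
snakes = ["python snake habitat", "cobra snake venom"]
java = ["java code tutorial", "java code library"]

keyword_vs_snakes = best_keyword(programming, snakes)
keyword_vs_java = best_keyword(programming, java)
print(keyword_vs_snakes, keyword_vs_java)
```

The same community is characterized by "code" when contrasted with the snake pages and by "python" when contrasted with the Java pages. Swapping the two arguments gives the word that best characterizes the counterpart instead.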

References

  1. H. Arimura and S. Shimozono, Maximizing Agreement with a Classification by Bounded or Unbounded Number of Associated Words. Proc. the 9th International Symposium on Algorithms and Computation (1998).

  2. H. Arimura, A. Wataki, R. Fujino, and S. Arikawa, A Fast Algorithm for Discovering Optimal String Patterns in Large Text Databases. Proc. the 8th International Workshop on Algorithmic Learning Theory, Otzenhausen, Germany, Lecture Notes in Artificial Intelligence 1501, Springer-Verlag, pp. 247–261, 1998.

  3. M. W. Berry, Z. Drmac, and E. R. Jessup, Matrices, Vector Spaces, and Information Retrieval, SIAM Review, 41 (1999) pp. 335–362.

  4. M. W. Berry, S. T. Dumais, and G. W. O’Brien, Using Linear Algebra for Intelligent Information Retrieval, SIAM Review, 37 (1995) pp. 573–595.

  5. S. Brin and L. Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine, Proc. WWW7, 1998.

  6. S. Chakrabarti, B. Dom, and P. Indyk, Enhanced Hypertext Categorization Using Hyperlinks, Proc. ACM SIGMOD (1998) pp.307–318.

  7. J. Dean and M. R. Henzinger, Finding Related Pages in the World Wide Web, Proc. WWW8, 1999.

  8. D. Ikeda, Characteristic Sets of Strings Common to Semi-Structured Documents, Proc. the 2nd International Conference on Discovery Science, Lecture Notes in Artificial Intelligence 1721, Springer-Verlag, pp. 139–147, 1999.

  9. D. Ikeda, T. Taguchi, and S. Hirokawa, Developing a Knowledge Network of URLs, Proc. the 2nd International Conference on Discovery Science, Lecture Notes in Artificial Intelligence 1721, Springer-Verlag, pp. 328–329, 1999.

  10. J. M. Kleinberg, Authoritative Sources in a Hyperlinked Environment, Proc. ACM-SIAM Symp. on Discrete Algorithms, 668–677, 1998.

  11. J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. S. Tomkins, The Web as a Graph: Measurements, Models, and Methods, Proc. 5th Annual International Conference on Computing and Combinatorics, Lecture Notes in Computer Science 1627, Springer-Verlag, pp. 1–17, 1999.

  12. O. Zamir and O. Etzioni, Web Document Clustering: a Feasibility Demonstration, Proc. ACM SIGIR’98 (1998) pp. 46–54.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ikeda, D., Hirokawa, S. (2000). Extracting Positive and Negative Keywords for Web Communities. In: Arikawa, S., Morishita, S. (eds) Discovery Science. DS 2000. Lecture Notes in Computer Science, vol 1967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44418-1_34

  • DOI: https://doi.org/10.1007/3-540-44418-1_34

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41352-3

  • Online ISBN: 978-3-540-44418-3

  • eBook Packages: Springer Book Archive
