Towards a Formal Concept Analysis Approach to Exploring Communities on the World Wide Web

  • Conference paper
Formal Concept Analysis (ICFCA 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3403)

Abstract

An interesting problem associated with the World Wide Web (Web) is the definition and delineation of so-called Web communities. The Web can be characterized as a directed graph whose nodes represent Web pages and whose edges represent hyperlinks. An authority is a page that is linked to by high-quality hubs, while a hub is a page that links to high-quality authorities. A Web community is a highly interconnected aggregate of hubs and authorities. We define a community core to be a maximally connected bipartite subgraph of the Web graph: a set of hubs and a set of authorities in which every hub links to every authority, and to which no further page can be added without breaking this property.
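The mutually reinforcing definition of hubs and authorities above is the one behind Kleinberg's HITS algorithm. As a minimal sketch of that definition (not of the FCA approach proposed in this paper), hub and authority scores can be computed by power iteration over a directed edge list. The function name `hits`, the `(source, target)` edge-list input and the fixed iteration count are illustrative assumptions.

```python
import math


def hits(edges, iterations=50):
    """HITS-style hub/authority scores for a directed graph of (source, target) pairs."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of the pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum of the (new) authority scores of the pages it links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize to unit Euclidean length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {n: v / a_norm for n, v in new_auth.items()}
        hub = {n: v / h_norm for n, v in new_hub.items()}
    return hub, auth


# Toy graph: three hubs pointing at two candidate authorities.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(edges)
print(max(auth, key=auth.get))  # prints a1, the strongest authority
```

In this toy graph `a1` is linked to by all three hubs and so outranks `a2`, while `h1` and `h2`, which link to both authorities, tie as the strongest hubs.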

We observe that, in the terms of Formal Concept Analysis (FCA), the Web subgraph can be viewed as a formal context and Web communities can be modeled by formal concepts. Additionally, the notions of hub and authority are captured by the extent and intent, respectively, of a concept. Though FCA has previously been applied to the Web, none of the FCA-based approaches that we are aware of considers the link structure of Web pages. We use notions from FCA to explore the community structure of the Web graph. We discuss how this structure can be used to locate and organize communities in a knowledge base built from the resulting concept lattice, together with methods for reducing the complexity of the knowledge base by coalescing similar Web communities. We present preliminary experimental results obtained from real Web data that demonstrate the usefulness of FCA for improving Web search.
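One way to make the context-and-concept correspondence concrete is sketched below, assuming that objects are pages with at least one out-link, attributes are pages with at least one in-link, and page g is incident with page m exactly when g links to m. A formal concept (A, B) then pairs a set of hubs A with the set of authorities B that every page in A links to, where A is exactly the set of pages linking to all of B, i.e. a maximal complete bipartite subgraph. The enumeration is naive, relying on the fact that every concept intent is an intersection of object intents; it is exponential in the worst case, meant only for small examples, and is not the paper's algorithm. The function names `formal_context` and `concepts` are hypothetical.

```python
def formal_context(edges):
    """Map each page with out-links to the set of pages it links to (its object intent)."""
    context = {}
    for src, dst in edges:
        context.setdefault(src, set()).add(dst)
    return context


def concepts(edges):
    """Enumerate all formal concepts (extent, intent) of the link context.

    Naive method: every concept intent is an intersection of object intents,
    with the full attribute set M standing in for the empty intersection.
    """
    context = formal_context(edges)
    attributes = frozenset(dst for _, dst in edges)  # M: pages with in-links
    intents = {attributes}
    for targets in context.values():
        targets = frozenset(targets)
        # Intersect the new object intent with every intent found so far;
        # targets & M == targets, so the object intent itself is included.
        intents |= {targets & b for b in intents}
    result = []
    for intent in intents:
        # Extent = intent': the pages (hubs) that link to every page in the intent.
        extent = frozenset(g for g, linked in context.items() if intent <= linked)
        result.append((extent, intent))
    return result


edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
for hubs, authorities in concepts(edges):
    if hubs and authorities:  # skip trivial concepts
        print(sorted(hubs), "->", sorted(authorities))
```

For these edges the sketch finds two concepts: `({h1, h2}, {a1, a2})`, a community core in which both hubs link to both authorities, and `({h1, h2, h3}, {a1})`. Restricting objects and attributes as assumed above only affects trivial concepts, those with an empty extent or intent.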

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rome, J.E., Haralick, R.M. (2005). Towards a Formal Concept Analysis Approach to Exploring Communities on the World Wide Web. In: Ganter, B., Godin, R. (eds) Formal Concept Analysis. ICFCA 2005. Lecture Notes in Computer Science, vol 3403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32262-7_3

  • DOI: https://doi.org/10.1007/978-3-540-32262-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24525-4

  • Online ISBN: 978-3-540-32262-7

  • eBook Packages: Computer Science, Computer Science (R0)
