Abstract
An interesting problem associated with the World Wide Web (Web) is the definition and delineation of so-called Web communities. The Web can be characterized as a directed graph whose nodes represent Web pages and whose edges represent hyperlinks. An authority is a page that is linked to by high-quality hubs, while a hub is a page that links to high-quality authorities. A Web community is a highly interconnected aggregate of hubs and authorities. We define a community core to be a maximally connected bipartite subgraph of the Web graph.
We observe that a Web subgraph can be viewed as a formal context and that Web communities can be modeled by formal concepts. Additionally, the notions of hub and authority are captured by the extent and intent, respectively, of a concept. Though Formal Concept Analysis (FCA) has previously been applied to the Web, none of the FCA-based approaches that we are aware of considers the link structure of Web pages. We use notions from FCA to explore the community structure of the Web graph. We discuss how to use this structure to locate communities and organize them into a knowledge base built from the resulting concept lattice, and we discuss methods for reducing the complexity of the knowledge base by coalescing similar Web communities. We present preliminary experimental results obtained from real Web data that demonstrate the usefulness of FCA for improving Web search.
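The mapping from link structure to formal concepts can be illustrated with a minimal sketch. It assumes a toy link graph (the page names `h1`, `a1`, and so on are hypothetical) in which linking pages are the objects of the formal context and linked-to pages are the attributes. The enumeration is a brute-force closure over subsets, chosen for readability only; it is not the algorithm used in the paper and does not scale beyond a handful of pages.

```python
from itertools import combinations

# Hypothetical toy link graph: page -> set of pages it links to.
# Read as a formal context: linking pages are objects, linked-to pages are attributes.
LINKS = {
    "h1": {"a1", "a2"},
    "h2": {"a1", "a2"},
    "h3": {"a2", "a3"},
}


def common_targets(pages, links):
    """Intent of `pages`: every page linked to by all pages in `pages`."""
    result = set().union(*links.values())  # start from all linked-to pages
    for page in pages:
        result &= links.get(page, set())
    return result


def common_sources(targets, links):
    """Extent of `targets`: every page that links to all pages in `targets`."""
    return {page for page, out in links.items() if targets <= out}


def formal_concepts(links):
    """Enumerate all (extent, intent) pairs by closing every subset of sources.

    Exponential in the number of linking pages; intended for toy inputs only.
    """
    sources = sorted(links)
    concepts = set()
    for size in range(len(sources) + 1):
        for subset in combinations(sources, size):
            intent = frozenset(common_targets(subset, links))
            extent = frozenset(common_sources(intent, links))
            concepts.add((extent, intent))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(LINKS), key=lambda c: len(c[0])):
        print(sorted(extent), "->", sorted(intent))
```

Each resulting pair is a complete bipartite subgraph that cannot be enlarged: every page in the extent (the hubs) links to every page in the intent (the authorities), which is how the abstract's community core relates to a formal concept.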
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rome, J.E., Haralick, R.M. (2005). Towards a Formal Concept Analysis Approach to Exploring Communities on the World Wide Web. In: Ganter, B., Godin, R. (eds) Formal Concept Analysis. ICFCA 2005. Lecture Notes in Computer Science, vol 3403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-32262-7_3
DOI: https://doi.org/10.1007/978-3-540-32262-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24525-4
Online ISBN: 978-3-540-32262-7
eBook Packages: Computer Science (R0)