Finding Significant Web Pages with Lower Ranks by Pseudo-Clique Search

Okubo, Yoshiaki; Haraguchi, Makoto; Shi, Bin

doi:10.1007/11563983_30

Yoshiaki Okubo²¹,
Makoto Haraguchi²¹ &
Bin Shi²¹

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3735))

Included in the following conference series:

International Conference on Discovery Science

721 Accesses
3 Citations

Abstract

In this paper, we discuss a method of finding useful clusters of web pages which are significant in the sense that their contents are similar or closely related to ones of higher-ranked pages. Since we are usually careless of pages with lower ranks, they are unconditionally discarded even if their contents are similar to some pages with high ranks. We try to extract such hidden pages together with significant higher-ranked pages as a cluster.

In order to obtain such clusters, we first extract semantic correlations among terms by applying Singular Value Decomposition(SVD) to the term-document matrix generated from a corpus w.r.t. a specific topic. Based on the correlations, we can evaluate potential similarities among web pages from which we try to obtain clusters. The set of web pages is represented as a weighted graph G based on the similarities and their ranks. Our clusters can be found as pseudo-cliques in G. We present an algorithm for finding Top-N weighted pseudo-cliques. Our experimental result shows that quite valuable clusters can be actually extracted according to our method.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

A competitive optimization approach for data clustering and orthogonal non-negative matrix factorization

Article 01 December 2020

DC-NMF: nonnegative matrix factorization based on divide-and-conquer for fast clustering and topic modeling

Article 04 April 2017

Hierarchical community detection via rank-2 symmetric nonnegative matrix factorization

Article Open access 08 September 2017

References

Page, L., Brin, S., Motwani, R., Winograd, T.: The PageRank Citation Ranking: Bringing Order to the Web (1999), http://dbpubs.stanford.edu/pub/1999-66
Vakali, A., Pokorný, J., Dalamagas, T.: An Overview of Web Data Clustering Practices. In: Lindner, W., Mesiti, M., Türker, C., Tzitzikas, Y., Vakali, A.I. (eds.) EDBT 2004. LNCS, vol. 3268, pp. 597–606. Springer, Heidelberg (2004)
Chapter Google Scholar
Kita, K., Tsuda, K., Shishibori, M.: Information Retrieval Algorithms. Kyoritsu Shuppan (2002) (in Japanese)
Google Scholar
Tomita, E., Seki, T.: An Efficient Branch-and-Bound Algorithm for Finding a Maximum Clique. In: Calude, C.S., Dinneen, M.J., Vajnovszki, V. (eds.) DMTCS 2003. LNCS, vol. 2731, pp. 278–289. Springer, Heidelberg (2003)
Chapter Google Scholar
Fahle, T.: Simple and Fast: Improving a Branch-and-Bound Algorithm for Maximum Clique. In: Möhring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 485–498. Springer, Heidelberg (2002)
Chapter Google Scholar
Satoh, K.: A Method for Generating Data Abstraction Based on Optimal Clique Search, Master’s Thesis, Graduate School of Eng., Hokkaido Univ. (March 2003) (in Japanese)
Google Scholar
Masuda, S.: Analysis of Ascidian Gene Expression Data by Clique Search, Master’s Thesis, Graduate School of Eng., Hokkaido Univ. (March 2005) (in Japanese)
Google Scholar
Shi, B.: Top-N Clique Search of Web Pages, Master’s Thesis, Graduate School of Eng., Hokkaido Univ. (March 2005) (in Japanese)
Google Scholar
Okubo, Y., Haraguchi, M.: Creating Abstract Concepts for Classification by Finding Top-N Maximal Weighted Cliques. In: Grieser, G., Tanaka, Y., Yamamoto, A. (eds.) DS 2003. LNCS (LNAI), vol. 2843, pp. 418–425. Springer, Heidelberg (2003)
Chapter Google Scholar
Okubo, Y., Haraguchi, M.: Finding Top-N Pseudo-Cliques in Simple Graph. In: Proceedings of the 9th World Multiconference on Systemics, Cybernetics and Informatics - WMSCI 2005, vol. III, pp. 215–220 (2005)
Google Scholar

Download references

Author information

Authors and Affiliations

Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, N-14 W-9, Sapporo, 060-0814, Japan
Yoshiaki Okubo, Makoto Haraguchi & Bin Shi

Authors

Yoshiaki Okubo
View author publications
You can also search for this author in PubMed Google Scholar
Makoto Haraguchi
View author publications
You can also search for this author in PubMed Google Scholar
Bin Shi
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

School of Computer Science & Engineering, The University of New South Wales, Sydney, Australia
Achim Hoffmann
Institute of Scientific and Industrial Research, Osaka University, 8-1 Mihogaoka, 567-0047, Ibaraki, Osaka, Japan
Hiroshi Motoda
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Okubo, Y., Haraguchi, M., Shi, B. (2005). Finding Significant Web Pages with Lower Ranks by Pseudo-Clique Search. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science(), vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_30

Download citation

DOI: https://doi.org/10.1007/11563983_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics