Abstract
With the rapid growth of the Web, adding structural guidance, in the form of concept hierarchies, to Web document navigation has become an active research topic. In this paper, we present a method for automatically acquiring concept hierarchies. Given a set of concepts, each concept is treated as a vertex in an undirected, weighted graph. Concept hierarchy construction is then cast as a modified graph partitioning problem and solved by spectral methods. Because an undirected graph cannot accurately represent the hyponymy (is-a) relations among the concepts, subsumption estimation is introduced to guide the spectral clustering algorithm. Experiments on real data show very encouraging results.
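The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the exact partitioning objective or subsumption test, so the code assumes a standard normalized-cut relaxation (sign of the second eigenvector of the normalized Laplacian) and a document co-occurrence test in the style of Sanderson and Croft. The function names, the 0.8 threshold, and the toy data are all hypothetical.

```python
import numpy as np


def spectral_bipartition(weights):
    """Split graph vertices into two groups via a normalized-cut relaxation.

    `weights` is a symmetric, non-negative affinity matrix in which every
    vertex has positive total weight. Returns a boolean array giving each
    vertex's side of the cut (which side is True is arbitrary).
    """
    w = np.asarray(weights, dtype=float)
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so column 1 holds the
    # eigenvector of the second-smallest eigenvalue.
    _, vectors = np.linalg.eigh(laplacian)
    # Rescale to the generalized eigenvector of (D - W) v = lambda D v.
    fiedler = d_inv_sqrt * vectors[:, 1]
    return fiedler >= 0.0


def subsumes(docs_x, docs_y, threshold=0.8):
    """Assumed co-occurrence test: does concept x subsume concept y?

    `docs_x` and `docs_y` are sets of ids of documents containing each
    concept. x subsumes y when P(x|y) >= threshold and P(y|x) < 1.
    """
    if not docs_x or not docs_y:
        return False
    both = len(docs_x & docs_y)
    p_x_given_y = both / len(docs_y)
    p_y_given_x = both / len(docs_x)
    return p_x_given_y >= threshold and p_y_given_x < 1.0


# Hypothetical toy data: two tightly linked concept groups joined by one weak edge.
W = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
sides = spectral_bipartition(W)

# Hypothetical document sets: the broader concept appears wherever the narrower one does.
broader_docs = {1, 2, 3, 4, 5}
narrower_docs = {2, 3}
is_parent = subsumes(broader_docs, narrower_docs)
```

The partition is symmetric and carries no direction, which is why a separate asymmetric test such as `subsumes` is needed to decide which concept in a pair should sit higher in the hierarchy.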
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, J., Li, Q. (2006). Concept Hierarchy Construction by Combining Spectral Clustering and Subsumption Estimation. In: Aberer, K., Peng, Z., Rundensteiner, E.A., Zhang, Y., Li, X. (eds) Web Information Systems – WISE 2006. WISE 2006. Lecture Notes in Computer Science, vol 4255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11912873_22
DOI: https://doi.org/10.1007/11912873_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48105-8
Online ISBN: 978-3-540-48107-2
eBook Packages: Computer Science, Computer Science (R0)