Abstract
This paper first uses an ontology such as WordNet to build semantic structures for text documents and thereby enhance the semantic similarity among them. Because correlations between documents cause them to lie on or close to a smooth low-dimensional manifold, documents can be well characterized by a manifold within the document space. We therefore calculate the similarity between any two semantically structured documents with respect to the intrinsic global manifold structure. The idea is validated in text categorization experiments on patent documents.
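The abstract does not spell out how similarity "with respect to the intrinsic global manifold structure" is computed. One common reading is a geodesic distance in the style of Isomap: link each document to its nearest neighbours and measure distance along the resulting graph rather than directly. The sketch below illustrates that reading only and should not be taken as the authors' algorithm. It assumes the documents have already been turned into ontology-enriched term-weight vectors (that step is not modelled here), and every function name, the cosine distance, the neighbourhood size k, and the 1 / (1 + d) similarity mapping are hypothetical choices made for illustration.

```python
import heapq
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: an empty document is maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def knn_graph(vectors, k):
    """Symmetric k-nearest-neighbour graph as adjacency dicts {i: {j: dist}}."""
    n = len(vectors)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (cosine_distance(vectors[i], vectors[j]), j)
            for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_distances(graph, source):
    """Dijkstra shortest-path lengths from source; unreachable nodes get inf."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist[nbr]:
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def manifold_similarity(vectors, i, j, k=2):
    """Similarity that decays with graph distance; 0.0 if i and j are disconnected."""
    d = geodesic_distances(knn_graph(vectors, k), i)[j]
    return 0.0 if math.isinf(d) else 1.0 / (1.0 + d)


if __name__ == "__main__":
    # Toy term-weight vectors forming a chain: each document shares a term
    # only with its immediate neighbours.
    docs = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 1, 1], [0, 0, 1]]
    print(manifold_similarity(docs, 0, 1, k=1))
    print(manifold_similarity(docs, 0, 4, k=1))
```

In this toy chain, documents 0 and 4 share no terms, yet they receive a nonzero similarity because they are connected through intermediate documents. That is the kind of effect a manifold-based measure is meant to capture.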
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wen, G., Jiang, L., Shadbolt, N.R. (2006). Ontology-Based Similarity Between Text Documents on Manifold. In: Mizoguchi, R., Shi, Z., Giunchiglia, F. (eds) The Semantic Web – ASWC 2006. ASWC 2006. Lecture Notes in Computer Science, vol 4185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11836025_12
DOI: https://doi.org/10.1007/11836025_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-38329-1
Online ISBN: 978-3-540-38331-4
eBook Packages: Computer Science, Computer Science (R0)