Abstract
In traditional Information Retrieval (IR) systems, a document is represented by a set of words or terms. When these words or terms are treated as the components of a vector, the model is called the vector space model (VSM). The VSM has been widely used in IR systems in recent decades. As new words appear rapidly in the Internet era, the amount of computation grows very large and degrades the performance of IR systems. This paper proposes a new approach based on the relations among words, concepts, and documents, using the notion of an ontology. The approach has two levels: the Word-Concept (WC) level and the Concept-Document (CD) level. At the WC level, a transition probability matrix is constructed from the word-word pairs that appear in the same paragraph, and the principal eigenvector of the matrix (the eigenvector associated with the largest eigenvalue) is computed. This eigenvector reflects the importance of each word to the concept. At the CD level, a distance matrix is constructed from the distances between the words of a concept, and the average variance of its elements is computed. This value determines the relevance of the document to the concept. To expand the query sentence, the Personal Information Profile (PIP) of the user is defined from the user's query history. The approach is shown to be more effective than the previous approach.
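The WC-level step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how pairs are weighted or how the eigenvector is computed, so the symmetric co-occurrence counts, the row normalization, the PageRank-style damping factor, and the function names `build_transition_matrix` and `principal_eigenvector` are all assumptions made for illustration.

```python
from itertools import combinations


def build_transition_matrix(paragraphs):
    """Build a row-stochastic matrix from words that share a paragraph.

    Each paragraph is a list of words. Counting every unordered pair once
    per paragraph is an assumption; the abstract gives no weighting rule.
    """
    vocab = sorted({word for paragraph in paragraphs for word in paragraph})
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    counts = [[0.0] * n for _ in range(n)]
    for paragraph in paragraphs:
        for a, b in combinations(sorted(set(paragraph)), 2):
            counts[index[a]][index[b]] += 1.0
            counts[index[b]][index[a]] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        # A word with no co-occurrences moves uniformly (assumption).
        matrix.append([c / total for c in row] if total else [1.0 / n] * n)
    return vocab, matrix


def principal_eigenvector(matrix, damping=0.85, iterations=200, tol=1e-12):
    """Approximate the dominant left eigenvector by power iteration.

    The damping term is borrowed from PageRank to guarantee convergence;
    it is an assumption, not something stated in the abstract.
    """
    n = len(matrix)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        updated = [
            (1.0 - damping) / n
            + damping * sum(rank[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(x - y) for x, y in zip(updated, rank))
        rank = updated
        if delta < tol:
            break
    return rank


# Toy corpus: three paragraphs, already tokenized (hypothetical data).
paragraphs = [
    ["ontology", "concept", "document"],
    ["ontology", "concept"],
    ["document", "query"],
]
vocab, matrix = build_transition_matrix(paragraphs)
scores = dict(zip(vocab, principal_eigenvector(matrix)))
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.4f}")
```

In this toy corpus "query" co-occurs with only one other word, so it receives the lowest score, which matches the reading of the eigenvector as a measure of each word's importance to the concept.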
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, H., Guo, Y., Shi, X. (2012). WCD-New Approach Combining Words, Concepts and Documents Based on Ontology. In: Li, Z., Li, X., Liu, Y., Cai, Z. (eds) Computational Intelligence and Intelligent Systems. ISICA 2012. Communications in Computer and Information Science, vol 316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34289-9_50
DOI: https://doi.org/10.1007/978-3-642-34289-9_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34288-2
Online ISBN: 978-3-642-34289-9
eBook Packages: Computer Science (R0)