WCD-New Approach Combining Words, Concepts and Documents Based on Ontology

  • Conference paper
Computational Intelligence and Intelligent Systems (ISICA 2012)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 316)

Abstract

In a traditional information retrieval (IR) system, a document is represented by a set of words or terms. When these words or terms are treated as the components of a vector, the model is called the vector space model (VSM). VSM has been widely used in IR systems in recent decades. As new words appear rapidly in the Internet era, the amount of computation becomes very large and degrades the IR system's performance. This paper puts forward a new approach based on the relations among words, concepts, and documents, using the notion of an ontology. The new approach has two levels: the Word-Concept (WC) level and the Concept-Document (CD) level. At the WC level, a transition probability matrix is constructed from the word-word pairs that appear in the same paragraph, and the principal eigenvector of the matrix (the one associated with its largest eigenvalue) is computed. This eigenvector reflects the importance of each word to the concept. At the CD level, a distance matrix is constructed from the distances between the words in the concept, and the average variance of its elements is computed. This value determines the relevance of the document to the concept. To expand the query sentence, a Personal Information Profile (PIP) of the user is defined from the user's query history. The approach is shown to be more effective than the previous one.
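
The WC-level idea can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the transition probabilities are normalized or how the eigenvector is obtained, so the code below assumes row-normalized paragraph co-occurrence counts and a damped power iteration in the style of PageRank. The function names `transition_matrix` and `principal_eigenvector`, the `damping` parameter, and the toy paragraphs are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def transition_matrix(paragraphs):
    """Build a row-stochastic word-to-word matrix from paragraph co-occurrence.

    Assumption: every unordered pair of distinct words that shares a
    paragraph counts once for that paragraph.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for paragraph in paragraphs:
        for a, b in combinations(sorted(set(paragraph)), 2):
            counts[a][b] += 1.0
            counts[b][a] += 1.0
    vocab = sorted(counts)
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in vocab]
    for word, row in counts.items():
        total = sum(row.values())
        for other, count in row.items():
            # Probability of moving from `word` to `other`.
            matrix[index[word]][index[other]] = count / total
    return vocab, matrix


def principal_eigenvector(matrix, damping=0.85, iterations=200, tol=1e-12):
    """Approximate the dominant left eigenvector by power iteration.

    Assumption: a PageRank-style damping term is added so the iteration
    converges even on periodic or disconnected word graphs.
    """
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * matrix[i][j] for i in range(n))
            for j in range(n)
        ]
        change = sum(abs(x - y) for x, y in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Toy paragraphs, each already split into words (hypothetical data).
paragraphs = [
    ["ontology", "concept", "word"],
    ["ontology", "concept"],
    ["ontology", "document"],
]
vocab, matrix = transition_matrix(paragraphs)
scores = principal_eigenvector(matrix)
for word, score in sorted(zip(vocab, scores), key=lambda pair: -pair[1]):
    print(f"{word}: {score:.3f}")
```

In this toy input, "ontology" co-occurs with every other word and so receives the highest score, which matches the abstract's reading of the eigenvector as a measure of word importance to the concept.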

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, H., Guo, Y., Shi, X. (2012). WCD-New Approach Combining Words, Concepts and Documents Based on Ontology. In: Li, Z., Li, X., Liu, Y., Cai, Z. (eds) Computational Intelligence and Intelligent Systems. ISICA 2012. Communications in Computer and Information Science, vol 316. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34289-9_50

  • DOI: https://doi.org/10.1007/978-3-642-34289-9_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34288-2

  • Online ISBN: 978-3-642-34289-9

  • eBook Packages: Computer Science (R0)
