A framework for a feedback process to analyze and personalize a document vector space in a feature extraction model

Information Technology and Management

Abstract

In this paper, we present a framework for a feedback process that supports highly accurate document retrieval. The system creates a document vector space dynamically and performs retrieval in that space, so its retrieval accuracy depends on how the space is constructed. When the vector space reflects a user's specific purpose and interests, highly accurate retrieval results can be obtained. We therefore present a method for analyzing and personalizing the vector space according to the purposes and interests of users. To optimize the document vector space, we define and implement functions for adding, deleting, and weighting the terms used to create it. By dynamically exploiting classified-document information related to the queries, our method allows users to retrieve documents relevant to their interests and purposes. Even when the results from the initial retrieval space are not appropriate, applying the proposed feedback operations improves them. We also implemented an experimental search system for semantic document retrieval. Several experimental results, including comparisons of our method with the traditional relevance feedback method, are presented to clarify how the feedback process improved retrieval accuracy and how accurately the system extracted documents that satisfied the purposes and interests of users.
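The three feedback operations named in the abstract (adding, deleting, and weighting terms) can be illustrated with a small sketch. This is not the authors' implementation, which is not described in the abstract. It is a toy term-frequency space with cosine ranking, and the class name `TermSpace`, its methods, the sample documents, and the weight values are all hypothetical, chosen only to show how editing the term set might change a ranking.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TermSpace:
    """Toy vector space whose axes are an editable set of weighted terms.

    Hypothetical illustration only; the paper's actual space construction
    is not given in the abstract.
    """

    def __init__(self, terms):
        # Assumption: every initial term starts with weight 1.0.
        self.weights = {term: 1.0 for term in terms}

    def add_term(self, term, weight=1.0):
        self.weights[term] = weight

    def delete_term(self, term):
        self.weights.pop(term, None)

    def weight_term(self, term, weight):
        if term in self.weights:
            self.weights[term] = weight

    def vectorize(self, text):
        # Weighted term frequency along each axis, in a fixed (sorted) axis order.
        counts = Counter(text.lower().split())
        return [counts[term] * w for term, w in sorted(self.weights.items())]

    def rank(self, query, docs):
        # Highest cosine score first; ties are broken by document name.
        q = self.vectorize(query)
        scored = [(cosine(q, self.vectorize(text)), name) for name, text in docs.items()]
        return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


docs = {
    "car_review": "jaguar car engine speed",
    "wildlife_note": "jaguar hunting jungle",
}
query = "jaguar hunting"

space = TermSpace(["jaguar"])        # initial space cannot tell the two documents apart
before = space.rank(query, docs)

space.add_term("hunting", 2.0)       # feedback: add and emphasize a term of interest
after = space.rank(query, docs)

print(before)
print(after)
```

In this sketch the initial one-term space scores both documents identically, so the order falls back to the name tie-break. Adding the emphasized term separates them and moves `wildlife_note` to the top, which is the kind of effect the abstract attributes to its feedback operations.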

Author information

Correspondence to Kosuke Takano.

About this article

Cite this article

Takano, K., Chen, X. & Masuda, K. A framework for a feedback process to analyze and personalize a document vector space in a feature extraction model. Inf Technol Manag 10, 151–176 (2009). https://doi.org/10.1007/s10799-009-0055-4
