Abstract
In this paper, we present a framework for a feedback process that implements a highly accurate document retrieval system. The system creates a document vector space dynamically and performs retrieval in that space, so retrieval accuracy depends on how the space is constructed. When the vector space is created for a specific purpose and interest of a user, highly accurate retrieval results can be obtained. We present a method for analyzing and personalizing the vector space according to the purposes and interests of users. To optimize the document vector space, we define and implement functions for adding, deleting, and weighting the terms used to create it. By dynamically exploiting the classified-document information related to the queries, our method allows users to retrieve documents relevant to their interests and purposes. Even when the results from the initial retrieval space are not appropriate, applying the proposed feedback operations improves them. We also implemented an experimental search system for semantic document retrieval. Several experimental results, including comparisons of our method with the traditional relevance feedback method, are presented to clarify how the feedback process improved retrieval accuracy and how accurately documents satisfying the purposes and interests of users were extracted.
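As a rough illustration of the idea of reshaping a retrieval space by adding, deleting, and weighting terms, the following minimal sketch ranks three toy documents before and after such feedback. It is not the implementation described in the paper: the term-frequency vectors, the cosine ranking, the helper names (`add_term`, `delete_term`, `weight_term`, `rank`), and the toy documents are all assumptions made for this example.

```python
import math

def vectorize(text, terms, weights):
    """Weighted term-frequency vector of `text` over the current axes (terms)."""
    words = text.split()
    return [words.count(t) * weights.get(t, 1.0) for t in terms]

def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(docs, query, terms, weights):
    """Return document indices ordered by similarity to the query, best first."""
    q = vectorize(query, terms, weights)
    scores = [cosine(q, vectorize(d, terms, weights)) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)

# Hypothetical feedback operations on the space definition.
def add_term(terms, term):
    return terms if term in terms else terms + [term]

def delete_term(terms, term):
    return [t for t in terms if t != term]

def weight_term(weights, term, w):
    return {**weights, term: w}

# Toy data (invented for this sketch): the user means the animal, not the car.
docs = [
    "jaguar jaguar speed",        # 0: about the car
    "jaguar jungle jungle cat",   # 1: about the animal
    "car engine speed speed",     # 2: unrelated to the query
]
query = "jaguar cat"

# Initial space: lacks the axis "cat", so the car document ranks first.
terms, weights = ["jaguar", "speed", "jungle"], {}
initial = rank(docs, query, terms, weights)

# Feedback: drop an unhelpful axis, add a relevant one, and emphasize it.
terms = delete_term(terms, "speed")
terms = add_term(terms, "cat")
weights = weight_term(weights, "cat", 2.0)
after = rank(docs, query, terms, weights)

print(initial, after)
```

In this toy setting the animal document moves from second to first place once the space contains and emphasizes the term the user cares about. A real system would choose the terms and weights from classified-document information rather than by hand, as the abstract describes.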
Cite this article
Takano, K., Chen, X. & Masuda, K. A framework for a feedback process to analyze and personalize a document vector space in a feature extraction model. Inf Technol Manag 10, 151–176 (2009). https://doi.org/10.1007/s10799-009-0055-4
DOI: https://doi.org/10.1007/s10799-009-0055-4