Abstract
The dramatic proliferation of information on the web and the tremendous growth in the number of files published and uploaded online each day have led to the appearance of new words on the Internet. These new terms play a central role in retrieving the desired information, yet their meanings are difficult to grasp. It therefore becomes necessary to give more importance to the sites and topics in which they appear or, more precisely, to give weight to the words that frequently co-occur with them. To this end, we propose a novel term-term similarity score, based on the co-occurrence and closeness of words, to improve retrieval performance. We also propose a novel efficiency/effectiveness measure, based on the principle of the optimal information forager, to assess the quality of the obtained results. Our experiments, performed on the OHSUMED test collection, show a significant improvement in effectiveness over the state of the art.
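The abstract does not state the formula for the proposed score. As a rough illustration only, the sketch below assumes one plausible way to combine co-occurrence and closeness: for every document in which two terms both occur, it adds the inverse of their smallest positional distance, then divides by the number of documents containing either term. The function names, the 1/distance weighting, and the normalization are hypothetical choices, not the authors' definition. The last function computes the rate of gain R = G / (T_B + T_W) from Pirolli and Card's information foraging theory, the kind of gain-per-time ratio an efficiency/effectiveness measure could build on; how the paper actually defines its measure is not given here.

```python
from collections import defaultdict


def term_positions(tokens):
    """Map each term to the sorted list of positions where it occurs."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def min_distance(pos_a, pos_b):
    """Smallest gap between any occurrence of one term and any of the other.

    Both position lists are sorted, so a two-pointer merge is enough.
    """
    i = j = 0
    best = float("inf")
    while i < len(pos_a) and j < len(pos_b):
        best = min(best, abs(pos_a[i] - pos_b[j]))
        # Advance the pointer at the smaller position to close the gap.
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return best


def proximity_similarity(docs, a, b):
    """Hypothetical co-occurrence/closeness score between distinct terms a and b.

    Sums 1 / min_distance over documents containing both terms and
    divides by the number of documents containing at least one of them.
    """
    if a == b:
        raise ValueError("terms must be distinct")
    total = 0.0
    either = 0
    for tokens in docs:
        pos = term_positions(tokens)
        has_a, has_b = a in pos, b in pos  # 'in' does not create keys
        if has_a or has_b:
            either += 1
        if has_a and has_b:
            total += 1.0 / min_distance(pos[a], pos[b])
    return total / either if either else 0.0


def rate_of_gain(gain, between_time, within_time):
    """Foraging rate of gain R = G / (T_B + T_W)."""
    return gain / (between_time + within_time)


docs = [
    "new word appears near related word".split(),
    "related word and new word".split(),
    "unrelated text".split(),
]
print(proximity_similarity(docs, "new", "related"))
print(rate_of_gain(6.0, 1.0, 2.0))
```

In this toy corpus, "new" and "related" are 4 positions apart in the first document and 3 in the second, so the score is (1/4 + 1/3) / 2. Pairs that sit closer together in more documents score higher, which is the intuition the abstract describes; a real system would likely also weight terms (e.g. with BM25-style statistics) rather than use raw distances.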
Copyright information
© 2015 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Khennak, I., Drias, H., Mosteghanemi, H. (2015). A Novel Term-Term Similarity Score Based Information Foraging Assessment. In: Giaffreda, R., et al. Internet of Things. User-Centric IoT. IoT360 2014. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 150. Springer, Cham. https://doi.org/10.1007/978-3-319-19656-5_5
DOI: https://doi.org/10.1007/978-3-319-19656-5_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19655-8
Online ISBN: 978-3-319-19656-5
eBook Packages: Computer Science, Computer Science (R0)