Abstract
The vector space model (VSM) has established itself over almost three decades as one of the most effective approaches in Information Retrieval (IR), thanks to its good compromise between expressivity, effectiveness and simplicity. Although Information Retrieval and Information Filtering (IF) are undoubtedly related research areas, the use of VSM in Information Filtering has been analyzed far less, especially for content-based recommender systems.
The goal of this work is twofold: first, we investigate the impact of VSM in the area of content-based recommender systems; second, since VSM suffers from well-known problems, such as its high dimensionality and its inability to manage information coming from negative user preferences, we propose techniques that tackle these drawbacks. Specifically, we exploit Random Indexing for dimensionality reduction and the negation operator implemented in the Semantic Vectors open source package to model negative user preferences. The results of an experimental evaluation performed on these enhanced vector space models (eVSM), together with the potential applications of these approaches, confirm the effectiveness of the model and encourage further investigation of these techniques.
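The two techniques named above can be illustrated with a minimal Python sketch. It is not the Semantic Vectors implementation (that package is written in Java) and not necessarily the authors' exact procedure: all function names, the toy documents, the dimension (64) and the number of non-zero entries (4) are hypothetical choices made for illustration. The sketch assumes the common formulation of Random Indexing, in which each document receives a sparse random index vector and a term vector is the sum of the index vectors of the documents containing the term, and it assumes negation is the orthogonal projection that removes from the positive vector its component along the negative one.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def cosine(a, b):
    norm = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / norm if norm else 0.0


def index_vector(dim, nonzero, rng):
    """Sparse ternary vector: `nonzero` random positions set to +1 or -1."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def term_vectors(docs, dim=64, nonzero=4, seed=0):
    """Random Indexing (assumed formulation): a term vector is the sum of
    the index vectors of the documents in which the term occurs."""
    rng = random.Random(seed)
    terms = {}
    for tokens in docs.values():
        ctx = index_vector(dim, nonzero, rng)  # one index vector per document
        for tok in tokens:
            terms[tok] = add(terms.get(tok, [0.0] * dim), ctx)
    return terms


def doc_vector(tokens, terms, dim):
    """Represent a document as the sum of its term vectors."""
    vec = [0.0] * dim
    for tok in tokens:
        vec = add(vec, terms[tok])
    return vec


def negate(positive, negative):
    """Orthogonal negation (assumed): subtract from `positive` its
    projection onto `negative`, so the result is orthogonal to it."""
    denom = dot(negative, negative)
    if denom == 0:
        return list(positive)
    scale = dot(positive, negative) / denom
    return [p - scale * n for p, n in zip(positive, negative)]


# Hypothetical toy item descriptions.
DIM = 64
docs = {
    "d1": ["space", "alien", "ship"],
    "d2": ["space", "robot", "war"],
    "d3": ["love", "wedding", "comedy"],
}
terms = term_vectors(docs, dim=DIM)
vecs = {name: doc_vector(tokens, terms, DIM) for name, tokens in docs.items()}

# Profile of a user who liked d1 and disliked d3.
profile = negate(vecs["d1"], vecs["d3"])
ranking = sorted(docs, key=lambda name: cosine(profile, vecs[name]), reverse=True)
```

The vectors live in a fixed 64-dimensional space regardless of vocabulary size, which is the dimensionality-reduction effect, and the negated profile has zero similarity to the disliked item by construction.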
This research was partially funded by MIUR (Ministero dell’Università e della Ricerca) under the contract Legge 593/2000, DM 19410 MBLab “Laboratorio di Bioinformatica per la biodiversità molecolare” (2007-2010).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Musto, C., Semeraro, G., Lops, P., de Gemmis, M. (2011). Random Indexing and Negative User Preferences for Enhancing Content-Based Recommender Systems. In: Huemer, C., Setzer, T. (eds) E-Commerce and Web Technologies. EC-Web 2011. Lecture Notes in Business Information Processing, vol 85. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23014-1_23
DOI: https://doi.org/10.1007/978-3-642-23014-1_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23013-4
Online ISBN: 978-3-642-23014-1
eBook Packages: Computer Science (R0)