Exploiting Big Data for Enhanced Representations in Content-Based Recommender Systems

Narducci, Fedelucio; Musto, Cataldo; Semeraro, Giovanni; Lops, Pasquale; de Gemmis, Marco

doi:10.1007/978-3-642-39878-0_17

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 152)

Included in the following conference series:

International Conference on Electronic Commerce and Web Technologies

Abstract

The recent explosion of Big Data offers new opportunities and challenges to platforms that provide personalized access to information sources, such as recommender systems and personalized search engines. In this context, social networks are attracting growing interest because they are a rich source of data for personalization tasks. Users naturally leave a great deal of data on these platforms about their preferences, feelings, and friendships, and those data are valuable for addressing the cold-start problem of recommender systems. On the other hand, since content shared on social networks is noisy and heterogeneous, the extracted information must be heavily processed to build user profiles that effectively mirror user interests and needs.

In this paper we investigated the effectiveness of external knowledge derived from Wikipedia in representing both documents and user profiles in a recommendation scenario. Specifically, we compared a classical keyword-based representation with two techniques that map unstructured text to Wikipedia pages. The advantage of this representation is that documents and user profiles become richer, more human-readable, less noisy, and potentially connected to the Linked Open Data (LOD) cloud. The goal of our preliminary experimental evaluation was twofold: 1) to identify the representation that best reflects user preferences; 2) to identify the representation that provides the best predictive accuracy.

We implemented a news recommender for a preliminary evaluation of our model. The evaluation involved more than 50 Facebook and Twitter users and showed that the encyclopedia-based representation is an effective way of modeling both user profiles and documents.
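
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how a keyword-based representation and a Wikipedia-concept representation can disagree about the same profile/article pair. The concept index `CONCEPT_TERMS`, its weights, and the two sample texts are hypothetical toy data; a real system would derive term weights from actual Wikipedia articles (for instance in the style of explicit semantic analysis) rather than from a hand-written dictionary.

```python
import math
from collections import Counter

# Hypothetical toy index: Wikipedia page title -> weighted terms.
# These titles and weights are invented for illustration only.
CONCEPT_TERMS = {
    "Association football": {"goal": 0.9, "match": 0.7, "league": 0.8},
    "Stock market": {"shares": 0.9, "market": 0.8, "index": 0.6},
}


def keyword_vector(text):
    """Keyword-based representation: term -> raw count."""
    return Counter(text.lower().split())


def concept_vector(text):
    """Concept-based representation: page title -> summed term weights."""
    counts = keyword_vector(text)
    scores = {}
    for concept, terms in CONCEPT_TERMS.items():
        score = sum(counts[term] * weight for term, weight in terms.items())
        if score > 0:
            scores[concept] = score
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A user post and a news article that share a topic but no words.
profile_text = "late goal decides the match"
article_text = "league title race tightens"

keyword_sim = cosine(keyword_vector(profile_text), keyword_vector(article_text))
concept_sim = cosine(concept_vector(profile_text), concept_vector(article_text))
print("keyword similarity:", keyword_sim)
print("concept similarity:", concept_sim)
```

In this toy case the two texts have no terms in common, so the keyword vectors do not overlap, while both texts map to the same concept and the concept vectors align. The concept keys are also page titles, which is what makes such a profile easier for a person to read than a list of raw tokens.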

Author information

Authors and Affiliations

Department of Computer Science, University of Bari Aldo Moro, Italy
Cataldo Musto, Giovanni Semeraro, Pasquale Lops & Marco de Gemmis
Department of Information Science, Systems Theory, and Communication, University of Milano-Bicocca, Italy
Fedelucio Narducci

Editor information

Editors and Affiliations

Institute of Software Technology and Interactive Systems, Business Informatics Group (BIG), Vienna University of Technology, Favoritenstrasse 9 - 11 / 188-3, 1040, Vienna, Austria
Christian Huemer
Dipartimento di Informatica, Università di Bari, Via E. Orabona, 4, I70126, Bari, Italy
Pasquale Lops

About this paper

Cite this paper

Narducci, F., Musto, C., Semeraro, G., Lops, P., de Gemmis, M. (2013). Exploiting Big Data for Enhanced Representations in Content-Based Recommender Systems. In: Huemer, C., Lops, P. (eds) E-Commerce and Web Technologies. EC-Web 2013. Lecture Notes in Business Information Processing, vol 152. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39878-0_17

DOI: https://doi.org/10.1007/978-3-642-39878-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39877-3
Online ISBN: 978-3-642-39878-0
eBook Packages: Computer Science, Computer Science (R0)
