Abstract
Social annotations are valuable resources generated by users on the Web, and they encode abundant information about user preferences for particular documents. Social annotation-based information retrieval has been studied in recent years as a way to personalize search results and fulfill user information needs. However, because each social annotation is associated with a user, a document and a tag simultaneously, it remains a great challenge to fully capture the potentially useful information for improving retrieval performance. To meet this challenge, we propose a novel method that integrates social annotations into topic models for personalized document retrieval. Our method first reconstructs the candidate documents for a given query using the social tags of those documents, so that the reconstructed documents are tailored to user preferences. We then generalize latent Dirichlet allocation-based topic models by considering the relationships among users, social tags and documents found in social annotations. The modified topic model optimizes the distribution of latent topics over documents for different users to meet their information needs. Experimental results show that our method significantly outperforms state-of-the-art baseline models in personalized retrieval.
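The document reconstruction step can be illustrated with a minimal sketch. It assumes that a document is a list of terms and that reconstruction simply appends the document's social tags, repeating any tag the user has also used. The function name `reconstruct_document` and the `boost` parameter are hypothetical, chosen only for illustration; the abstract does not specify the actual weighting scheme.

```python
from collections import Counter


def reconstruct_document(doc_terms, doc_tags, user_tags, boost=2):
    """Append a document's social tags to its term list.

    Tags that the user has also used are repeated `boost` times.
    This weighting is an assumption made for illustration, not the
    scheme used in the paper.
    """
    user_profile = set(user_tags)
    reconstructed = list(doc_terms)
    for tag in doc_tags:
        repeats = boost if tag in user_profile else 1
        reconstructed.extend([tag] * repeats)
    return reconstructed


# Toy data (hypothetical): one document, its tags, and one user's tags.
doc_terms = ["topic", "model", "retrieval"]
doc_tags = ["lda", "search"]
user_tags = ["lda", "python"]

result = reconstruct_document(doc_terms, doc_tags, user_tags)
counts = Counter(result)
```

In this toy setting, a tag shared with the user contributes more term occurrences than an unshared tag, so a topic model fitted on the reconstructed documents would see different inputs for different users.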
Acknowledgements
This work is partially supported by grants from the Natural Science Foundation of China (Nos. 61632011, 61572102, 61602078, 61572098), the Ministry of Education Humanities and Social Science Project (No. 19YJCZH199), the China Postdoctoral Science Foundation (No. 2018M641691), the Fundamental Research Funds for the Central Universities (No. DUT18ZD102) and the National Key Research and Development Program of China (No. 2016YFB1001103).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and Animal Rights
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Communicated by V. Loia.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Xu, B., Lin, H., Lin, Y. et al. Integrating social annotations into topic models for personalized document retrieval. Soft Comput 24, 1707–1716 (2020). https://doi.org/10.1007/s00500-019-03998-1
DOI: https://doi.org/10.1007/s00500-019-03998-1