Abstract
This paper introduces a novel method for mining user profiles (e.g., age and gender) from the query log of a search engine. The proposed method combines the representation-learning strength of neural networks with the interpretability of topic models by plugging a parametric Gaussian mixture distribution layer into the neural network. Specifically, it first uses a convolutional neural network to model the query content, generating a dense vector representation for each query. Based on this representation, it infers the search topic of the query by fitting a Gaussian mixture distribution, which yields the query topic distribution. It then derives the distribution of topics that the user cares about by aggregating the query topic distributions of all of that user's queries. Profile prediction is performed on the resulting user topic distribution. We evaluated this framework on a real search engine data set containing 40,000 users labeled with age, gender, and education level. The experimental results demonstrate the effectiveness of the proposed model.
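The topic-inference and aggregation steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the covariance structure, the aggregation function, or how the layer is trained, so the sketch assumes diagonal covariances, a simple mean over a user's queries, and fixed mixture parameters. The function names `query_topic_distribution` and `user_topic_distribution` are hypothetical, and random vectors stand in for the CNN query representations.

```python
import numpy as np


def query_topic_distribution(q, weights, means, variances):
    """Posterior probability of each mixture component (topic) per query.

    q:         (N, D) query vectors (stand-ins for CNN outputs)
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal covariances (an assumption of this sketch)
    """
    diff = q[:, None, :] - means[None, :, :]  # (N, K, D)
    # Log density of a diagonal Gaussian, summed over the D dimensions.
    log_pdf = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )  # (N, K)
    log_post = np.log(weights) + log_pdf
    # Subtract the row maximum before exponentiating for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def user_topic_distribution(query_topics):
    """Aggregate query topic distributions by averaging (assumed aggregation)."""
    return query_topics.mean(axis=0)


# Toy example: 5 queries from one user, 3 topics, 4-dimensional query vectors.
rng = np.random.default_rng(0)
K, D = 3, 4
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
queries = rng.normal(size=(5, D))

query_topics = query_topic_distribution(queries, weights, means, variances)
user_topics = user_topic_distribution(query_topics)
print(user_topics)
```

In this sketch the resulting `user_topics` vector would be the input to a downstream classifier for each profile attribute. In the described method the mixture layer sits inside the network, so its parameters would presumably be learned jointly with the CNN rather than fixed as they are here.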
The authors wish to thank the anonymous reviewers for their helpful comments.
Notes
- 1.
- 2.
- 3. We refer to the latent multinomial variables in the GMM as topics, so as to exploit query-oriented intuitions, but we make no epistemological claims regarding these latent variables beyond their utility in representing probability distributions on queries.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Peng, M., Zhao, J., Zhang, Q., Gui, T., Huang, X., Fu, J. (2019). Mining User Profiles from Query Log. In: Zhang, Q., Liao, X., Ren, Z. (eds) Information Retrieval. CCIR 2019. Lecture Notes in Computer Science(), vol 11772. Springer, Cham. https://doi.org/10.1007/978-3-030-31624-2_1
DOI: https://doi.org/10.1007/978-3-030-31624-2_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31623-5
Online ISBN: 978-3-030-31624-2
eBook Packages: Computer Science, Computer Science (R0)