A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings

Tian Nie, Yi Ding, Chen Zhao, Youchao Lin, Takehito Utsuro
Copyright: © 2018 | Volume: 6 | Issue: 3 | Pages: 12
ISSN: 2166-7160 | EISSN: 2166-7179 | EISBN13: 9781522546856 | DOI: 10.4018/IJSI.2018070105
Cite Article

MLA

Nie, Tian, et al. "A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings." IJSI vol.6, no.3 2018: pp.67-78. http://doi.org/10.4018/IJSI.2018070105

APA

Nie, T., Ding, Y., Zhao, C., Lin, Y., & Utsuro, T. (2018). A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings. International Journal of Software Innovation (IJSI), 6(3), 67-78. http://doi.org/10.4018/IJSI.2018070105

Chicago

Nie, Tian, et al. "A Method of Subtopic Classification of Search Engine Suggests by Integrating a Topic Model and Word Embeddings," International Journal of Software Innovation (IJSI) 6, no.3: 67-78. http://doi.org/10.4018/IJSI.2018070105

Abstract

This article addresses the problem of how to give an overview of the knowledge associated with a given query keyword. In particular, the authors focus on the concerns of those who search for web pages with that keyword. The Web search information needs for a given query keyword are collected through search engine suggests. Given a query keyword, the authors collect up to around 1,000 suggests, many of which are redundant. They classify these redundant search engine suggests based on a topic model. However, one limitation of topic-model-based classification is that the granularity of the topics, i.e., the clusters of search engine suggests, is too coarse. To overcome this coarse-grained classification, the article further applies the word embedding technique to the web pages used during the training of the topic model, in addition to the text data of the whole Japanese version of Wikipedia. The authors then examine the word-embedding-based similarity between search engine suggests and further classify the suggests within a single topic into finer-grained subtopics based on that similarity. Evaluation results show that the proposed approach performs well in the task of subtopic classification of search engine suggests.
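
The second stage described in the abstract, splitting the suggests of one topic into subtopics by word embedding similarity, might be sketched roughly as follows. This is only an illustration and not the authors' implementation: the abstract does not state how a suggest is turned into a vector or how similar suggests are grouped, so the averaging of word vectors, the greedy threshold grouping, the threshold value of 0.8, the function names, and the toy two-dimensional vectors are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(words, embeddings):
    """Average the vectors of the known words (an assumed way to embed a suggest)."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def split_into_subtopics(suggests, embeddings, threshold=0.8):
    """Greedy grouping (assumed): a suggest joins the first subtopic that
    already holds a member with cosine similarity >= threshold; otherwise
    it starts a new subtopic."""
    groups = []  # each group is a list of (suggest, vector) pairs
    for s in suggests:
        vec = mean_vector(s.split(), embeddings)
        if vec is None:
            groups.append([(s, None)])  # no known words: keep it on its own
            continue
        for group in groups:
            if any(v is not None and cosine(vec, v) >= threshold for _, v in group):
                group.append((s, vec))
                break
        else:
            groups.append([(s, vec)])
    return [[s for s, _ in group] for group in groups]

# Toy embeddings (hypothetical values, not trained on any corpus).
toy_embeddings = {
    "job": [1.0, 0.0],
    "salary": [0.9, 0.1],
    "interview": [0.8, 0.2],
    "wedding": [0.0, 1.0],
    "dress": [0.1, 0.9],
}

# Suggests assumed to have been assigned to one coarse topic by the topic model.
suggests_in_topic = ["job salary", "job interview", "wedding dress", "wedding"]
subtopics = split_into_subtopics(suggests_in_topic, toy_embeddings)
print(subtopics)
```

In practice the embeddings would come from a model trained on the collected web pages and Wikipedia text, as the abstract describes, and a standard clustering algorithm could replace the greedy grouping used here.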
