A Deep Learning-based Ranking Approach for Microblog Retrieval

https://doi.org/10.1016/j.procs.2019.09.190
Under a Creative Commons license
open access

Abstract

Twitter has become one of the most popular microblogging services, with millions of users producing a large amount of information on various topics every day. When searching for useful information on Twitter, users need to access high-quality content that meets their needs. However, effective information retrieval in such microblogs remains a challenge because the relevant tweets returned for a given query vary widely in quality. Effective microblog retrieval therefore requires distinguishing high-quality tweets among thousands of results. Existing work relies on manually engineered, hand-crafted features (e.g., number of retweets, number of followers) to indicate tweet quality. In this paper, we focus on the problem of ranking tweets, and particularly on retrieving high-quality content. We propose a ranking approach based on k-means clustering to distinguish high-quality from low-quality tweets. The clustering algorithm uses features learned by a deep learning autoencoder together with hand-crafted features from tweets’ content and authors’ profiles. We use information gain as a feature importance measure to find the optimal set of features with the strongest power in clustering the data. Through a pilot feature analysis study, we demonstrate the impact of the learned features on identifying tweet quality in the clustering process. Our experimental results show that integrating the learned features significantly improves the quality of clustering, and especially the ranking performance, compared to using hand-crafted features only.
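
The clustering and feature-selection steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the column names are hypothetical, the "learned" column is a stand-in for an autoencoder code rather than the output of a trained network, and information gain is computed against the k-means cluster assignments, which is an assumption since the abstract does not say which labels are used. The abstract also does not describe how clusters are turned into a ranking, so the sketch stops at clustering and feature selection.

```python
import numpy as np

def kmeans(X, k=2, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids

def entropy(y):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, y, n_bins=5):
    """IG = H(y) - H(y | feature), with the feature cut into quantile bins."""
    edges = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(feature, edges)
    conditional = 0.0
    for b in np.unique(bins):
        mask = bins == b
        conditional += mask.mean() * entropy(y[mask])
    return entropy(y) - conditional

# Synthetic tweets: 200 "low quality" and 200 "high quality" (assumed data).
rng = np.random.default_rng(42)
n = 200
shift = np.concatenate([np.zeros(n), np.full(n, 5.0)])
feature_names = ["retweets", "followers", "learned_code", "noise"]  # hypothetical
X = np.column_stack([
    shift + rng.normal(size=2 * n),   # hand-crafted, informative
    shift + rng.normal(size=2 * n),   # hand-crafted, informative
    shift + rng.normal(size=2 * n),   # placeholder for an autoencoder feature
    rng.normal(size=2 * n),           # uninformative feature
])
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each column

# Step 1: cluster on all features.
labels, _ = kmeans(X, k=2)

# Step 2: score each feature by information gain w.r.t. the cluster labels.
gains = np.array([information_gain(X[:, j], labels) for j in range(X.shape[1])])

# Step 3: keep the highest-scoring features and re-cluster on them.
selected = np.argsort(gains)[::-1][:3]
labels_selected, _ = kmeans(X[:, selected], k=2)

for name, g in zip(feature_names, gains):
    print(f"{name}: {g:.3f}")
```

In this toy setting the uninformative column receives a much lower score than the other three and is dropped before re-clustering. The number of bins and the number of retained features are arbitrary choices made for the illustration.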

Keywords

information retrieval
ranking microblogs
k-means clustering
deep learning
information gain
