Abstract
Baidu, the most popular Chinese search engine, monitors what its users are currently searching for and publishes the top 50 search terms, called trending search terms, in descending order of popularity. This paper focuses on predicting the popularity ranking trends of these top trending search terms in Baidu. Our data analysis identified two issues that could affect the accuracy of using the ranking data to predict the popularity of trending search terms. First, every trending term drops off the top 50 list as its popularity declines, yet some terms reappear on the list after disappearing. To differentiate new, distinct search terms from reappearances of old terms, we propose a term distinction model that uses the related news articles Baidu provides for each trending search term. Second, missing values must be handled while a term is off the trending term list. To this end, we collected the top 50 trending search terms from Baidu, together with their related news articles, hourly for 6 months (from 1 March 2013 to 31 August 2013). Based on the proposed model, we found that the optimal disappearing interval is 9 h and that using rank 51 for the missing values was the most successful. We conducted evaluations using 3 months of data (from 1 September 2013 to 30 November 2013), comparing four machine learning techniques to find the most accurate one for predicting the popularity rank of trending search terms. The feed-forward neural network achieved the highest prediction accuracy, 78.81 %, and reached 85.55 % accuracy within a ±3 error range.
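The two preprocessing choices and the tolerance-based accuracy measure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_episodes` and `accuracy_within` are hypothetical, the input is assumed to be a mapping from hour index to observed rank, and the rule "more than 9 missing hours starts a new term" is one assumed reading of the 9 h disappearing interval. The paper's term distinction model also uses related news articles, which this sketch leaves out entirely.

```python
from typing import Dict, List, Optional, Sequence

MISSING_RANK = 51   # rank used for hours when a term is off the top 50 list (from the abstract)
MAX_GAP_HOURS = 9   # disappearing interval reported as optimal in the abstract


def split_episodes(observations: Dict[int, int],
                   max_gap: int = MAX_GAP_HOURS) -> List[List[int]]:
    """Split hourly observations {hour: rank} into separate rank series.

    Assumption: a run of more than `max_gap` missing hours means the term
    is treated as a new, distinct term; shorter runs are treated as a
    temporary disappearance and filled with MISSING_RANK.
    """
    episodes: List[List[int]] = []
    current: List[int] = []
    prev_hour: Optional[int] = None
    for hour in sorted(observations):
        if prev_hour is not None:
            missing = hour - prev_hour - 1  # hours with no observation
            if missing > max_gap:
                # Long absence: close the current series and start a new one.
                episodes.append(current)
                current = []
            else:
                # Short absence: pad the series with the placeholder rank.
                current.extend([MISSING_RANK] * missing)
        current.append(observations[hour])
        prev_hour = hour
    if current:
        episodes.append(current)
    return episodes


def accuracy_within(predicted: Sequence[int],
                    actual: Sequence[int],
                    tolerance: int = 0) -> float:
    """Fraction of predictions whose rank error is at most `tolerance`."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


if __name__ == "__main__":
    # Hypothetical term seen at hours 0, 1, 4 and 20.
    series = split_episodes({0: 5, 1: 7, 4: 10, 20: 3})
    print(series)
    print(accuracy_within([1, 5, 10, 20], [1, 7, 14, 20], tolerance=3))
```

Calling `accuracy_within` with `tolerance=0` gives exact-match accuracy, and `tolerance=3` corresponds to the ±3 error range mentioned above.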




Acknowledgments
This study was supported by the Asian Office of Aerospace Research and Development (AOARD).
Cite this article
Han, S.C., Liang, Y., Chung, H. et al. Chinese trending search terms popularity rank prediction. Inf Technol Manag 17, 133–139 (2016). https://doi.org/10.1007/s10799-015-0238-0
DOI: https://doi.org/10.1007/s10799-015-0238-0