Abstract
Since Sina micro-blog has become an important platform on which more and more people exchange information, micro-blog research has attracted great interest in recent years. However, most research focuses on micro-blog hot topics, and little work addresses hot micro-blogs, i.e., individual posts whose total number of forwards and comments over a period of time is high. To address this problem, a feature discretization method based on Extreme Gradient Boosting (XGBOOST) is proposed, which discretizes features according to the prediction paths of the trees in order to improve the running speed and prediction accuracy of the model. In addition, a constrained random forest classification algorithm is proposed to solve the imbalance caused by the random selection of features in the traditional random forest (RF) algorithm. Through feature extraction, discretization, and classification, this paper realizes hot micro-blog forecasting. The experimental results show that the classification algorithm proposed in this paper substantially improves classification accuracy.
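The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions. For discretization, each sample is routed down every tree and replaced by a one-hot indicator of the leaf it reaches; this is the common "leaf encoding" reading of discretizing by the prediction path. The toy trees over the hypothetical features `[forwards, comments]` are written by hand here rather than trained with XGBoost. The abstract does not specify the constraint used in the random forest. As a stand-in, the sketch uses stratified feature-subspace sampling in the spirit of Ye et al. (2013): features are split into informative and uninformative groups by a score (assumed to be something like information gain), and each tree draws a fixed quota from each group. All names (`Node`, `leaf_index`, `discretize`, `stratified_subspace`) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """A binary decision-tree node. Internal nodes send x to the left child when
    x[feature] < threshold; leaves carry a non-negative leaf_id."""
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: int = -1


def leaf_index(tree: Node, x: Sequence[float]) -> int:
    """Follow the prediction path of x from the root and return the leaf reached."""
    node = tree
    while node.leaf_id < 0:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.leaf_id


def discretize(trees: List[Node], n_leaves: List[int], x: Sequence[float]) -> List[int]:
    """Replace raw features by concatenated one-hot leaf indicators, one block per tree."""
    encoded: List[int] = []
    for tree, k in zip(trees, n_leaves):
        block = [0] * k
        block[leaf_index(tree, x)] = 1
        encoded.extend(block)
    return encoded


def stratified_subspace(scores: Sequence[float], threshold: float,
                        n_strong: int, n_weak: int, rng: random.Random) -> List[int]:
    """Draw one tree's feature subset with fixed quotas from the informative
    (score >= threshold) and uninformative groups, so every tree receives
    n_strong informative features whenever that many exist."""
    strong = [i for i, s in enumerate(scores) if s >= threshold]
    weak = [i for i, s in enumerate(scores) if s < threshold]
    picked = rng.sample(strong, min(n_strong, len(strong)))
    picked += rng.sample(weak, min(n_weak, len(weak)))
    return sorted(picked)


# Hypothetical hand-built trees over x = [forwards, comments];
# in practice the trees would come from a trained XGBoost model.
tree_a = Node(feature=0, threshold=100,
              left=Node(leaf_id=0),
              right=Node(feature=1, threshold=50,
                         left=Node(leaf_id=1), right=Node(leaf_id=2)))
tree_b = Node(feature=1, threshold=10, left=Node(leaf_id=0), right=Node(leaf_id=1))

print(discretize([tree_a, tree_b], [3, 2], [250, 30]))  # [0, 1, 0, 0, 1]

# Hypothetical per-feature scores; features 0, 1 and 4 count as informative.
subset = stratified_subspace([0.9, 0.8, 0.1, 0.05, 0.7, 0.02], 0.5, 2, 1, random.Random(0))
print(subset)
```

A post with 250 forwards and 30 comments reaches leaf 1 of the first tree and leaf 1 of the second, so its two raw counts become the binary vector `[0, 1, 0, 0, 1]`, which a downstream classifier such as a random forest can consume. Fixing per-group quotas, rather than sampling uniformly from all features, is one simple way to keep random feature selection from producing trees that see mostly uninformative features.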
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, J. et al. (2018). Research on Hot Micro-blog Forecast Based on XGBOOST and Random Forest. In: Liu, W., Giunchiglia, F., Yang, B. (eds) Knowledge Science, Engineering and Management. KSEM 2018. Lecture Notes in Computer Science, vol 11062. Springer, Cham. https://doi.org/10.1007/978-3-319-99247-1_31
DOI: https://doi.org/10.1007/978-3-319-99247-1_31
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99246-4
Online ISBN: 978-3-319-99247-1
eBook Packages: Computer Science, Computer Science (R0)