Abstract
Public opinion monitoring refers to real-time first story detection (FSD) on a particular Internet news event. It plays an important role in identifying news propagation trends. Current opinion monitoring methods rely on text matching. However, such methods have limitations, including difficulty discovering latent (hidden) topics and incorrect relevance ranking of matching results on large-scale data. In this paper, we propose an improved solution for live public opinion monitoring: stream-based live probabilistic topic computing and matching. Our method aims to overcome the weak semantic matching and low efficiency of existing methods on real-time big data. To achieve substantial improvements, we propose real-time topic computing based on the stream processing paradigm, and topic matching with query-time document and field boosting. Finally, our experimental evaluation of topic computing and matching on crawled historical Netease news records shows the high effectiveness and efficiency of the proposed approach.
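The idea of topic matching with query-time document and field boosting can be illustrated with a small sketch. It is not the authors' scoring function, nor Lucene's or Solr's. It scores news documents against a topic given as weighted words (for example, the top words of a probabilistic topic together with their probabilities).

All of the following are hypothetical choices made for illustration:

* the names `NewsDoc`, `rank_by_topic` and `recency_boost`;
* the smoothed TF-IDF formula;
* the per-field boosts;
* the recency half-life used as the document boost.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class NewsDoc:
    doc_id: str
    fields: dict[str, list[str]]  # field name -> pre-tokenized text
    published: float              # publication time, e.g. hours since an epoch


def idf_table(docs: list[NewsDoc]) -> dict[str, float]:
    """Smoothed IDF over the current window; a term counts once per document."""
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        seen: set[str] = set()
        for tokens in doc.fields.values():
            seen.update(tokens)
        df.update(seen)
    return {term: math.log(1 + n / count) for term, count in df.items()}


def recency_boost(age: float, half_life: float) -> float:
    """Hypothetical document boost: halves every `half_life` time units."""
    return 0.5 ** (age / half_life)


def topic_score(doc, topic, field_boosts, idf, now, half_life) -> float:
    """Weighted TF-IDF match of one document against a topic {word: weight}."""
    total = 0.0
    for name, tokens in doc.fields.items():
        tf = Counter(tokens)
        fboost = field_boosts.get(name, 1.0)  # field boost chosen at query time
        for word, weight in topic.items():
            if tf[word]:
                total += fboost * weight * tf[word] * idf.get(word, 0.0)
    age = max(0.0, now - doc.published)
    return recency_boost(age, half_life) * total  # document boost applied at query time


def rank_by_topic(docs, topic, field_boosts, now, half_life=6.0):
    """Return (doc_id, score) pairs, best match first (stable on ties)."""
    idf = idf_table(docs)
    scored = [(d.doc_id, topic_score(d, topic, field_boosts, idf, now, half_life))
              for d in docs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Suppose the title field is boosted above the body. A story whose headline contains the topic words then outranks one that mentions them only in its body. The half-life decay lets fresh stories overtake older ones with similar content. Both boosts are supplied when the query runs, so they can be tuned without re-indexing.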
Acknowledgments
This work was supported by the Science and Technology Program of University of Jinan (XKY1734), the Open Project Joint Funding of Information Science and Engineering School of Linyi University and Discipline Team of Intelligent Logistics and Information Engineering (LDXX2017KF155), the Shandong Provincial Natural Science Foundation (ZR201702170261), the Shandong Provincial Key R&D Program (2015GGX106007 & 2016ZDJS01A12), and the Project of Shandong Province Higher Educational Science and Technology Program (J16LN13).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ma, K., Yu, Z., Ji, K., Yang, B. (2017). Stream-Based Live Probabilistic Topic Computing and Matching. In: Ibrahim, S., Choo, KK., Yan, Z., Pedrycz, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science, vol 10393. Springer, Cham. https://doi.org/10.1007/978-3-319-65482-9_27
DOI: https://doi.org/10.1007/978-3-319-65482-9_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65481-2
Online ISBN: 978-3-319-65482-9
eBook Packages: Computer Science, Computer Science (R0)