Stream-Based Live Probabilistic Topic Computing and Matching

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10393)

Abstract

Public opinion monitoring refers to real-time first story detection (FSD) on a particular Internet news event. It plays an important part in identifying news propagation trends. Current opinion monitoring methods rely on text matching. However, they have limitations: they struggle to discover latent and hidden topics, and they rank the relevance of matching results incorrectly on large-scale data. In this paper, we propose an improved solution to live public opinion monitoring: stream-based live probabilistic topic computing and matching. Our method attempts to address the weaknesses of existing approaches, namely the lack of semantic matching and low efficiency on timely big data. To this end, we propose real-time topic computing with the stream processing paradigm and topic matching with query-time document and field boosting. Finally, our experimental evaluation of topic computing and matching on crawled historical Netease news records shows the effectiveness and efficiency of the proposed approach.
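As a rough illustration of what query-time document and field boosting means in general, the following minimal sketch scores documents by counting query-term matches per field and weighting them with boosts supplied at query time. It is not the authors' implementation: the function name `boosted_score`, the field names `title` and `body`, the boost values, and the simple term-count scoring are all assumptions made for this example.

```python
from collections import Counter

def boosted_score(query_terms, doc, field_boosts, doc_boost=1.0):
    """Hypothetical scorer: per-field term counts weighted by field and document boosts."""
    score = 0.0
    for field, boost in field_boosts.items():
        # Tokenize the field naively on whitespace (an assumption for brevity).
        counts = Counter(doc.get(field, "").lower().split())
        matches = sum(counts[term.lower()] for term in query_terms)
        score += boost * matches
    # The document-level boost scales the whole score.
    return doc_boost * score

# Hypothetical documents and boosts: a title match counts three times a body match.
docs = [
    {"id": "a", "title": "earthquake relief", "body": "volunteers arrive"},
    {"id": "b", "title": "local news", "body": "earthquake earthquake update"},
]
boosts = {"title": 3.0, "body": 1.0}

ranked = sorted(
    docs,
    key=lambda d: boosted_score(["earthquake"], d, boosts),
    reverse=True,
)
print([d["id"] for d in ranked])
```

Because the boosts are passed in with each query rather than stored in the index, the same documents can be re-ranked under different weightings without reindexing.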

Acknowledgments

This work was supported by the Science and Technology Program of University of Jinan (XKY1734), the Open Project Joint Funding of Information Science and Engineering School of Linyi University and Discipline Team of Intelligent Logistics and Information Engineering (LDXX2017KF155), the Shandong Provincial Natural Science Foundation (ZR201702170261), the Shandong Provincial Key R&D Program (2015GGX106007 & 2016ZDJS01A12), and the Project of Shandong Province Higher Educational Science and Technology Program (J16LN13).

Author information

Corresponding author

Correspondence to Kun Ma.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ma, K., Yu, Z., Ji, K., Yang, B. (2017). Stream-Based Live Probabilistic Topic Computing and Matching. In: Ibrahim, S., Choo, KK., Yan, Z., Pedrycz, W. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2017. Lecture Notes in Computer Science, vol 10393. Springer, Cham. https://doi.org/10.1007/978-3-319-65482-9_27

  • DOI: https://doi.org/10.1007/978-3-319-65482-9_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65481-2

  • Online ISBN: 978-3-319-65482-9

  • eBook Packages: Computer Science, Computer Science (R0)
