KPCA-WT: An Efficient Framework for High Quality Microblog Extraction in Time-Frequency Domain

  • Conference paper
Web-Age Information Management (WAIM 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9659)

Abstract

Online social media generate massive numbers of messages relevant to social events, which makes filtering and screening them a great challenge. To obtain high-quality messages, we propose a high-quality information extraction framework based on kernel principal component analysis and wavelet transformation (KPCA-WT). First, building on multiple-feature fusion, we design an algorithm for extracting high-quality microblogs that transforms the features into the wavelet domain to capture the detailed differences between the feature signals. The feature weights are then estimated with the EM algorithm, and the weighted features are fused into a comprehensive value for each message. In addition, to reduce the effect of noisy features and speed up computation, the features are processed with kernel principal component analysis before being transformed into the wavelet domain. Experimental results show that the proposed framework extracts information of higher quality and with less redundancy, and greatly reduces time consumption.
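The two preprocessing stages named in the abstract, kernel PCA followed by a wavelet transform, can be illustrated with a minimal sketch. This is not the authors' implementation. The RBF kernel, the `gamma` value, the single-level Haar wavelet, and the idea of treating each reduced feature column as a signal over time-ordered messages are all assumptions made for illustration, as are the function names. The EM-based weight estimation and the final fusion step are not shown.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Pairwise RBF kernel matrix (the kernel choice is an assumption)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kpca(X, n_components, gamma=1.0):
    """Project the rows of X onto the top kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    one = np.full((n, n), 1.0 / n)
    # Center the kernel matrix in feature space.
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Training-point projections: eigenvector scaled by sqrt(eigenvalue).
    return vecs * np.sqrt(np.maximum(vals, 0.0))

def haar_dwt(signal):
    """Single-level Haar transform: (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:  # pad odd-length signals by repeating the last sample
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def kpca_then_wavelet(X, n_components=2, gamma=1.0):
    """Hypothetical pipeline: reduce features, then transform each column."""
    Z = kpca(X, n_components, gamma)
    return [haar_dwt(Z[:, j]) for j in range(Z.shape[1])]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(8, 5))  # 8 messages, 5 synthetic features
    for approx, detail in kpca_then_wavelet(features, gamma=0.1):
        print(approx.shape, detail.shape)
```

Running KPCA first shrinks the number of feature signals that have to be transformed, which is consistent with the abstract's stated motivation of reducing noise and computation time. The detail coefficients carry the local differences between neighboring samples of each signal.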

This research is supported by the Natural Science Foundation of China under contract No. 61472291, and Natural Science Foundation of Hubei Province, China under contract No. ZRY2014000901.

Author information

Corresponding author

Correspondence to Min Peng.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Peng, M. et al. (2016). KPCA-WT: An Efficient Framework for High Quality Microblog Extraction in Time-Frequency Domain. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9659. Springer, Cham. https://doi.org/10.1007/978-3-319-39958-4_24

  • DOI: https://doi.org/10.1007/978-3-319-39958-4_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39957-7

  • Online ISBN: 978-3-319-39958-4

  • eBook Packages: Computer Science, Computer Science (R0)
