Task-oriented keyphrase extraction from social media

Multimedia Tools and Applications

Abstract

Keyphrase extraction from social media is a crucial and challenging task. Previous studies usually focus on extracting keyphrases that summarize a corpus, but they do not take users’ specific needs into consideration. In this paper, we propose a novel three-stage model to learn a keyphrase set that represents or is related to a particular topic. First, a phrase mining algorithm segments the documents into human-interpretable phrases. Second, we propose a weakly supervised model to extract candidate keyphrases, guided by a few pre-specified seed keyphrases. This guidance makes the extracted keyphrases more specific and more closely related to the seed keyphrases, which reflect the user’s needs. Finally, to identify implicitly related phrases, the PMI-IR algorithm is employed to obtain synonyms of the extracted candidate keyphrases. We conducted experiments on two publicly available datasets from news and Twitter. The experimental results demonstrate that our approach outperforms the state-of-the-art baselines and has the potential to extract high-quality task-oriented keyphrases.
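The third stage relies on pointwise mutual information (PMI) to find phrases related to a candidate keyphrase. The following is a minimal sketch of that idea, not the authors' implementation: PMI-IR as described by Turney estimates the counts from search-engine hits, whereas this sketch swaps in document-level co-occurrence counts from a local corpus. The function names `pmi` and `related_phrases`, the representation of each document as a set of mined phrases, and the toy corpus `docs` are all assumptions made for illustration.

```python
import math

def pmi(phrase_a, phrase_b, documents):
    """PMI of two phrases from document-level co-occurrence counts.

    Assumption: each document is a set of phrases (e.g. the output of a
    phrase mining step). Local counts stand in for search-engine hits.
    """
    n = len(documents)
    hits_a = sum(1 for d in documents if phrase_a in d)
    hits_b = sum(1 for d in documents if phrase_b in d)
    hits_ab = sum(1 for d in documents if phrase_a in d and phrase_b in d)
    if hits_a == 0 or hits_b == 0 or hits_ab == 0:
        # The phrases never co-occur, so PMI is treated as minus infinity.
        return float("-inf")
    # log2( p(a, b) / (p(a) * p(b)) ) with probabilities estimated as hits / n
    return math.log2((hits_ab * n) / (hits_a * hits_b))

def related_phrases(seed, candidates, documents, top_k=3):
    """Rank candidates by PMI with the seed, dropping those that never co-occur."""
    scored = [(c, pmi(seed, c, documents)) for c in candidates if c != seed]
    scored = [(c, s) for c, s in scored if s != float("-inf")]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy corpus: four documents, each a set of phrases.
docs = [
    {"flu shot", "influenza vaccine", "clinic"},
    {"flu shot", "influenza vaccine"},
    {"football", "clinic"},
    {"football", "stadium"},
]

ranking = related_phrases(
    "flu shot", ["influenza vaccine", "clinic", "football", "stadium"], docs
)
print(ranking)
```

On this toy corpus, "influenza vaccine" ranks above "clinic" for the seed "flu shot", and "football" and "stadium" are dropped because they never co-occur with it. A real system would need far larger counts and some smoothing, since PMI is known to overrate rare phrases.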

Notes

  1. Available at https://www.google.com/advanced_search.

  2. Available at http://qwone.com/~jason/20Newsgroups.

  3. Available at http://www.nltk.org.

  4. Available at http://www.ranks.nl/stopwords.

  5. Available at http://wordnet.princeton.edu/.

Author information

Corresponding author

Correspondence to Jia Zhu.

Cite this article

Yang, M., Liang, Y., Zhao, W. et al. Task-oriented keyphrase extraction from social media. Multimed Tools Appl 77, 3171–3187 (2018). https://doi.org/10.1007/s11042-017-5041-y

  • DOI: https://doi.org/10.1007/s11042-017-5041-y
