ETBTRank: Ranking Biterms in Paper Titles for Emerging Topic Discovery

  • Conference paper
  • In: AI 2021: Advances in Artificial Intelligence (AI 2022)

Abstract

Emerging topics, which often originate from the collaboration of two scientific subfields, can be represented by biterms (pairs of terms) in which each term represents a distinct subfield. However, automatically finding the two critical terms that exactly represent an emerging topic is challenging. First, existing term weighting models (such as TF-IDF, TextRank, RAKE, KECNW, and YAKE) may be effective for finding critical single terms, but not critical biterms. Second, a potential biterm that could represent an emerging topic occurs only rarely in a text (e.g., a corpus of paper titles). So, even if we combine terms to generate a bag of biterms, the above term weighting models still fail, because they filter out these rare but potentially important biterms. This paper proposes a novel Emerging Topic BiTerm Rank (ETBTRank) model to automatically extract biterms that represent emerging topics, distinguishing emerging-topic biterms from unimportant ones. In ETBTRank, we weigh the two terms of a biterm separately and identify emerging-topic biterms by a rule: if a biterm itself is rare, but each of its two terms has a high weight, then it is an emerging-topic biterm. Experimental studies on paper title datasets demonstrate the effectiveness of the proposed model.
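
The ranking rule stated above lends itself to a short illustration. The paper's actual per-term weighting and scoring formulas are not reproduced in this excerpt, so the Python sketch below is a hypothetical stand-in: it uses raw corpus frequency as the term weight and a simple count threshold for biterm rarity, purely to show the shape of the rule (a rare biterm whose two terms are individually strong), not the authors' ETBTRank scoring.

from collections import Counter
from itertools import combinations

def score_emerging_biterms(titles, rarity_threshold=2):
    # Term weight: plain corpus frequency stands in for ETBTRank's per-term weight.
    term_freq = Counter(term for title in titles for term in title)
    # Biterm frequency: unordered pairs of distinct terms co-occurring in a title.
    biterm_freq = Counter(
        tuple(sorted(pair))
        for title in titles
        for pair in combinations(set(title), 2)
    )
    scores = {}
    for (t1, t2), count in biterm_freq.items():
        # Rule from the abstract: the biterm itself is rare, but each of its
        # two terms carries a high individual weight.
        if count <= rarity_threshold:
            scores[(t1, t2)] = term_freq[t1] * term_freq[t2]
    return scores

# Toy usage on tokenised paper titles.
titles = [
    ["graph", "neural", "network"],
    ["graph", "neural", "model"],
    ["topic", "model", "short", "text"],
    ["topic", "discovery", "short", "text"],
    ["graph", "topic", "discovery"],  # rare pairing of two frequent terms
]
ranked = sorted(score_emerging_biterms(titles, rarity_threshold=1).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked[:3])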

References

  1. Abdi, H.: The Kendall rank correlation coefficient. In: Encyclopedia of Measurement and Statistics, pp. 508–510. Sage, Thousand Oaks (2007)

  2. Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson correlation coefficient. In: Noise Reduction in Speech Processing, pp. 1–4. Springer, Cham (2009). https://doi.org/10.1007/978-3-642-00296-0_5

  3. Biswas, S.K., Bordoloi, M., Shreya, J.: A graph based keyword extraction model using collective node weight. Expert Syst. Appl. 97, 51–59 (2018)

  4. Bogomolova, A., Ryazanova, M., Balk, I.: Cluster approach to analysis of publication titles. In: Journal of Physics: Conference Series, vol. 1727, p. 012016. IOP Publishing (2021)

  5. Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A.: YAKE! Keyword extraction from single documents using multiple local features. Inf. Sci. 509, 257–289 (2020)

  6. Campos, R., Mangaravite, V., Pasquali, A., Jorge, A.M., Nunes, C., Jatowt, A.: YAKE! Collection-independent automatic keyword extractor. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018. LNCS, vol. 10772, pp. 806–810. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-76941-7_80

  7. Dridi, A., Gaber, M.M., Azad, R.M.A., Bhogal, J.: Leap2Trend: a temporal word embedding approach for instant detection of emerging scientific trends. IEEE Access 7, 176414–176428 (2019)

  8. Heylen, K., De Hertog, D.: Automatic term extraction. In: Handbook of Terminology, vol. 1, no. 01 (2015)

  9. Li, W., Matsukawa, T., Saigo, H., Suzuki, E.: Context-aware latent Dirichlet allocation for topic segmentation. In: Lauw, H.W., Wong, R.C.-W., Ntoulas, A., Lim, E.-P., Ng, S.-K., Pan, S.J. (eds.) PAKDD 2020. LNCS (LNAI), vol. 12084, pp. 475–486. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47426-3_37

  10. Li, X., Zhang, A., Li, C., Guo, L., Wang, W., Ouyang, J.: Relational biterm topic model: short-text topic modeling using word embeddings. Comput. J. 62(3), 359–372 (2019)

  11. Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004)

  12. Ramos, J., et al.: Using TF-IDF to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, pp. 29–48. Citeseer (2003)

  13. Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic keyword extraction from individual documents. In: Text Mining: Applications and Theory, vol. 1, pp. 1–20 (2010)

  14. Tuan, A.P., Tran, B., Nguyen, T.H., Van, L.N., Than, K.: Bag of biterms modeling for short texts. Knowl. Inf. Syst. 62(10), 4055–4090 (2020). https://doi.org/10.1007/s10115-020-01482-z

  15. Wu, D., Zhang, M., Shen, C., Huang, Z., Gu, M.: BTM and GloVe similarity linear fusion-based short text clustering algorithm for microblog hot topic discovery. IEEE Access 8, 32215–32225 (2020)

  16. Yan, X., Guo, J., Lan, Y., Cheng, X.: A biterm topic model for short texts. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 1445–1456 (2013)

  17. Yang, S., Huang, G., Ofoghi, B.: Short text similarity measurement using context from bag of word pairs and word co-occurrence. In: He, J., et al. (eds.) ICDS 2019. CCIS, vol. 1179, pp. 221–231. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-2810-1_22

  18. Yang, S., Huang, G., Ofoghi, B., Yearwood, J.: Short text similarity measurement using context-aware weighted biterms. Concurr. Comput. Pract. Exp., e5765 (2020)

Acknowledgement

This work was partially supported by the Australian Research Council (ARC) Discovery Project (DP190100587).

Author information

Corresponding author

Correspondence to Guangyan Huang.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, J., Huang, G., Zarei, R. (2022). ETBTRank: Ranking Biterms in Paper Titles for Emerging Topic Discovery. In: Long, G., Yu, X., Wang, S. (eds.) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol. 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_63

  • DOI: https://doi.org/10.1007/978-3-030-97546-3_63

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-97545-6

  • Online ISBN: 978-3-030-97546-3

  • eBook Packages: Computer Science (R0)
