Abstract
Emerging topics, which often originate from the collaboration of two scientific subfields, can be represented by biterms (pairs of terms) in which each term represents a distinct subfield. However, it is challenging to automatically find the two critical terms that exactly represent an emerging topic. First, existing term weighting models (such as TF-IDF, TextRank, RAKE, KECNW, and YAKE) may be effective for finding critical single terms but not critical biterms. Second, a potential biterm that may be suitable to represent an emerging topic occurs very rarely in a text (e.g., a corpus comprised of paper titles). So even if we combine pairs of terms to generate a bag of biterms, the above term weighting models remain ineffective, because they filter out these rare potential biterms. This paper proposes a novel Emerging Topic BiTerm Rank (ETBTRank) model that helps automatically extract biterms for representing emerging topics, distinguishing emerging-topic biterms from unimportant biterms. In ETBTRank, we weigh the two terms in a biterm separately and find the emerging-topic biterms by a rule: if a biterm itself is rare, but each of its two terms has a high weight, then it is an emerging-topic biterm. Experimental studies on paper title datasets demonstrate the effectiveness of the proposed model.
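The selection rule stated in the abstract can be illustrated with a minimal sketch. This is not the ETBTRank weighting scheme, which the abstract does not specify: the per-term weight here is simply the number of titles containing the term, and the function name `emerging_biterms`, its two thresholds, and the toy titles are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def emerging_biterms(titles, min_term_weight=2, max_biterm_count=1):
    """Return biterms that are rare as a pair but whose two terms both have high weight.

    Assumption: a term's weight is the number of titles that contain it.
    This is a stand-in, not the weighting used by ETBTRank.
    """
    term_weight = Counter()   # titles containing each term
    biterm_count = Counter()  # titles containing both terms of a pair
    for title in titles:
        # Deduplicate and sort so each pair has one canonical ordering.
        terms = sorted(set(title.lower().split()))
        term_weight.update(terms)
        biterm_count.update(combinations(terms, 2))
    return sorted(
        pair
        for pair, count in biterm_count.items()
        if count <= max_biterm_count
        and all(term_weight[t] >= min_term_weight for t in pair)
    )


# Hypothetical toy corpus of paper titles.
titles = [
    "graph clustering",
    "graph embedding",
    "protein folding",
    "protein docking",
    "graph protein",
]
print(emerging_biterms(titles))  # [('graph', 'protein')]
```

In this toy corpus every biterm occurs only once, so a frequency-based filter over biterms would treat them all alike; only the pair whose two terms are individually frequent passes the rule.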
Acknowledgement
This work was partially supported by the Australian Research Council (ARC) Discovery Project (DP190100587).
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Wu, J., Huang, G., Zarei, R. (2022). ETBTRank: Ranking Biterms in Paper Titles for Emerging Topic Discovery. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_63
DOI: https://doi.org/10.1007/978-3-030-97546-3_63
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3
eBook Packages: Computer Science, Computer Science (R0)