ETBTRank: Ranking Biterms in Paper Titles for Emerging Topic Discovery

Wu, Junfeng; Huang, Guangyan; Zarei, Roozbeh

doi:10.1007/978-3-030-97546-3_63

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13151)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Emerging topics, which often originate from the collaboration of two scientific subfields, can be represented by biterms (pairs of terms) in which each term represents a distinct subfield. However, it is challenging to automatically find the two critical terms that represent an emerging topic exactly. First, existing term weighting models (such as TF-IDF, TextRank, RAKE, KECNW, and YAKE) may be effective for finding critical single terms but not critical biterms. Second, a potential biterm that could represent an emerging topic occurs very rarely in a text (e.g., a corpus consisting of paper titles). So, even if we combine pairs of terms to generate a bag of biterms, the above term weighting models remain ineffective, because they filter out these rare potential biterms. This paper proposes a novel Emerging Topic BiTerm Rank (ETBTRank) model to help automatically extract biterms for representing emerging topics, distinguishing emerging-topic biterms from unimportant biterms. In ETBTRank, we separately weigh the two terms in a biterm and find the emerging-topic biterms by a rule: if a biterm itself is rare, but each of the two terms in it has a high weight, then it is an emerging-topic biterm. Experimental studies on paper title datasets demonstrate the effectiveness of the proposed model.
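The selection rule stated in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ETBTRank implementation. Biterms are taken as all unordered pairs of distinct terms within one title (the usual construction from the biterm topic model of Yan et al.). The per-term weight is a hypothetical stand-in (log-scaled document frequency over titles, assuming stopwords are already removed), because the abstract does not specify how ETBTRank weighs terms. The thresholds `max_biterm_count` and `min_term_weight` and the min-of-weights ranking score are likewise illustrative choices.

```python
import math
from collections import Counter
from itertools import combinations


def extract_biterms(title_tokens):
    """Return all unordered pairs of distinct terms in one title (BTM-style biterms)."""
    terms = sorted(set(title_tokens))
    return set(combinations(terms, 2))


def term_weights(titles):
    """HYPOTHETICAL stand-in weight: log(1 + number of titles containing the term).
    This is NOT the ETBTRank weighting, which the abstract does not describe."""
    df = Counter(term for title in titles for term in set(title))
    return {term: math.log(1 + count) for term, count in df.items()}


def emerging_biterms(titles, max_biterm_count=1, min_term_weight=1.0):
    """Keep biterms that are rare (occur in at most max_biterm_count titles)
    but whose two terms each have a high weight. Thresholds are illustrative."""
    biterm_counts = Counter(b for title in titles for b in extract_biterms(title))
    weights = term_weights(titles)
    candidates = [
        (biterm, min(weights[biterm[0]], weights[biterm[1]]))
        for biterm, count in biterm_counts.items()
        if count <= max_biterm_count
        and weights[biterm[0]] >= min_term_weight
        and weights[biterm[1]] >= min_term_weight
    ]
    # Rank by the weaker term's weight, so both terms must be strong to rank high;
    # ties are broken alphabetically for a deterministic order.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))


# Toy corpus of pre-tokenized, stopword-free titles (made up for illustration).
titles = [
    ["graph", "neural", "network"],
    ["graph", "database"],
    ["neural", "network", "pruning"],
    ["quantum", "graph"],
    ["quantum", "computing"],
    ["neural", "quantum"],
]

for biterm, score in emerging_biterms(titles):
    print(biterm, round(score, 3))
# ('graph', 'neural') 1.386
# ('graph', 'quantum') 1.386
# ('neural', 'quantum') 1.386
# ('graph', 'network') 1.099
```

In the toy corpus, ("network", "neural") is dropped because it occurs in two titles, while rare pairings of individually frequent terms such as ("graph", "quantum") are kept, which is the behaviour the rule is meant to capture.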

References

Abdi, H.: The Kendall rank correlation coefficient. In: Encyclopedia of Measurement and Statistics, pp. 508–510. Sage, Thousand Oaks (2007)
Benesty, J., Chen, J., Huang, Y., Cohen, I.: Pearson correlation coefficient. In: Noise Reduction in Speech Processing, pp. 1–4. Springer, Cham (2009). https://doi.org/10.1007/978-3-642-00296-0_5
Biswas, S.K., Bordoloi, M., Shreya, J.: A graph based keyword extraction model using collective node weight. Expert Syst. Appl. 97, 51–59 (2018)
Bogomolova, A., Ryazanova, M., Balk, I.: Cluster approach to analysis of publication titles. In: Journal of Physics: Conference Series, vol. 1727, p. 012016. IOP Publishing (2021)
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C., Jatowt, A.: YAKE! Keyword extraction from single documents using multiple local features. Inf. Sci. 509, 257–289 (2020)
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A.M., Nunes, C., Jatowt, A.: YAKE! Collection-independent automatic keyword extractor. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018. LNCS, vol. 10772, pp. 806–810. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-76941-7_80
Dridi, A., Gaber, M.M., Azad, R.M.A., Bhogal, J.: Leap2Trend: a temporal word embedding approach for instant detection of emerging scientific trends. IEEE Access 7, 176414–176428 (2019)
Heylen, K., De Hertog, D.: Automatic term extraction. In: Handbook of Terminology, vol. 1, no. 01 (2015)
Li, W., Matsukawa, T., Saigo, H., Suzuki, E.: Context-aware latent Dirichlet allocation for topic segmentation. In: Lauw, H.W., Wong, R.C.-W., Ntoulas, A., Lim, E.-P., Ng, S.-K., Pan, S.J. (eds.) PAKDD 2020. LNCS (LNAI), vol. 12084, pp. 475–486. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-47426-3_37
Li, X., Zhang, A., Li, C., Guo, L., Wang, W., Ouyang, J.: Relational biterm topic model: short-text topic modeling using word embeddings. Comput. J. 62(3), 359–372 (2019)
Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004)
Ramos, J., et al.: Using TF-IDF to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, pp. 29–48. Citeseer (2003)
Rose, S., Engel, D., Cramer, N., Cowley, W.: Automatic keyword extraction from individual documents. In: Text Mining: Applications and Theory, vol. 1, pp. 1–20 (2010)
Tuan, A.P., Tran, B., Nguyen, T.H., Van, L.N., Than, K.: Bag of biterms modeling for short texts. Knowl. Inf. Syst. 62(10), 4055–4090 (2020). https://doi.org/10.1007/s10115-020-01482-z
Wu, D., Zhang, M., Shen, C., Huang, Z., Gu, M.: BTM and GloVe similarity linear fusion-based short text clustering algorithm for microblog hot topic discovery. IEEE Access 8, 32215–32225 (2020)
Yan, X., Guo, J., Lan, Y., Cheng, X.: A biterm topic model for short texts. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 1445–1456 (2013)
Yang, S., Huang, G., Ofoghi, B.: Short text similarity measurement using context from bag of word pairs and word co-occurrence. In: He, J., et al. (eds.) ICDS 2019. CCIS, vol. 1179, pp. 221–231. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-2810-1_22
Yang, S., Huang, G., Ofoghi, B., Yearwood, J.: Short text similarity measurement using context-aware weighted biterms. Concurr. Comput. Pract. Exp., e5765 (2020)

Acknowledgement

This work was partially supported by an Australian Research Council (ARC) Discovery Project (DP190100587).

Author information

Authors and Affiliations

School of Information Technology, Deakin University, 211 Burwood Highway, Burwood, VIC, 3125, Australia
Junfeng Wu, Guangyan Huang & Roozbeh Zarei

Corresponding author

Correspondence to Guangyan Huang.

Editor information

Editors and Affiliations

University of Technology Sydney, Sydney, NSW, Australia
Guodong Long
RMIT University, Melbourne, VIC, Australia
Xinghuo Yu
University of Queensland, Brisbane, QLD, Australia
Sen Wang

About this paper

Cite this paper

Wu, J., Huang, G., Zarei, R. (2022). ETBTRank: Ranking Biterms in Paper Titles for Emerging Topic Discovery. In: Long, G., Yu, X., Wang, S. (eds) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_63

DOI: https://doi.org/10.1007/978-3-030-97546-3_63
Published: 19 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3
eBook Packages: Computer Science, Computer Science (R0)
