Enhanced Topic Representation by Ambiguity Handling

Geeganage, Dakshi Kapugama; Xu, Yue; Koggalahewa, Darshika; Li, Yuefeng

doi:10.1007/978-3-031-20891-1_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13724)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Most existing semantic-based topic models and topic generation approaches rely on external knowledge bases or ontologies to interpret the meanings of words. However, general ontologies do not cover many of the ambiguous or domain-specific words in a text collection, so those words are neglected when meanings are captured during topic generation. In this paper, we introduce an approach that disambiguates the unmatched words in a text collection based on related and similar-meaning words, which are discovered using word embeddings. We evaluated the topic generation approach with our ambiguity handling technique against a set of state-of-the-art systems that use an external ontology. Our approach outperformed these systems, and the topics it generated were more meaningful. Our ambiguity handling approach interpreted all the important words and included them in the topic generation process.
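The general idea of resolving a word that is missing from an ontology through its embedding neighbours can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. The toy vectors, the `ONTOLOGY` dictionary, the function `resolve_concept`, and the parameters `k` and `min_sim` are all hypothetical and chosen only for illustration. A real system would load pretrained embeddings and query an actual knowledge base.

```python
import math

# Toy 3-dimensional embeddings; the values are made up for illustration.
EMBEDDINGS = {
    "bank": [0.9, 0.1, 0.0],
    "lender": [0.8, 0.2, 0.1],
    "river": [0.0, 0.9, 0.3],
    "fintech": [0.85, 0.15, 0.05],
}

# Hypothetical ontology lookup (word -> concept). "fintech" is deliberately
# absent, standing in for an unmatched domain-specific word.
ONTOLOGY = {
    "bank": "financial_institution",
    "lender": "financial_institution",
    "river": "body_of_water",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def resolve_concept(word, k=2, min_sim=0.5):
    """Return a concept for `word`, borrowing from embedding neighbours
    when the word itself is not covered by the ontology."""
    if word in ONTOLOGY:
        return ONTOLOGY[word]
    if word not in EMBEDDINGS:
        return None  # nothing to compare against
    target = EMBEDDINGS[word]
    # Rank ontology-covered words by similarity to the unmatched word.
    scored = sorted(
        ((cosine(target, EMBEDDINGS[w]), w) for w in ONTOLOGY if w in EMBEDDINGS),
        reverse=True,
    )
    # Let the k nearest neighbours vote, weighted by similarity.
    votes = {}
    for sim, neighbour in scored[:k]:
        if sim >= min_sim:
            concept = ONTOLOGY[neighbour]
            votes[concept] = votes.get(concept, 0.0) + sim
    return max(votes, key=votes.get) if votes else None


print(resolve_concept("fintech"))
```

In this toy setup the unmatched word "fintech" inherits the concept shared by its two nearest covered neighbours, "bank" and "lender". The similarity threshold keeps a word from being assigned a concept when no covered word is close enough.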

Author information

Authors and Affiliations

School of Computer Science, Queensland University of Technology, Brisbane, Australia
Dakshi Kapugama Geeganage, Yue Xu, Darshika Koggalahewa & Yuefeng Li

Corresponding author

Correspondence to Dakshi Kapugama Geeganage.

Editor information

Editors and Affiliations

University of Pau and Pays de l'Adour, Anglet, France
Richard Chbeir
The University of Queensland, Brisbane, QLD, Australia
Helen Huang
Sapienza Università di Roma, Rome, Italy
Fabrizio Silvestri
Open University of Cyprus, Nicosia, Cyprus
Yannis Manolopoulos
The New Cyber Research Department, Peng Cheng Laboratory, Shenzhen, China
Yanchun Zhang

Cite this paper

Geeganage, D.K., Xu, Y., Koggalahewa, D., Li, Y. (2022). Enhanced Topic Representation by Ambiguity Handling. In: Chbeir, R., Huang, H., Silvestri, F., Manolopoulos, Y., Zhang, Y. (eds) Web Information Systems Engineering – WISE 2022. WISE 2022. Lecture Notes in Computer Science, vol 13724. Springer, Cham. https://doi.org/10.1007/978-3-031-20891-1_25

DOI: https://doi.org/10.1007/978-3-031-20891-1_25
Published: 07 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20890-4
Online ISBN: 978-3-031-20891-1
eBook Packages: Computer Science, Computer Science (R0)
