Abstract
In recent years, topic modeling has attracted considerable research attention because it can classify and help interpret large text corpora, which benefits information retrieval performance. User queries, however, have become more demanding: users want to know not only which documents are most helpful to them, but also which parts of those documents are more or less related to their request. They also need to search by topic or by document, not merely by keywords.
In this context, we propose a new approach to automated text classification based on the LDA topic modeling algorithm and on the rich semantic structure of documents. The approach semantically enriches the generated classes by indexing them in the document sections according to their probability distributions, and visualizes them through a hypergraph.
We conducted experiments to measure the effectiveness of our solution against topic modeling classification approaches that rely on text content only. The results show that our approach outperforms these approaches.
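The section-level indexing idea summarized in this abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the topic-word table `TOPIC_WORD`, the function names, the threshold value, and the toy sections are all hypothetical, and the scoring function is a crude stand-in for real LDA inference (it only sums and normalizes word probabilities from an assumed, already trained model). Each topic is treated as one hyperedge that connects every section whose weight for that topic reaches the threshold.

```python
from collections import defaultdict

# Hypothetical topic-word probabilities, standing in for a trained LDA model.
TOPIC_WORD = {
    "retrieval": {"query": 0.4, "search": 0.3, "index": 0.3},
    "modeling": {"topic": 0.5, "lda": 0.3, "corpus": 0.2},
}


def section_topic_distribution(text, topic_word=TOPIC_WORD):
    """Crude proxy for LDA inference (an assumption, not the real algorithm):
    sum each topic's word probabilities over the tokens, then normalize."""
    scores = {topic: 0.0 for topic in topic_word}
    for token in text.lower().split():
        for topic, words in topic_word.items():
            scores[topic] += words.get(token, 0.0)
    total = sum(scores.values())
    if total == 0:
        # No known word in this section: every topic gets weight 0.
        return scores
    return {topic: score / total for topic, score in scores.items()}


def build_hypergraph(sections, threshold=0.3):
    """Return {topic: {section_name: weight}}.
    Each topic is one hyperedge over the sections that reach the threshold."""
    hyperedges = defaultdict(dict)
    for name, text in sections.items():
        for topic, weight in section_topic_distribution(text).items():
            if weight >= threshold:
                hyperedges[topic][name] = weight
    return dict(hyperedges)


# Toy document split into named sections (hypothetical content).
sections = {
    "Introduction": "topic lda corpus search",
    "Method": "lda topic topic corpus",
    "Evaluation": "query search index query",
}
hypergraph = build_hypergraph(sections)
```

With such a structure, a query by topic returns the sections attached to that topic's hyperedge together with their weights, rather than whole documents matched by keyword.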
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Drissi, A., Tissaoui, A., Sassi, S., Chbeir, R., Jemai, A. (2022). S-LDA: Documents Classification Enrichment for Information Retrieval. In: Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., Krótkiewicz, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, vol 1653. Springer, Cham. https://doi.org/10.1007/978-3-031-16210-7_56
DOI: https://doi.org/10.1007/978-3-031-16210-7_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16209-1
Online ISBN: 978-3-031-16210-7
eBook Packages: Computer Science, Computer Science (R0)