S-LDA: Documents Classification Enrichment for Information Retrieval

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2022)

Abstract

In recent years, topic modeling techniques have attracted considerable research attention thanks to their ability to classify and make sense of large text corpora, which benefits information retrieval performance. User queries, however, have become more complex: users need to know not only which documents are most helpful to them, but also which parts of those documents are more or less related to their request. They also need to search by topic or document, not merely by keywords.

In this context, we propose a new approach to automated text classification based on the LDA topic modeling algorithm and the rich semantic structure of documents. The approach semantically enriches the generated classes by indexing them in the document sections according to their probability distributions, and visualizes them through a hypergraph.
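
The idea of indexing topics in document sections and linking them through a hypergraph can be illustrated with a minimal sketch. It is not the authors' implementation: the topic-word probabilities in `TOPICS` are hypothetical stand-ins for the output of a trained LDA model, and the function names, the simple additive scoring, and the `threshold` parameter are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical topic-word probabilities; in practice these would come
# from a trained LDA model rather than being written by hand.
TOPICS = {
    "retrieval": {"query": 0.4, "search": 0.3, "index": 0.3},
    "modeling": {"topic": 0.5, "lda": 0.3, "corpus": 0.2},
}


def section_topic_distribution(text, topics):
    """Score one section against each topic and normalize the scores."""
    tokens = text.lower().split()
    scores = {
        name: sum(words.get(tok, 0.0) for tok in tokens)
        for name, words in topics.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No topic word occurs in this section.
        return {name: 0.0 for name in topics}
    return {name: score / total for name, score in scores.items()}


def build_hypergraph(document, topics, threshold=0.3):
    """Map each topic to the set of sections it covers.

    Each (topic, section set) pair acts as one hyperedge, since a single
    topic can connect any number of sections.
    """
    edges = defaultdict(set)
    for title, text in document.items():
        distribution = section_topic_distribution(text, topics)
        for name, probability in distribution.items():
            if probability >= threshold:
                edges[name].add(title)
    return dict(edges)


# A toy document given as {section title: section text}.
document = {
    "Introduction": "topic lda corpus search",
    "Method": "lda topic topic corpus",
    "Evaluation": "query search index query",
}

hypergraph = build_hypergraph(document, TOPICS)
for topic, sections in sorted(hypergraph.items()):
    print(topic, sorted(sections))
```

A query by topic then reduces to a lookup in `hypergraph`, which returns the sections, rather than whole documents, associated with that topic.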

Experiments have been conducted to measure the effectiveness of our solution compared to topic modeling classification approaches based on text content only. The results show the superiority of our approach.

Notes

  1. https://www.nltk.org/

  2. https://pypi.org/project/gensim/

  3. https://dev.springernature.com/

Author information

Corresponding author

Correspondence to Amani Drissi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Drissi, A., Tissaoui, A., Sassi, S., Chbeir, R., Jemai, A. (2022). S-LDA: Documents Classification Enrichment for Information Retrieval. In: Bădică, C., Treur, J., Benslimane, D., Hnatkowska, B., Krótkiewicz, M. (eds) Advances in Computational Collective Intelligence. ICCCI 2022. Communications in Computer and Information Science, vol 1653. Springer, Cham. https://doi.org/10.1007/978-3-031-16210-7_56

  • DOI: https://doi.org/10.1007/978-3-031-16210-7_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16209-1

  • Online ISBN: 978-3-031-16210-7

  • eBook Packages: Computer Science, Computer Science (R0)
