Systematic Review for Selecting Methods of Document Clustering on Semantic Similarity of Online Laboratories Repository

  • Conference paper
Proceedings of the ICR’22 International Conference on Innovations in Computing Research (ICR 2022)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1431)

Abstract

In the era of digitalization, the number of electronic text documents on the Internet has been increasing rapidly. Organizing these documents into meaningful clusters is becoming a necessity, and document clustering combined with text representation methods (e.g., TF-IDF, word embeddings) is used for this purpose. Document clustering is the process of dynamically arranging documents into clusters such that the documents within a cluster are very similar to one another and dissimilar to those in other clusters. Traditional clustering algorithms do not take semantic relationships between words into account and therefore do not accurately represent the meaning of documents. Semantic information has been widely used to improve the quality of document clusters by grouping documents according to their meaning rather than their keywords. In this paper, twenty-five papers published in the last seven years (2016 to 2022) on semantic similarity-based document clustering are systematically reviewed. The algorithms, similarity measures, tools, and evaluation methods they use are discussed as well. As a result, the survey shows that researchers applied semantic similarity-based clustering to different datasets. Based on these findings, this paper proposes semantic similarity-based clustering methods that can be used for short-text semantic similarity in an online laboratories repository.
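The abstract's point that keyword-based representations miss semantic relationships can be illustrated with a minimal sketch. It is not the authors' method: the three short documents are invented, and the whitespace tokenizer and unsmoothed IDF are simplifying assumptions chosen to keep the example self-contained. It builds TF-IDF vectors and compares documents by cosine similarity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical short texts, loosely styled as laboratory experiment titles.
docs = [
    "measure voltage across the resistor",
    "measure current through the resistor",
    "determine potential difference over a load",
]
vecs = tfidf_vectors(docs)
sim_keyword = cosine(vecs[0], vecs[1])  # documents that share keywords
sim_synonym = cosine(vecs[0], vecs[2])  # related meaning, no shared words
print(sim_keyword, sim_synonym)
```

In this toy setup the first and third texts describe closely related tasks but share no tokens, so their TF-IDF similarity is zero, while the first two score above zero purely through shared keywords. A semantic approach (for example word embeddings or a lexical resource such as WordNet) is meant to close that gap.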

Corresponding author

Correspondence to Saad Hikmat Haji.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Haji, S.H., Jacksi, K., Salah, R.M. (2022). Systematic Review for Selecting Methods of Document Clustering on Semantic Similarity of Online Laboratories Repository. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the ICR’22 International Conference on Innovations in Computing Research. ICR 2022. Advances in Intelligent Systems and Computing, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-031-14054-9_23
