Systematic Review for Selecting Methods of Document Clustering on Semantic Similarity of Online Laboratories Repository

Haji, Saad Hikmat; Jacksi, Karwan; Salah, Razwan Mohmed

doi:10.1007/978-3-031-14054-9_23

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1431)

Included in the following conference series:

The International Conference on Innovations in Computing Research

Abstract

In the era of digitalization, the number of electronic text documents on the Internet has been increasing rapidly. Organizing these documents into meaningful clusters has become a necessity, and document clustering built on text representation methods (e.g., TF-IDF, word embeddings) is a common way to do it. Document clustering is the process of dynamically arranging documents into clusters such that the documents within a cluster are very similar to one another and dissimilar to those in other clusters. Traditional clustering algorithms do not take semantic relationships between words into account and therefore do not accurately represent the meaning of documents. For this reason, semantic information has been widely used to improve the quality of document clusters by grouping documents according to their meaning rather than their keywords. In this paper, twenty-five papers on semantic similarity-based document clustering, published over the last seven years (2016 to 2022), are systematically reviewed. The algorithms, similarity measures, tools, and evaluation methods they use are discussed as well. As a result, the survey shows that researchers applied semantic similarity-based clustering to a variety of datasets when measuring text similarity. On this basis, the paper proposes semantic similarity-based clustering methods that can be applied to the short texts contained in an online laboratories repository.
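To make the keyword-based baseline mentioned in the abstract concrete, the following is a minimal sketch of TF-IDF weighting with cosine similarity, written with the Python standard library only. It is not taken from the paper: the function names, the smoothed IDF formula, and the three sample laboratory descriptions are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length = max(len(tokens), 1)  # avoid division by zero for empty docs
        vectors.append({
            # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
            term: (count / length) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical short descriptions, standing in for repository entries.
docs = [
    "remote lab for digital circuit design",
    "online lab for analog circuit design",
    "virtual chemistry titration experiment",
]
vectors = tfidf_vectors(docs)
sim_circuits = cosine(vectors[0], vectors[1])   # share several keywords
sim_unrelated = cosine(vectors[0], vectors[2])  # share no keywords
```

In this sketch the first two descriptions score above zero because they share words, while the first and third score exactly zero because they share none. Two texts with the same meaning but different vocabulary would also score zero, which is the limitation of purely keyword-based measures that motivates the semantic approaches surveyed in the paper.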

Author information

Authors and Affiliations

Department of Information Technology, Duhok Polytechnic University, KRA, Dahuk, Iraq
Saad Hikmat Haji
Department of Computer Science, University of Zakho, KRA, Zakho, Iraq
Karwan Jacksi
Department of Computer Science, University of Duhok, KRS, Dahuk, Iraq
Razwan Mohmed Salah

Corresponding author

Correspondence to Saad Hikmat Haji.

Editor information

Editors and Affiliations

University of Detroit Mercy, Detroit, MI, USA
Kevin Daimi
Kent Institute Australia, Sydney, NSW, Australia
Abeer Al Sadoon

Cite this paper

Haji, S.H., Jacksi, K., Salah, R.M. (2022). Systematic Review for Selecting Methods of Document Clustering on Semantic Similarity of Online Laboratories Repository. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the ICR’22 International Conference on Innovations in Computing Research. ICR 2022. Advances in Intelligent Systems and Computing, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-031-14054-9_23

DOI: https://doi.org/10.1007/978-3-031-14054-9_23
Published: 11 August 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14053-2
Online ISBN: 978-3-031-14054-9
eBook Packages: Intelligent Technologies and Robotics (R0)
