Abstract
In the era of digitalization, the number of electronic text documents on the Internet has been increasing rapidly. Organizing these documents into meaningful clusters has become a necessity, and document clustering built on text representations such as TF-IDF and word embeddings is used for this purpose. Document clustering is the process of dynamically arranging documents into clusters such that the documents within a cluster are very similar to one another and dissimilar to those in other clusters. Traditional clustering algorithms do not take semantic relationships between words into account and therefore do not accurately represent the meaning of documents. For this reason, semantic information has been widely used to improve the quality of document clusters by grouping documents according to their meaning rather than their keywords. In this paper, twenty-five papers published in the last seven years (2016 to 2022) on semantic similarity-based document clustering are systematically reviewed. The algorithms, similarity measures, tools, and evaluation methods used in these papers are discussed as well. The survey shows that researchers applied semantic similarity-based clustering to different datasets when measuring text similarity. On this basis, this paper proposes semantic similarity-based clustering methods that can be used for short text semantic similarity in an online laboratories repository.
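The limitation of keyword-based representations mentioned above can be illustrated with a minimal sketch. It is not taken from any of the reviewed papers: the three toy documents, the helper names `tfidf_vectors` and `cosine`, and the smoothed IDF formula are all assumptions chosen for illustration. It computes TF-IDF vectors with the standard library only and compares documents by cosine similarity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumes whitespace tokenization and a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1, which is one common variant.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy documents (made up for this sketch). All three are about medical care,
# but the third shares no surface words with the first.
docs = [
    "the physician examined the patient",
    "the doctor examined the patient",
    "a medic treated sick people",
]
vecs = tfidf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # shared keywords give a positive score
sim_02 = cosine(vecs[0], vecs[2])  # no shared keywords give exactly 0.0
print(f"doc0 vs doc1: {sim_01:.3f}")
print(f"doc0 vs doc2: {sim_02:.3f}")
```

In this sketch the first and third documents receive a similarity of zero even though they are topically related, because TF-IDF only sees exact term overlap. Semantic approaches such as WordNet-based measures or word embeddings aim to close this gap.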
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Haji, S.H., Jacksi, K., Salah, R.M. (2022). Systematic Review for Selecting Methods of Document Clustering on Semantic Similarity of Online Laboratories Repository. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the ICR’22 International Conference on Innovations in Computing Research. ICR 2022. Advances in Intelligent Systems and Computing, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-031-14054-9_23
DOI: https://doi.org/10.1007/978-3-031-14054-9_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-14053-2
Online ISBN: 978-3-031-14054-9
eBook Packages: Intelligent Technologies and Robotics (R0)