
An algorithm for calculating the degree of similarity between English words through the different position and appearance coefficients of letters

The Journal of Supercomputing

Abstract

The concept of “near-form words” has existed since the Old English period (which began around AD 450), yet few mathematical algorithms have been applied to identifying them. As English has spread and its vocabulary has grown, the number of near-form words has grown too, and traditional manual identification cannot keep pace. A mathematical algorithm is therefore needed that calculates the degree of similarity between words, so that near-form words can be identified, collected and classified by similarity of appearance, with a specific value assigned to each level of similarity. Related fields offer many studies of English synonyms, phonetically similar words, and English sentences and texts. Some algorithms do address similarity in word appearance, but they target logographic scripts, such as Chinese characters, rather than English words. Lists of similar words can be found in dictionaries and online, but they are incomplete because they were collected subjectively. More importantly, subjective collection cannot quantify similarity, which is where this research is novel. Among existing research methods, the most common relies on fuzzy neural networks, which are unstable and inaccurate. A stable and unique mathematical calculation method is therefore needed. In this study, coding methods were used to design an algorithm that calculates letter position coefficients and letter appearance coefficients to obtain the corresponding similarity values. In English teaching, the algorithm can help generate large data sets of near-form words. In English input software, it can supply more candidate words for the input method to suggest. In text-editing software (such as Microsoft Word), it can improve error-detection accuracy and suggest suitable alternatives. In artificial intelligence applications, it can be used to monitor counterfeit trademark registration in commodity registration systems. The authors therefore expect the algorithm to find a wide range of applications.
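
The abstract does not give the coefficient definitions, so the following is only a minimal sketch of how a position coefficient and an appearance coefficient might be combined into one similarity value. It is not the authors' algorithm. The look-alike table `LOOKALIKE`, its scores, the weighting function `position_weight` and the function names are all hypothetical and chosen purely for illustration.

```python
from itertools import zip_longest

# Hypothetical table of visually confusable letter pairs.
# The scores are made-up assumptions, not coefficients from the paper.
LOOKALIKE = {
    frozenset("bd"): 0.8,
    frozenset("pq"): 0.8,
    frozenset("il"): 0.7,
    frozenset("mn"): 0.7,
    frozenset("uv"): 0.6,
    frozenset("ce"): 0.5,
}


def appearance(a: str, b: str) -> float:
    """Appearance score: 1.0 if identical, table value if listed, else 0.0."""
    if a == b:
        return 1.0
    return LOOKALIKE.get(frozenset((a, b)), 0.0)


def position_weight(i: int) -> float:
    """Hypothetical weight for 0-based position i: earlier letters count more."""
    return 1.0 / (i + 1)


def word_similarity(w1: str, w2: str) -> float:
    """Position-weighted average of per-letter appearance scores, in [0, 1]."""
    w1, w2 = w1.lower(), w2.lower()
    # Pad the shorter word with "" so unmatched trailing letters score 0.
    pairs = list(zip_longest(w1, w2, fillvalue=""))
    if not pairs:
        return 1.0  # two empty strings are treated as identical
    weights = [position_weight(i) for i in range(len(pairs))]
    score = sum(w * appearance(a, b) for w, (a, b) in zip(weights, pairs))
    return score / sum(weights)


if __name__ == "__main__":
    for pair in [("form", "from"), ("bad", "dad"), ("bad", "mad")]:
        print(pair, round(word_similarity(*pair), 3))
```

Under these assumptions, "bad" scores closer to "dad" than to "mad", because b and d are listed as look-alikes while b and m are not. A real implementation would need coefficients derived as described in the full paper.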



Acknowledgements

This study was supported by the Dongguan Science and Technology for Social Development Programme in 2020 (2020507156694), the special project for key fields in colleges and universities in Guangdong Province (2021zdzx1092) and the Science and Technology Research Project of the Department of Education of Jiangxi Province under Grant GJJ191599.

Author information

Corresponding author

Correspondence to Wen Qu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ruan, C., Qu, W., Luo, J. et al. An algorithm for calculating the degree of similarity between English words through the different position and appearance coefficients of letters. J Supercomput 78, 15974–15994 (2022). https://doi.org/10.1007/s11227-022-04511-6

  • DOI: https://doi.org/10.1007/s11227-022-04511-6
