Abstract
The concept of “near-form words” (words that look alike) has existed since the Old English period, which began around AD 450. Yet few mathematical algorithms have been applied to identifying them. As English has spread and its vocabulary has grown, the number of near-form words has grown too, and the traditional, manual way of identifying them cannot keep pace with this ever-expanding language. What is needed is a mathematical algorithm that calculates the degree of similarity between words. Near-form words could then be identified, collected and classified by appearance similarity, with a specific numerical value assigned to each level of similarity. In related fields, there have been many studies of English synonyms, phonetically similar words, and English sentences and texts. Some algorithms do measure similarity in word appearance, but they were designed for logographic scripts such as Chinese characters rather than for English words. Lists of similar-looking words can be found in dictionaries and online, but because they are compiled subjectively, they are incomplete. More importantly, subjective collection cannot quantify how similar two words are, and addressing this gap is what makes this research novel. Among existing methods, the most commonly used are based on fuzzy neural networks, which are unstable and inaccurate. A stable and unique mathematical calculation method is therefore needed. In this study, encoding methods were used to design an algorithm that calculates letter-position coefficients and letter-appearance coefficients and uses them to obtain a similarity value for each pair of words. In English teaching, the algorithm can help generate large-scale data on near-form words. In English input-method software, it can supply additional candidate words as prompts.
In text-editing software (such as Microsoft Word), the algorithm can improve error-detection accuracy and suggest suitable alternatives. In artificial intelligence, it can also be used to monitor counterfeit trademark registrations in commodity registration systems. The authors therefore believe that the algorithm will find a wide range of applications in the future.
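The abstract names the method's two ingredients, letter-position coefficients and letter-appearance coefficients, but does not give their formulas. As a purely hypothetical illustration, and not the authors' algorithm, the Python sketch below shows one way such coefficients could be combined into a score. It assumes a small lookalike-letter table (`LOOKALIKE`) and a position coefficient that decays geometrically by `shift_penalty` for each position of offset. It also averages both matching directions so that the score is symmetric. The helper `rank_near_forms` is likewise hypothetical and gestures at the suggestion use case for text-editing software.

```python
from typing import Iterable, List

# Hypothetical appearance coefficients for visually similar lowercase letters.
# These pairs and values are illustrative assumptions, not the paper's table.
LOOKALIKE = {
    frozenset("bd"): 0.8,
    frozenset("pq"): 0.8,
    frozenset("mn"): 0.7,
    frozenset("il"): 0.7,
    frozenset("uv"): 0.6,
    frozenset("ce"): 0.6,
}


def appearance(a: str, b: str) -> float:
    """Letter-appearance coefficient: 1 if identical, table value if lookalike, else 0."""
    if a == b:
        return 1.0
    return LOOKALIKE.get(frozenset((a, b)), 0.0)


def _directed_score(src: str, dst: str, shift_penalty: float) -> float:
    """Sum, over letters of src, of the best position-weighted match in dst."""
    total = 0.0
    for i, ch in enumerate(src):
        best = 0.0
        for j, other in enumerate(dst):
            # Hypothetical position coefficient: 1 at the same index,
            # multiplied by shift_penalty for every position of offset.
            position = shift_penalty ** abs(i - j)
            best = max(best, position * appearance(ch, other))
        total += best
    return total


def similarity(w1: str, w2: str, shift_penalty: float = 0.5) -> float:
    """Symmetric score in [0, 1] for shift_penalty in (0, 1]; 1.0 means identical spelling (ignoring case)."""
    w1, w2 = w1.lower(), w2.lower()
    if not w1 or not w2:
        return 0.0
    matched = _directed_score(w1, w2, shift_penalty) + _directed_score(w2, w1, shift_penalty)
    return matched / (len(w1) + len(w2))


def rank_near_forms(word: str, vocabulary: Iterable[str], top_k: int = 3) -> List[str]:
    """Return the top_k vocabulary words most similar in appearance to word."""
    scored = [(similarity(word, v), v) for v in vocabulary if v != word]
    # Highest score first; ties broken alphabetically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [v for _, v in scored[:top_k]]


if __name__ == "__main__":
    print(round(similarity("bad", "dad"), 3))  # 0.933
    print(rank_near_forms("bad", ["dad", "cat", "bed", "bad"], top_k=2))  # ['dad', 'bed']
```

With these assumed values, "dad" outranks "bed" as a near-form of "bad" because b/d is treated as a lookalike pair and a/e is not. A practical system would need to calibrate both the appearance table and the position weighting against real data.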



Acknowledgements
This study was supported by the 2020 Dongguan Science and Technology for Social Development Programme (2020507156694), the Special Project for Key Fields of Colleges and Universities in Guangdong Province (2021zdzx1092) and the Science and Technology Research Project of the Department of Education of Jiangxi Province under Grant GJJ191599.
Cite this article
Ruan, C., Qu, W., Luo, J. et al. An algorithm for calculating the degree of similarity between English words through the different position and appearance coefficients of letters. J Supercomput 78, 15974–15994 (2022). https://doi.org/10.1007/s11227-022-04511-6
DOI: https://doi.org/10.1007/s11227-022-04511-6