An algorithm for calculating the degree of similarity between English words through the different position and appearance coefficients of letters

Ruan, Chunyan; Qu, Wen; Luo, Jianfeng; Lu, Kuan-Han

doi:10.1007/s11227-022-04511-6

Published: 29 April 2022

Volume 78, pages 15974–15994, (2022)

The Journal of Supercomputing

Abstract

The concept of “near-form words” has existed since the Old English period (which began around AD 450), yet few mathematical algorithms have been applied to identifying them. As English has spread and its vocabulary has grown, the number of near-form words has grown with it, and traditional ways of identifying them cannot keep pace. A mathematical algorithm is therefore needed that calculates the degree of similarity between words, so that near-form words can be identified, collected and classified by similarity of appearance, with a specific value assigned to each level of similarity. Related fields offer many studies of English synonyms, phonetically similar words, English sentences and texts. Some algorithms have been used to study similarity in word appearance, but these targeted logographic scripts, such as Chinese characters, rather than English words. Lists of similar words can be found in dictionaries or online, but they are incomplete because they were collected subjectively. More importantly, subjective collection cannot assign a value to similarity, which highlights the uniqueness and innovation of this research. Among existing research methods, the one used most often involves fuzzy neural networks, which are unstable and inaccurate. A stable and unique mathematical calculation method is therefore needed. In this study, coding methods were used to design an algorithm that calculates letter position coefficients and letter appearance coefficients and combines them to obtain a similarity value for a pair of words. In English teaching, this algorithm can help generate large data sets of near-form words. In English input software, it can supply more candidate words for the input method to suggest. In text-editing software (such as Microsoft Word), it can improve error-detection accuracy and suggest suitable alternatives. In the field of artificial intelligence, it can also be used to monitor counterfeit trademark registration in commodity registration systems. The authors therefore believe that this algorithm will have a wide range of applications in the future.
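The abstract does not give the formulas, so the following is only a minimal sketch of how a position coefficient and an appearance coefficient might be combined into one score. It is not the authors' algorithm. The look-alike table, its values, the linear position weighting and the names `LOOKALIKE`, `appearance`, `position_weight` and `similarity` are all assumptions made for illustration.

```python
# Hypothetical table of visually similar letter pairs.
# The pairs and the 0.5 value are illustrative assumptions, not from the paper.
LOOKALIKE = {
    frozenset("bd"): 0.5,
    frozenset("pq"): 0.5,
    frozenset("il"): 0.5,
    frozenset("mn"): 0.5,
    frozenset("uv"): 0.5,
}


def appearance(a: str, b: str) -> float:
    """Appearance coefficient: 1.0 for identical letters, a table value
    for look-alike letters, 0.0 otherwise."""
    if a == b:
        return 1.0
    return LOOKALIKE.get(frozenset((a, b)), 0.0)


def position_weight(i: int, length: int) -> float:
    """Position coefficient (assumed): weights fall linearly from 1.0 at the
    first letter to 0.5 at the last, so early letters count for more."""
    if length == 1:
        return 1.0
    return 1.0 - 0.5 * i / (length - 1)


def similarity(w1: str, w2: str) -> float:
    """Weighted average of appearance coefficients over aligned positions,
    normalised by the longer word so that extra letters lower the score."""
    w1, w2 = w1.lower(), w2.lower()
    n = max(len(w1), len(w2))
    if n == 0:
        return 1.0  # two empty strings are treated as identical
    total = sum(position_weight(i, n) for i in range(n))
    score = sum(
        position_weight(i, n) * appearance(a, b)
        for i, (a, b) in enumerate(zip(w1, w2))
    )
    return score / total


if __name__ == "__main__":
    for pair in [("bad", "dad"), ("bad", "sad"), ("form", "from")]:
        print(pair, round(similarity(*pair), 3))
```

Under these assumptions "bad" scores closer to "dad" than to "sad", because "b" and "d" are listed as look-alikes. A real system would need an empirically derived appearance table and a way to handle insertions and transpositions, which this position-by-position comparison ignores.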

Acknowledgements

This study was supported by the Dongguan Science and Technology for Social Development Programme in 2020 (2020507156694), the special project for key fields in colleges and universities in Guangdong Province (2021zdzx1092) and the Science and Technology Research Project of the Department of Education of Jiangxi Province under Grant GJJ191599.

Author information

Authors and Affiliations

School of Foreign Languages, Dongguan City College, Dongguan, 523419, Guangdong, China
Chunyan Ruan
Department of Information Engineering, Gannan University of Science and Technology, Ganzhou, 341000, Jiangxi, China
Wen Qu
Department of Computer Engineering, Dongguan Polytechnic, Dongguan, 523808, Guangdong, China
Jianfeng Luo
Computer Science & Information Management, Soochow University, Taipei, 11490, Taiwan
Kuan-Han Lu

Corresponding author

Correspondence to Wen Qu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ruan, C., Qu, W., Luo, J. et al. An algorithm for calculating the degree of similarity between English words through the different position and appearance coefficients of letters. J Supercomput 78, 15974–15994 (2022). https://doi.org/10.1007/s11227-022-04511-6

Accepted: 01 April 2022
Published: 29 April 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11227-022-04511-6

Part of a collection:

SI - Data Mining for IoT in Mobile Edge computing
