State-of-art: text similarity computing

Authors:

Xiuli Diao

ICCIP '18: Proceedings of the 4th International Conference on Communication and Information Processing

Pages 33 - 37

https://doi.org/10.1145/3290420.3290473

Published: 02 November 2018

Abstract

In recent years, text similarity computing, one of the hot and important techniques in many NLP applications, has been studied extensively and has progressed rapidly. This paper first introduces the background, the basic computing process, the related resources, and the techniques of text similarity computing. By comparing several typical models, it addresses three key issues in detail: the text representation model, the similarity calculation, and the quality evaluation. Typical applications of text similarity computing are then described. Finally, the difficulties of computing text similarity and many future research directions are discussed.
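
As a rough illustration of the first two issues named in the abstract (text representation and similarity calculation), the sketch below represents each text as a bag-of-words term-frequency vector and compares two vectors by cosine similarity. This is one common choice assumed here for illustration, not necessarily a model the paper favors; the function names `tokenize`, `term_vector`, `cosine_similarity`, and `text_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_vector(text):
    """Text representation: a sparse bag-of-words term-frequency vector."""
    return Counter(tokenize(text))


def cosine_similarity(vec_a, vec_b):
    """Similarity calculation: cosine of the angle between two vectors."""
    # Counter returns 0 for terms missing from vec_b, so only shared terms count.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(text_a, text_b):
    """Score two raw texts in [0, 1] using the two steps above."""
    return cosine_similarity(term_vector(text_a), term_vector(text_b))
```

A call such as `text_similarity("the cat sat", "the cat ran")` scores the pair by the words they share. Texts that share no tokens score 0 even if they are close in meaning, a limitation that semantic or embedding-based representations aim to overcome.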

Index Terms

State-of-art: text similarity computing
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published In

ICCIP '18: Proceedings of the 4th International Conference on Communication and Information Processing

November 2018

326 pages

ISBN:9781450365345

DOI:10.1145/3290420

Conference Chairs:
Jalel Ben-Othman, University of Paris 13, France
Hui Yu, University of Portsmouth, UK

Program Chairs:
Herwig Unger, University of Hagen, Germany
Masayuki Arai, Graduate School of Science and Engineering, Teikyo University, Japan

Copyright © 2018 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ICCIP 2018

ICCIP 2018: 2018 the 4th International Conference on Communication and Information Processing

November 2 - 4, 2018

Qingdao, China

Acceptance Rates

Overall Acceptance Rate 61 of 301 submissions, 20%
