A new sentence similarity assessment measure based on a three-layer sentence representation

Authors:

Rafael Ferreira,

Rafael Dueire Lins,

Steven J. Simske,

Marcelo Riss

DocEng '14: Proceedings of the 2014 ACM symposium on Document engineering

Pages 25 - 34

https://doi.org/10.1145/2644866.2644881

Published: 16 September 2014

Abstract

Sentence similarity measures the degree of likeness between sentences. It is used in many natural language applications, such as text summarization, information retrieval, text categorization, and machine translation. Current methods for assessing sentence similarity represent sentences either as bag-of-words vectors or through the syntactic information of their words, and compute the similarity between sentences by composing the similarities between their words. However, two important concerns in the area, the meaning problem and word order, are not handled. This paper proposes a new sentence similarity assessment measure that largely improves and refines a recently published method that takes into account the lexical, syntactic, and semantic components of sentences. The new method was benchmarked using a publicly available standard dataset. The results show that the proposed measure outperforms state-of-the-art systems and achieves results comparable to evaluations made by humans.
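
As a rough illustration of the bag-of-words baseline that the abstract criticizes, the sketch below computes cosine similarity over token counts. It is not the measure proposed in the paper; the tokenization (lowercasing and whitespace splitting), the function names, and the example sentences are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens (word order is discarded)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two token-count vectors."""
    va, vb = bag_of_words(a), bag_of_words(b)
    # Only tokens present in both sentences contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sentence pairs, not taken from the paper's dataset.
same_words = cosine_similarity("the dog bit the man", "the man bit the dog")
synonyms = cosine_similarity("the car is fast", "the automobile is quick")
```

The first pair shares identical token counts, so its score is approximately 1.0 even though the meanings are opposite (the word order problem). The second pair scores 0.5 because "car"/"automobile" and "fast"/"quick" are treated as unrelated tokens (the meaning problem).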

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning settings
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction

Published In

DocEng '14: Proceedings of the 2014 ACM symposium on Document engineering

September 2014

226 pages

ISBN:9781450329491

DOI:10.1145/2644866

General Chair: Steven Simske, Hewlett-Packard, Fort Collins, USA
Program Chair: Sebastian Rönnau, Zalando AG, Berlin, Germany

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Hewlett-Packard do Brazil & UFPE

Conference

DocEng '14

Sponsor:

SIGWEB

DocEng '14: ACM Symposium on Document Engineering 2014

September 16 - 19, 2014

Fort Collins, Colorado, USA

Acceptance Rates

DocEng '14 Paper Acceptance Rate 15 of 41 submissions, 37%;

Overall Acceptance Rate 194 of 564 submissions, 34%

Bibliometrics

Total Citations: 15
Total Downloads: 326
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 0

Reflects downloads up to 19 Feb 2025

Citations

Cited By

Nishy Reshmi S, Shreelekshmi R (2023). Textual entailment classification using syntactic structures and semantic relations. Journal of Intelligent & Fuzzy Systems 45:1 (929-939). Online publication date: 2-Jul-2023.
https://doi.org/10.3233/JIFS-223275
Shajalal M, Aono M (2020). Semantic Sentence Modeling for Learning Textual Similarity Exploiting LSTM. Cyber Security and Computer Science (426-438). Online publication date: 30-Jul-2020.
https://doi.org/10.1007/978-3-030-52856-0_34
Quan Z, Wang Z, Le Y, Yao B, Li K, Yin J (2019). An Efficient Framework for Sentence Similarity Modeling. IEEE/ACM Transactions on Audio, Speech and Language Processing 27:4 (853-865). Online publication date: 1-Apr-2019.
https://dl.acm.org/doi/10.1109/TASLP.2019.2899494
Yao L, Pan Z, Ning H (2019). Unlabeled Short Text Similarity With LSTM Encoder. IEEE Access 7 (3430-3437). Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2018.2885698
Shajalal M, Aono M (2019). Semantic textual similarity between sentences using bilingual word semantics. Progress in Artificial Intelligence 8:2 (263-272). Online publication date: 9-Mar-2019.
https://doi.org/10.1007/s13748-019-00180-4
Zhang T, Lee B, Zhu Q (2019). Semantic measure of plagiarism using a hierarchical graph model. Scientometrics. Online publication date: 19-Aug-2019.
https://doi.org/10.1007/s11192-019-03204-x
Soares V, Campello R, Nourashrafeddin S, Milios E, Naldi M (2019). Combining semantic and term frequency similarities for text clustering. Knowledge and Information Systems. Online publication date: 2-Jan-2019.
https://doi.org/10.1007/s10115-018-1278-7
Shajalal M, Aono M (2018). Sentence-Level Semantic Textual Similarity Using Word-Level Semantics. 2018 10th International Conference on Electrical and Computer Engineering (ICECE) (113-116). Online publication date: Dec-2018.
https://doi.org/10.1109/ICECE.2018.8636779
Shajalal M, Aono M (2018). Semantic Textual Similarity in Bengali Text. 2018 International Conference on Bangla Speech and Language Processing (ICBSLP) (1-5). Online publication date: Sep-2018.
https://doi.org/10.1109/ICBSLP.2018.8554940
Song X, Yang C, Zhang H, Zhao X (2018). The Algorithm of Automatic Text Summarization Based on Network Representation Learning. Natural Language Processing and Chinese Computing (362-371). Online publication date: 14-Aug-2018.
https://doi.org/10.1007/978-3-319-99501-4_32