DOI:10.1145/2644866.2644881

A new sentence similarity assessment measure based on a three-layer sentence representation

Published: 16 September 2014

Abstract

Sentence similarity measures how alike two sentences are. It is used in many natural language applications, such as text summarization, information retrieval, text categorization, and machine translation. Current methods for assessing sentence similarity represent sentences either as bag-of-words vectors or through the syntactic information of their words, and compute the similarity between two sentences by composing the similarities between their words. However, two important concerns in the area, the meaning problem and word order, are not handled. This paper proposes a new sentence similarity assessment measure that largely improves and refines a recently published method that takes into account the lexical, syntactic and semantic components of sentences. The new method was benchmarked using a publicly available standard dataset. The results show that the proposed measure outperforms state-of-the-art systems and achieves results comparable to evaluations made by humans.
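
To make the two concerns named in the abstract concrete, the following is a minimal sketch of a bag-of-words baseline, not the measure proposed in the paper. It assumes whitespace tokenization and cosine similarity over raw word counts; the function names (bag_of_words, cosine_similarity, bow_similarity) are hypothetical and chosen only for illustration. Reading "the meaning problem" as the failure to match synonyms is also an assumption here.

```python
import math
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lower-case, split on whitespace, and count words (word order is discarded)."""
    return Counter(sentence.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def bow_similarity(s1: str, s2: str) -> float:
    """Sentence similarity under the (assumed) bag-of-words baseline."""
    return cosine_similarity(bag_of_words(s1), bag_of_words(s2))


if __name__ == "__main__":
    # Word order: opposite meanings, identical word counts.
    print(bow_similarity("the dog bites the man", "the man bites the dog"))
    # Meaning: near-synonymous sentences that share no surface word.
    print(bow_similarity("a car", "an automobile"))
```

Under these assumptions the first pair scores as (numerically) identical even though the sentences describe opposite events, and the second pair scores zero even though the sentences mean nearly the same thing. These are the kinds of cases a measure combining lexical, syntactic and semantic information is meant to handle.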



Published In

DocEng '14: Proceedings of the 2014 ACM symposium on Document engineering
September 2014
226 pages
ISBN:9781450329491
DOI:10.1145/2644866
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. graph-based model
  2. inductive logic programming
  3. relation extraction
  4. sentence simplification

Qualifiers

  • Research-article

Funding Sources

  • Hewlett-Packard do Brazil & UFPE

Conference

DocEng '14
DocEng '14: ACM Symposium on Document Engineering 2014
September 16 - 19, 2014
Fort Collins, Colorado, USA

Acceptance Rates

DocEng '14 Paper Acceptance Rate 15 of 41 submissions, 37%;
Overall Acceptance Rate 194 of 564 submissions, 34%

Cited By

  • (2023) Textual entailment classification using syntactic structures and semantic relations. Journal of Intelligent & Fuzzy Systems 45:1 (929-939). DOI:10.3233/JIFS-223275. Online publication date: 2-Jul-2023
  • (2020) Semantic Sentence Modeling for Learning Textual Similarity Exploiting LSTM. Cyber Security and Computer Science (426-438). DOI:10.1007/978-3-030-52856-0_34. Online publication date: 30-Jul-2020
  • (2019) An Efficient Framework for Sentence Similarity Modeling. IEEE/ACM Transactions on Audio, Speech and Language Processing 27:4 (853-865). DOI:10.1109/TASLP.2019.2899494. Online publication date: 1-Apr-2019
  • (2019) Unlabeled Short Text Similarity With LSTM Encoder. IEEE Access 7 (3430-3437). DOI:10.1109/ACCESS.2018.2885698. Online publication date: 2019
  • (2019) Semantic textual similarity between sentences using bilingual word semantics. Progress in Artificial Intelligence 8:2 (263-272). DOI:10.1007/s13748-019-00180-4. Online publication date: 9-Mar-2019
  • (2019) Semantic measure of plagiarism using a hierarchical graph model. Scientometrics. DOI:10.1007/s11192-019-03204-x. Online publication date: 19-Aug-2019
  • (2019) Combining semantic and term frequency similarities for text clustering. Knowledge and Information Systems. DOI:10.1007/s10115-018-1278-7. Online publication date: 2-Jan-2019
  • (2018) Sentence-Level Semantic Textual Similarity Using Word-Level Semantics. 2018 10th International Conference on Electrical and Computer Engineering (ICECE) (113-116). DOI:10.1109/ICECE.2018.8636779. Online publication date: Dec-2018
  • (2018) Semantic Textual Similarity in Bengali Text. 2018 International Conference on Bangla Speech and Language Processing (ICBSLP) (1-5). DOI:10.1109/ICBSLP.2018.8554940. Online publication date: Sep-2018
  • (2018) The Algorithm of Automatic Text Summarization Based on Network Representation Learning. Natural Language Processing and Chinese Computing (362-371). DOI:10.1007/978-3-319-99501-4_32. Online publication date: 14-Aug-2018
