Short Text Computing Based on Lexical Similarity Model

  • Conference paper
Information and Software Technologies (ICIST 2019)

Abstract

Short text similarity is concerned with determining how close two texts are in meaning, either lexically or semantically. Various short text similarity approaches have been proposed, based on lexical matching, semantic background knowledge, or combined models. Lexical-based models do not capture the actual meaning behind the words. Semantic approaches, however, rely on background knowledge or a corpus, which cannot be assumed to be available when handling the large volume of new words, data sparseness, and noise found in short text. This work focuses on lexical-based similarity models for analysing unstructured short text. Term-based and edit distance models are compared for their applicability in computing the similarity value of short text. The experimental results show that each model has its own key strengths and limitations in computing the similarity value of short text.
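To make the two lexical families named in the abstract concrete, the following is a minimal sketch. The abstract does not say which specific measures the paper evaluates, so the choices here are assumptions made for illustration: Jaccard and cosine similarity stand in for the term-based model, and Levenshtein distance for the edit distance model. The whitespace tokenizer and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def jaccard_similarity(a, b):
    """Term-based: shared distinct tokens divided by all distinct tokens."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Term-based: cosine between raw term-frequency vectors."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a)
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def levenshtein_distance(a, b):
    """Edit distance: fewest character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest
```

Under these assumptions, the term-based measures ignore word order and treat a misspelled word as a complete mismatch, while the character-level edit measure tolerates small spelling variations but penalises reordered words. This is one way the differing strengths and limitations mentioned in the abstract can arise.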


Acknowledgment

This research is sponsored by the Ministry of Higher Education, under the Fundamental Research Grants Scheme vot 59467.

Author information

Corresponding author

Correspondence to Arifah Che Alhadi.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Che Alhadi, A., Deraman, A., Abdul Jalil, M., Wan Yussof, W.N.J., Mohd Noah, S.A. (2019). Short Text Computing Based on Lexical Similarity Model. In: Damaševičius, R., Vasiljevienė, G. (eds) Information and Software Technologies. ICIST 2019. Communications in Computer and Information Science, vol 1078. Springer, Cham. https://doi.org/10.1007/978-3-030-30275-7_27

  • DOI: https://doi.org/10.1007/978-3-030-30275-7_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30274-0

  • Online ISBN: 978-3-030-30275-7

  • eBook Packages: Computer Science, Computer Science (R0)
