From Vector Space Models to Vector Space Models of Semantics

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10478)

Abstract

This paper assesses the performance of frequency-based and concept-based text representations in Mixed Script Information Retrieval and classification tasks. In text analytics, text representation remains an unresolved research problem, and progress on it is needed to advance different applications. The paper presents observations from different text representation methods applied to text classification and information retrieval. The experiments use the data set from the Mixed Script Information Retrieval shared task, and the final submitted model was evaluated by the task organizers. Distributional representation is observed to perform better than the frequency-based text representation methods. The final system attained first place in task 2 and scored 3.89% lower than the top-scoring system in task 1.
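
The contrast the abstract draws between frequency-based and distributional representations can be illustrated with a minimal sketch. It is not the authors' system: the toy corpus, the document-level co-occurrence counting, and every function name below are assumptions chosen for illustration. The sketch compares two one-word queries that share no terms, once as raw term-frequency vectors and once as vectors built from the contexts in which the words occur.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def term_frequency_vector(tokens, vocab):
    """Frequency-based representation: one raw count per vocabulary term."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def cooccurrence_vectors(token_lists, vocab):
    """Distributional word vectors: how often a word shares a document
    with each other vocabulary word."""
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for tokens in token_lists:
        present = set(tokens)
        for w in present:
            for c in present:
                if w != c:
                    vectors[w][index[c]] += 1.0
    return vectors


def distributional_vector(tokens, word_vectors, dim):
    """Represent a text as the sum of its words' co-occurrence vectors."""
    doc = [0.0] * dim
    for w in tokens:
        for i, value in enumerate(word_vectors[w]):
            doc[i] += value
    return doc


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus: "movie" and "film" never co-occur,
# but they appear in the same contexts.
corpus = [
    "movie ticket price",
    "film ticket price",
    "movie review",
    "film review",
    "train schedule",
]
token_lists = [tokenize(doc) for doc in corpus]
vocab = sorted({w for tokens in token_lists for w in tokens})
word_vectors = cooccurrence_vectors(token_lists, vocab)

query_a = tokenize("movie")
query_b = tokenize("film")

freq_sim = cosine(
    term_frequency_vector(query_a, vocab),
    term_frequency_vector(query_b, vocab),
)
dist_sim = cosine(
    distributional_vector(query_a, word_vectors, len(vocab)),
    distributional_vector(query_b, word_vectors, len(vocab)),
)

print("frequency-based similarity:", freq_sim)
print("distributional similarity:", dist_sim)
```

Under term frequency the two queries have no overlapping dimensions, so their similarity is zero. The co-occurrence vectors place them close together because both words appear alongside the same neighbours. This is only a toy demonstration of why a distributional representation might help when the same concept is written in different surface forms; it says nothing about the scores reported in the paper.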

Author information

Corresponding author

Correspondence to H. B. Barathi Ganesh.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Barathi Ganesh, H.B., Anand Kumar, M., Soman, K.P. (2018). From Vector Space Models to Vector Space Models of Semantics. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_4

  • DOI: https://doi.org/10.1007/978-3-319-73606-8_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73605-1

  • Online ISBN: 978-3-319-73606-8

  • eBook Packages: Computer Science (R0)
