From Vector Space Models to Vector Space Models of Semantics

Barathi Ganesh, H. B.; Anand Kumar, M.; Soman, K. P.

doi:10.1007/978-3-319-73606-8_4

Conference paper
First Online: 04 January 2018

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10478)

Abstract

This paper assesses the performance of frequency-based and concept-based text representations in Mixed Script Information Retrieval and classification tasks. In text analytics, text representation remains an open research problem, and progress on it is needed to advance many applications. The paper presents observations from applying different text representation methods to text classification and information retrieval. The experiments use the data set from the Mixed Script Information Retrieval shared task, and the final submitted model was evaluated by the task organizers. Distributional representation is observed to perform better than frequency-based text representation methods. The final system attained first place in task 2 and scored 3.89% lower than the top-scoring system in task 1.
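
As an illustration of what a frequency-based representation looks like in a retrieval setting, the following is a minimal sketch, not the authors' system. It assumes whitespace tokenization, raw term counts, and cosine similarity for ranking; the function names and the toy corpus are hypothetical and are not taken from the paper or the shared-task data.

```python
import math
from collections import Counter


def term_frequency_vector(text):
    """Map a text to a sparse term-frequency vector (token -> count)."""
    # Assumption: lowercase whitespace tokenization is enough for this toy example.
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dict-like objects."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, documents):
    """Return document indices sorted by decreasing similarity to the query."""
    q = term_frequency_vector(query)
    scores = [cosine_similarity(q, term_frequency_vector(d)) for d in documents]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


# Hypothetical toy corpus, not the shared-task data.
documents = [
    "vector space model for indexing",
    "cricket match score today",
    "latent semantic analysis of text",
]
ranking = rank_documents("vector space indexing", documents)
```

In this sketch a document scores above zero only if it shares at least one exact token with the query, which is one commonly cited limitation of purely frequency-based representations.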


Author information

Authors and Affiliations

Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
H. B. Barathi Ganesh, M. Anand Kumar & K. P. Soman

Corresponding author

Correspondence to H. B. Barathi Ganesh.

Editor information

Editors and Affiliations

DAIICT, Gujarat, India
Prasenjit Majumder
Indian Statistical Institute, Kolkata, India
Mandar Mitra
DAIICT, Gujarat, India
Parth Mehta
DAIICT, Gujarat, India
Jainisha Sankhavara

About this paper

Cite this paper

Barathi Ganesh, H.B., Anand Kumar, M., Soman, K.P. (2018). From Vector Space Models to Vector Space Models of Semantics. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_4

DOI: https://doi.org/10.1007/978-3-319-73606-8_4
Published: 04 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73605-1
Online ISBN: 978-3-319-73606-8
eBook Packages: Computer Science, Computer Science (R0)
