Abstract
This paper assesses the performance of frequency-based and concept-based text representations in Mixed Script Information Retrieval and classification tasks. In text analytics, text representation remains an open research problem, and progress on it underpins a range of downstream applications. The paper presents observations from applying different text representation methods to text classification and information retrieval. The experiments use the data set from the Mixed Script Information Retrieval shared task, and the final submitted model was evaluated by the task organizers. Distributional representation is observed to perform better than the frequency-based text representation methods. The final system attained first place in task 2 and scored 3.89% below the top-scoring system in task 1.
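The contrast between frequency-based and distributional representations can be illustrated with a minimal sketch. It is not the authors' system: the toy corpus, the function names, and the simple "sum of co-occurrence rows" document vector are all assumptions chosen only to show why two texts with no shared terms can still be similar under a distributional view.

```python
import math
from collections import Counter

# Toy corpus (hypothetical, not taken from the shared-task data).
corpus = ["movie ticket price", "film ticket price", "cricket match score"]
vocab = sorted({w for doc in corpus for w in doc.split()})
index = {w: i for i, w in enumerate(vocab)}

def term_frequency(text):
    """Frequency-based representation: raw term counts over the vocabulary."""
    counts = Counter(text.split())
    return [counts[w] for w in vocab]

def build_cooccurrence(docs):
    """Count how often two distinct words appear in the same document."""
    matrix = [[0] * len(vocab) for _ in vocab]
    for doc in docs:
        words = set(doc.split())
        for a in words:
            for b in words:
                if a != b:
                    matrix[index[a]][index[b]] += 1
    return matrix

cooc = build_cooccurrence(corpus)

def distributional(text):
    """Assumed stand-in for a distributional representation:
    the sum of the co-occurrence rows of the words in the text."""
    vec = [0] * len(vocab)
    for w in text.split():
        if w in index:
            vec = [x + y for x, y in zip(vec, cooc[index[w]])]
    return vec

def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

tf_sim = cosine(term_frequency("movie"), term_frequency("film"))
dist_sim = cosine(distributional("movie"), distributional("film"))
unrelated_sim = cosine(distributional("movie"), distributional("cricket"))
print(tf_sim, dist_sim, unrelated_sim)
```

In this toy setting "movie" and "film" share no terms, so their term-count vectors are orthogonal, while their co-occurrence profiles coincide because both appear alongside "ticket" and "price". Practical systems would typically use a learned or factorized representation rather than raw co-occurrence sums.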
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Barathi Ganesh, H.B., Anand Kumar, M., Soman, K.P. (2018). From Vector Space Models to Vector Space Models of Semantics. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_4
DOI: https://doi.org/10.1007/978-3-319-73606-8_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73605-1
Online ISBN: 978-3-319-73606-8
eBook Packages: Computer Science, Computer Science (R0)