Abstract
This paper presents an overview of the shared task on “Detecting Paraphrases in Indian Languages” (DPIL) conducted at FIRE 2016. Given a pair of sentences in the same language, participants were asked to determine whether the two sentences are semantically equivalent. The shared task covered four Indian languages: Tamil, Malayalam, Hindi, and Punjabi. It comprised two subtasks. In Subtask 1, a sentence pair is classified as paraphrase or not paraphrase. In Subtask 2, a sentence pair is classified as paraphrase, semi-paraphrase, or not paraphrase. The dataset created for the shared task has been made available online and is the first open-source paraphrase detection corpus for Indian languages. In this overview paper, we describe both subtasks, the datasets, the evaluation methods, and the submitted systems, along with the performance of the submitted runs.
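The two subtasks differ only in their label sets, which a small sketch can make concrete. The code below is an illustration under stated assumptions, not a system from the shared task: the short tags `P`, `SP`, and `NP`, the word-overlap baseline, the threshold values, and the use of accuracy as the score are all hypothetical choices made for this example.

```python
from typing import Sequence

# Label sets for the two subtasks. The short tag names are an assumption
# made for this sketch.
SUBTASK1_LABELS = ("P", "NP")        # paraphrase / not paraphrase
SUBTASK2_LABELS = ("P", "SP", "NP")  # paraphrase / semi-paraphrase / not paraphrase


def jaccard(s1: str, s2: str) -> float:
    """Jaccard overlap between the whitespace-token sets of two sentences."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a and not b:
        return 1.0  # two empty sentences are treated as identical
    return len(a & b) / len(a | b)


def classify_subtask1(s1: str, s2: str, threshold: float = 0.5) -> str:
    """Binary decision; the threshold is a hypothetical value."""
    return "P" if jaccard(s1, s2) >= threshold else "NP"


def classify_subtask2(s1: str, s2: str, high: float = 0.7, low: float = 0.3) -> str:
    """Three-way decision; both thresholds are hypothetical values."""
    score = jaccard(s1, s2)
    if score >= high:
        return "P"
    if score >= low:
        return "SP"
    return "NP"


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of sentence pairs whose predicted label matches the gold label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

Whitespace tokenisation is used only to keep the sketch short; a real system for Tamil, Malayalam, Hindi, or Punjabi would likely need language-specific preprocessing.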
Notes
- 4. It does not affect the results of the participating teams.
Acknowledgement
First, we would like to thank the FIRE 2016 organizers for giving us the opportunity to organize the shared task on Detecting Paraphrases in Indian Languages (DPIL). We extend our gratitude to the advisory committee members, Prof. Ramanan (RelAgent Pvt. Ltd) and Prof. Rajendran S (Computational Engineering and Networking, CEN), for actively supporting us throughout the track. We also thank our PG students at CEN for helping us create the paraphrase corpora.
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Anand Kumar, M., Singh, S., Kavirajan, B., Soman, K.P. (2018). Shared Task on Detecting Paraphrases in Indian Languages (DPIL): An Overview. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_10
DOI: https://doi.org/10.1007/978-3-319-73606-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73605-1
Online ISBN: 978-3-319-73606-8
eBook Packages: Computer Science (R0)