Shared Task on Detecting Paraphrases in Indian Languages (DPIL): An Overview

Anand Kumar, M.; Singh, Shivkaran; Kavirajan, B.; Soman, K. P.

doi:10.1007/978-3-319-73606-8_10

M. Anand Kumar (ORCID: orcid.org/0000-0003-0310-4510), Shivkaran Singh, B. Kavirajan & K. P. Soman

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10478)

Included in the following conference series:

Forum for Information Retrieval Evaluation


Abstract

This paper presents an overview of the shared task on “Detecting Paraphrases in Indian Languages” (DPIL) conducted at FIRE 2016. Given a pair of sentences in the same language, participants were asked to determine whether the two sentences are semantically equivalent. The shared task covered four Indian languages: Tamil, Malayalam, Hindi, and Punjabi. It comprised two subtasks. Subtask 1 was to classify a sentence pair as paraphrase or not paraphrase. Subtask 2 was to classify a sentence pair as paraphrase, semi-paraphrase, or not paraphrase. The dataset created for the shared task has been made available online, and it is the first open-source paraphrase detection corpus for Indian languages. In this overview paper, we describe the two subtasks, the datasets, the evaluation methods, and the submitted systems, along with the performance of their runs.
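To make the difference between the two subtasks concrete, the following is a minimal sketch of a two-way and a three-way labelling of the same sentence pair. It is not one of the submitted systems and not the evaluation procedure of the shared task: the tag strings (P, SP, NP), the token-overlap (Jaccard) score, the thresholds, and the use of plain accuracy are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

# Label sets for the two subtasks. The tag strings are illustrative assumptions.
SUBTASK1_LABELS = ("P", "NP")        # paraphrase / not paraphrase
SUBTASK2_LABELS = ("P", "SP", "NP")  # paraphrase / semi-paraphrase / not paraphrase


@dataclass(frozen=True)
class SentencePair:
    """A pair of sentences written in the same language."""
    first: str
    second: str


def jaccard(pair: SentencePair) -> float:
    """Token-overlap similarity using lowercased whitespace tokens (assumed baseline)."""
    a = set(pair.first.lower().split())
    b = set(pair.second.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_subtask1(pair: SentencePair, threshold: float = 0.5) -> str:
    """Two-way decision; the threshold is a hypothetical value."""
    return "P" if jaccard(pair) >= threshold else "NP"


def classify_subtask2(pair: SentencePair, low: float = 0.3, high: float = 0.7) -> str:
    """Three-way decision; both thresholds are hypothetical values."""
    score = jaccard(pair)
    if score >= high:
        return "P"
    if score >= low:
        return "SP"
    return "NP"


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of pairs whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)
```

Under these assumed thresholds, a pair that shares only part of its vocabulary is labelled NP in the two-way setting but SP in the three-way setting, which is the distinction the second subtask introduces.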


Notes

1. http://nlp.amrita.edu/dpil_cen/
2. http://alt.qcri.org/semeval2015/
3. http://nlp.amrita.edu/dpil_cen/
4. It does not affect the result of the participating teams


Acknowledgement

We thank the FIRE 2016 organizers for giving us the opportunity to organize the shared task on Detecting Paraphrases in Indian Languages (DPIL). We extend our gratitude to the advisory committee members, Prof. Ramanan (RelAgent Pvt. Ltd) and Prof. Rajendran S (Computational Engineering and Networking, CEN), for actively supporting us throughout the track. We also thank our PG students at CEN for helping us create the paraphrase corpora.

Author information

Authors and Affiliations

Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
M. Anand Kumar, Shivkaran Singh, B. Kavirajan & K. P. Soman


Corresponding author

Correspondence to M. Anand Kumar.

Editor information

Editors and Affiliations

DAIICT, Gujarat, India
Prasenjit Majumder
Indian Statistical Institute, Kolkata, India
Mandar Mitra
DAIICT, Gujarat, India
Parth Mehta
DAIICT, Gujarat, India
Jainisha Sankhavara


About this paper

Cite this paper

Anand Kumar, M., Singh, S., Kavirajan, B., Soman, K.P. (2018). Shared Task on Detecting Paraphrases in Indian Languages (DPIL): An Overview. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds) Text Processing. FIRE 2016. Lecture Notes in Computer Science, vol 10478. Springer, Cham. https://doi.org/10.1007/978-3-319-73606-8_10


DOI: https://doi.org/10.1007/978-3-319-73606-8_10
Published: 04 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73605-1
Online ISBN: 978-3-319-73606-8
eBook Packages: Computer Science, Computer Science (R0)
