ParaCap: paraphrase detection model using capsule network

Jain, Rachna; Kathuria, Abhishek; Singh, Anubhav; Saxena, Anmol; Khandelwal, Anjali

doi:10.1007/s00530-020-00746-6

Special Issue Paper
Published: 22 January 2021

Volume 28, pages 1877–1895, (2022)
Multimedia Systems

Rachna Jain¹, Abhishek Kathuria¹, Anubhav Singh¹ (ORCID: orcid.org/0000-0002-2156-7335), Anmol Saxena¹ & Anjali Khandelwal¹

Abstract

This paper addresses the problem of paraphrase detection. The ability to detect similar sentences is important for a number of applications, such as text mining, text summarization, plagiarism detection, authorship authentication, and question answering. Given two phrases, the goal is to detect whether they are semantically identical. This work presents a novel model, ParaCap, which uses capsule networks to analyze sentences. Capsule networks capture spatial information (context, language, sentence length, and others) through their instantiation parameters, which yields better results than CNNs. The Quora Question Pairs dataset, containing 404,291 pairs of Quora questions, is used for this purpose. The ParaCap model outperforms many state-of-the-art methods and is comparable to other techniques, achieving an accuracy of 89.19%.
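
The abstract does not describe ParaCap's layers, so the following is only a minimal sketch of the generic capsule "squash" nonlinearity from the dynamic-routing formulation of capsule networks (Sabour et al., 2017), not the authors' implementation. It illustrates how a capsule's output vector can carry instantiation parameters in its direction while its length, bounded below 1, acts as a confidence score. The function names `squash` and `length` are hypothetical helpers introduced for illustration.

```python
import math


def squash(vector, eps=1e-9):
    """Generic capsule squashing function (assumed form, Sabour et al. 2017).

    Rescales `vector` so its length becomes ||s||^2 / (1 + ||s||^2),
    which lies in [0, 1), while leaving its direction unchanged.
    """
    sq_norm = sum(x * x for x in vector)
    norm = math.sqrt(sq_norm)
    # eps avoids division by zero for an all-zero input vector.
    scale = sq_norm / (1.0 + sq_norm) / (norm + eps)
    return [scale * x for x in vector]


def length(vector):
    """Euclidean length of a vector (hypothetical helper)."""
    return math.sqrt(sum(x * x for x in vector))


if __name__ == "__main__":
    weak = squash([0.1, 0.0])     # short input: length shrinks toward 0
    strong = squash([30.0, 40.0])  # long input: length approaches 1
    print(length(weak), length(strong))
```

Short input vectors are squashed toward zero length and long ones toward unit length, so the length can be read as the probability that the feature a capsule represents is present.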

Author information

Authors and Affiliations

Department of Computer Science, Bharati Vidyapeeth’s College of Engineering, New Delhi, India
Rachna Jain, Abhishek Kathuria, Anubhav Singh, Anmol Saxena & Anjali Khandelwal

Corresponding author

Correspondence to Anubhav Singh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jain, R., Kathuria, A., Singh, A. et al. ParaCap: paraphrase detection model using capsule network. Multimedia Systems 28, 1877–1895 (2022). https://doi.org/10.1007/s00530-020-00746-6

Published: 22 January 2021
Issue Date: December 2022
DOI: https://doi.org/10.1007/s00530-020-00746-6
