Abstract
This paper addresses the problem of paraphrase detection. The ability to detect similar sentences is important for a number of applications, such as text mining, text summarization, plagiarism detection, authorship authentication, and question answering. Given two phrases, the goal is to determine whether they are semantically identical. This work proposes a novel model, ParaCap, which uses capsule networks to analyze sentences. Capsule networks capture spatial information (context, language, sentence length, and others) through their instantiation parameters, which yields better results than CNNs. The model is evaluated on the Quora Question Pairs dataset, which contains 404291 pairs of Quora questions. ParaCap outperforms many state-of-the-art methods and is comparable to other techniques, achieving an accuracy of 89.19%.
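As a rough illustration of how a capsule's output vector can carry instantiation parameters, the following minimal sketch shows the squash nonlinearity and routing-by-agreement as they are commonly described for capsule networks. It is not the ParaCap implementation: the function names (`squash`, `route`, `length`), the three routing iterations, and the toy prediction vectors are assumptions made for illustration only.

```python
import math

def length(v):
    """Euclidean length of a vector; read as the capsule's activation strength."""
    return math.sqrt(sum(x * x for x in v))

def squash(v, eps=1e-9):
    """Shrink v so its length lies in [0, 1) while keeping its direction."""
    sq_norm = sum(x * x for x in v)
    scale = sq_norm / (1.0 + sq_norm) / (math.sqrt(sq_norm) + eps)
    return [scale * x for x in v]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, iterations=3):
    """Routing-by-agreement (illustrative).

    u_hat[i][j] is the prediction vector that lower capsule i makes
    for upper capsule j. Returns one output vector per upper capsule.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    v = []
    for _ in range(iterations):
        # Coupling coefficients: each lower capsule distributes its vote.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        # Raise the logit where a prediction agrees with the output (dot product).
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v

# Toy input: both lower capsules agree on upper capsule 0 and disagree on capsule 1.
toy_u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
]
outputs = route(toy_u_hat)
```

In this toy case the upper capsule that receives agreeing predictions ends up with the longer output vector, while the conflicting predictions cancel out. The direction of each output vector is what would encode properties of the input in a trained model.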
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Jain, R., Kathuria, A., Singh, A. et al. ParaCap: paraphrase detection model using capsule network. Multimedia Systems 28, 1877–1895 (2022). https://doi.org/10.1007/s00530-020-00746-6
DOI: https://doi.org/10.1007/s00530-020-00746-6