Detection of document modification based on deep neural networks

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

In this paper, we focus on detecting semantic and structural modifications in documents. We define six inter-document relations to represent document modification: Eliminate, Extend, Merge, Split, Rewrite, and Reorder. We also develop a detection model based on a deep neural network that identifies the relations between two given documents. We assume that several modifications can be applied to the same document; in that case the modifications can overlap, which makes the applied modifications very difficult to detect. We represent a document pair as a sentence-based similarity matrix and detect the inter-document relations by applying the deep neural network to that matrix. The experiments show that our model performed well in detecting document modifications.
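
To make the sentence-based similarity matrix concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how sentences are split or embedded, so the regex splitter, the bag-of-words vectors, the cosine measure, and all function names here are assumptions chosen only for illustration. Entry (i, j) of the matrix is the similarity between sentence i of the original document and sentence j of the modified one.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(sentence):
    """Lower-cased word counts; a stand-in for a learned sentence vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(original, modified):
    """Rows index sentences of `original`, columns index sentences of `modified`."""
    rows = [bag_of_words(s) for s in split_sentences(original)]
    cols = [bag_of_words(s) for s in split_sentences(modified)]
    return [[cosine(r, c) for c in cols] for r in rows]


# Hypothetical document pair: the second is a reordering of the first.
original = "Cats sleep all day. Dogs bark at night. Birds sing at dawn."
reordered = "Birds sing at dawn. Cats sleep all day. Dogs bark at night."

matrix = similarity_matrix(original, reordered)
for row in matrix:
    print(" ".join(f"{v:.2f}" for v in row))
```

In this toy pair, each row's maximum falls off the main diagonal, which is the kind of pattern a Reorder relation would leave in the matrix. A removed sentence would instead appear as a row with no strong match. A classifier applied to such a matrix would need one output per relation, since the abstract states that several modifications can overlap in one document pair.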

Notes

  1. https://translate.google.com/.

Acknowledgements

This research was supported by the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning (NRF-2014M3C4A7030503). This research was also supported by the Ministry of Culture, Sports and Tourism (MCST) and the Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program 2016 (R2016030046).

Author information

Corresponding author

Correspondence to Jee-Hyong Lee.

About this article

Cite this article

Kim, N.R., Choi, Y., Lee, H. et al. Detection of document modification based on deep neural networks. J Ambient Intell Human Comput 9, 1089–1096 (2018). https://doi.org/10.1007/s12652-017-0617-y

  • DOI: https://doi.org/10.1007/s12652-017-0617-y