
Automatic video clip and mixing based on semantic sentence matching

Applied Intelligence

Abstract

Semantic sentence matching is a crucial task in natural language processing. However, it has mainly been applied in the text domain and has been explored far less for video clip and mixing. Existing methods for video clip and mixing mainly focus on mapping text and video into latent spaces, but their extractors lack the ability to capture effective information. Therefore, we present a Multi-Feature Fusion semantic sentence matching model (MFF), which forms a double filtering mechanism. The double filtering is designed to filter for semantically similar fragments in video clip and mixing, reducing the burden of heavy manual video editing. Experiments are conducted on two datasets, namely SNLI and Quora Question Pairs, to verify that MFF significantly improves accuracy. Results show that MFF raises accuracy on the SNLI and Quora Question Pairs datasets to 75.3% and 76.7%, respectively.
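The abstract does not describe the implementation of MFF or its double filtering. As a rough, hypothetical illustration of what a two-stage semantic filter over clip text might look like, the sketch below uses a placeholder bag-of-words encoder and cosine similarity; the `encode` function, the thresholds, and the sample clip dictionary are all assumptions for illustration, not the paper's learned multi-feature extractor.

```python
# Minimal sketch (not the authors' MFF implementation): given a query sentence
# and per-clip caption text, keep only clips whose text is semantically similar
# to the query, using a two-stage ("double") filter.
import numpy as np


def encode(sentence: str, dim: int = 64) -> np.ndarray:
    """Placeholder sentence encoder: a normalized bag-of-hashed-words vector.
    Stands in for the learned multi-feature extractor described in the paper."""
    vec = np.zeros(dim)
    for token in sentence.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return float(np.dot(a, b))


def double_filter(query: str, clips: dict, coarse_th: float = 0.2, fine_th: float = 0.5):
    """Two-stage filtering: a loose pass discards clearly unrelated clips,
    then a stricter pass keeps only the closest semantic matches."""
    q = encode(query)
    coarse = {cid: txt for cid, txt in clips.items() if cosine(q, encode(txt)) >= coarse_th}
    return [cid for cid, txt in coarse.items() if cosine(q, encode(txt)) >= fine_th]


if __name__ == "__main__":
    clips = {
        "clip_01": "a dog runs across the park",
        "clip_02": "stock market closes higher today",
        "clip_03": "the dog plays fetch in the park",
    }
    # Clips whose captions survive both filtering stages would be candidates
    # for automatic clipping and mixing against the query sentence.
    print(double_filter("a dog playing in the park", clips))
```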





Acknowledgements

This research was funded by the Shandong Province Major Science and Technology Innovation Project (Grant 2019JZZY0101128); the National Natural Science Foundation of China (61872073); the Fundamental Research Funds for the Central Universities (N2126005, N2126002); and the Natural Science Foundation of Liaoning (2021-MS-101).

Author information

Corresponding authors

Correspondence to Zixi Jia or Jingyu Ru.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Jia, Z., Li, J., Du, Z. et al. Automatic video clip and mixing based on semantic sentence matching. Appl Intell 53, 2133–2146 (2023). https://doi.org/10.1007/s10489-022-03226-8

