
Automatic video clip and mixing based on semantic sentence matching

Applied Intelligence

Abstract

Semantic sentence matching is a crucial task in natural language processing. However, it has mainly been applied in the text domain and has been explored far less for video clip and mixing. Existing methods for video clip and mixing mainly focus on mapping text and video into latent spaces, but their extractors lack the ability to capture effective information. Therefore, we present a Multi-Feature Fusion semantic sentence matching model (MFF), which forms a double filtering mechanism. The double filtering is designed to filter for semantically similar fragments in video clip and mixing, reducing the burden of heavy manual video editing. Experiments are conducted on two datasets, namely SNLI and Quora Question Pairs, to verify that MFF significantly improves accuracy. Results show that MFF raises accuracy on the SNLI and Quora Question Pairs datasets to 75.3% and 76.7%, respectively.
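The abstract does not describe the implementation of MFF or its double filtering. As a rough, hypothetical illustration of what a two-stage semantic filter over clip text might look like, the sketch below uses a placeholder bag-of-words encoder and cosine similarity; the `encode` function, the thresholds, and the sample clip dictionary are all assumptions for illustration, not the paper's learned multi-feature extractor.

```python
# Minimal sketch (not the authors' MFF implementation): given a query sentence
# and per-clip caption text, keep only clips whose text is semantically similar
# to the query, using a two-stage ("double") filter.
import numpy as np


def encode(sentence: str, dim: int = 64) -> np.ndarray:
    """Placeholder sentence encoder: a normalized bag-of-hashed-words vector.
    Stands in for the learned multi-feature extractor described in the paper."""
    vec = np.zeros(dim)
    for token in sentence.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two already-normalized vectors."""
    return float(np.dot(a, b))


def double_filter(query: str, clips: dict, coarse_th: float = 0.2, fine_th: float = 0.5):
    """Two-stage filtering: a loose pass discards clearly unrelated clips,
    then a stricter pass keeps only the closest semantic matches."""
    q = encode(query)
    coarse = {cid: txt for cid, txt in clips.items() if cosine(q, encode(txt)) >= coarse_th}
    return [cid for cid, txt in coarse.items() if cosine(q, encode(txt)) >= fine_th]


if __name__ == "__main__":
    clips = {
        "clip_01": "a dog runs across the park",
        "clip_02": "stock market closes higher today",
        "clip_03": "the dog plays fetch in the park",
    }
    # Clips whose captions survive both filtering stages would be candidates
    # for automatic clipping and mixing against the query sentence.
    print(double_filter("a dog playing in the park", clips))
```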





Acknowledgements

This research was funded by the Shandong Province Major Science and Technology Innovation Project (Grant 2019JZZY0101128); the National Natural Science Foundation of China (61872073); the Fundamental Research Funds for the Central Universities (N2126005, N2126002); and the Natural Science Foundation of Liaoning (2021-MS-101).

Author information

Corresponding authors

Correspondence to Zixi Jia or Jingyu Ru.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Jia, Z., Li, J., Du, Z. et al. Automatic video clip and mixing based on semantic sentence matching. Appl Intell 53, 2133–2146 (2023). https://doi.org/10.1007/s10489-022-03226-8

