DOI: 10.1145/3581783.3611978
research-article

Clip Fusion with Bi-level Optimization for Human Mesh Reconstruction from Monocular Videos

Published: 27 October 2023

Abstract

Human mesh reconstruction (HMR) from monocular video is a key step in many mixed reality and robotic applications. Although existing methods show promising results by capturing temporal information across frames, they predict the human mesh using implicit temporal learning modules in a sequence-to-frame manner. To mine more temporal information from the video, we present a bi-level clip inference network for HMR, which explicitly leverages both local motion and global context for dense 3D reconstruction. Specifically, we propose a novel bi-level temporal fusion strategy that takes both neighboring and long-range relations into consideration. In addition, unlike traditional frame-wise operation, we investigate an alternative perspective by treating video-based HMR as clip-wise inference. We evaluate the proposed method quantitatively and qualitatively on multiple datasets (3DPW, Human3.6M, and MPI-INF-3DHP), demonstrating a significant improvement over existing methods (in terms of PA-MPJPE, ACC-Error, etc.). Furthermore, we extend the proposed method to the more challenging multiple-shot HMR task to demonstrate its generalizability. Visual demos are available at https://github.com/bicf0/bicf_demo.
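The abstract contrasts sequence-to-frame prediction, where a window of frames is used to predict the mesh of a single target frame, with clip-wise inference, where a whole clip is predicted at once. It also reports ACC-Error as a temporal-smoothness metric. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' bi-level network. `predict_clip`, `clip_len`, and `stride` are hypothetical names: `predict_clip` stands in for any model that returns one set of 3D joint positions per input frame, and overlapping clip predictions are fused by plain averaging rather than by the bi-level fusion proposed in the paper. `accel_error` follows the definition commonly used in video HMR evaluation (for example, in VIBE): the mean L2 distance between predicted and ground-truth joint accelerations, each computed as a second finite difference. The paper's exact normalization and units may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pose = List[Vec3]  # 3D positions of all joints in one frame


def clip_ranges(num_frames: int, clip_len: int, stride: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) ranges of fixed-length clips covering every frame."""
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    if num_frames <= clip_len:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - clip_len + 1, stride))
    if starts[-1] + clip_len < num_frames:
        # Shift one extra clip back so the final frames are also covered.
        starts.append(num_frames - clip_len)
    return [(s, s + clip_len) for s in starts]


def clipwise_inference(
    frames: Sequence,
    predict_clip: Callable[[Sequence], List[Pose]],  # hypothetical clip model
    clip_len: int,
    stride: int,
) -> List[Pose]:
    """Predict every frame clip by clip, averaging frames covered by several clips."""
    n = len(frames)
    sums: List[List[List[float]]] = [[] for _ in range(n)]
    counts = [0] * n
    for start, end in clip_ranges(n, clip_len, stride):
        preds = predict_clip(frames[start:end])  # assumed: one pose per frame of the clip
        for offset, pose in enumerate(preds):
            t = start + offset
            if counts[t] == 0:
                sums[t] = [list(joint) for joint in pose]
            else:
                for acc, joint in zip(sums[t], pose):
                    for k in range(3):
                        acc[k] += joint[k]
            counts[t] += 1
    # Simple averaging of overlapping predictions (an assumption, not the paper's fusion).
    return [
        [tuple(c / counts[t] for c in joint) for joint in sums[t]]
        for t in range(n)
    ]


def accel_error(pred: List[Pose], gt: List[Pose]) -> float:
    """Mean L2 distance between predicted and ground-truth joint accelerations.

    Acceleration at frame t is the second finite difference x[t-1] - 2*x[t] + x[t+1],
    so the first and last frames are skipped. Requires at least three frames.
    """
    if len(gt) < 3 or len(pred) != len(gt):
        raise ValueError("need equally long sequences of at least three frames")
    frame_errors = []
    for t in range(1, len(gt) - 1):
        joint_errors = []
        for j in range(len(gt[t])):
            sq = 0.0
            for k in range(3):
                a_pred = pred[t - 1][j][k] - 2 * pred[t][j][k] + pred[t + 1][j][k]
                a_gt = gt[t - 1][j][k] - 2 * gt[t][j][k] + gt[t + 1][j][k]
                sq += (a_pred - a_gt) ** 2
            joint_errors.append(math.sqrt(sq))
        frame_errors.append(sum(joint_errors) / len(joint_errors))
    return sum(frame_errors) / len(frame_errors)
```

For example, with `clip_len=4` and `stride=2`, a 10-frame video is covered by the clips [0, 4), [2, 6), [4, 8), and [6, 10), so most frames receive two predictions that are then averaged. Averaging is only the simplest stitching rule. A model that fuses neighboring and long-range context, as the paper proposes, would replace both the per-clip model and this fixed rule.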

Supplemental Material

MP4 File
This is the presentation of the paper "Clip Fusion with Bi-level Optimization for Human Mesh Reconstruction from Monocular Videos," accepted at the ACM International Conference on Multimedia. The presentation has six parts. First, we formulate human mesh reconstruction and introduce related work. Next, we discuss the main challenge in reconstructing human meshes from monocular videos. Then, we introduce our idea for addressing this challenge. After that, we present the details of the proposed method and demonstrate its effectiveness. Finally, we summarize the main content of the paper.

    Information

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bi-level optimization
    2. clip-wise inference
    3. human mesh reconstruction

    Qualifiers

    • Research-article

    Funding Sources

    • The Open Research Project Programme of the State Key Laboratory of Internet of Things for Smart City (University of Macau) (Ref. No.: SKL-IoTSC(UM)-2021-2023/ORP/GA05/2022)
    • The FDCT grant 0154/2022/A3
    • The Major basic research project of Shandong Natural Science Foundation
    • The MYRG-CRG2022-00013-IOTSC-ICI grant
    • National Natural Science Foundation of China
    • Natural Science Foundation of Shandong Province
    • The Young Elite Scientists Sponsorship Program by CAST
    • SKL-IOTSC(UM)-2021-2023
    • The Open project of Key Laboratory of Artificial Intelligence, Ministry of Education

    Conference

    MM '23
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 163
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 05 Mar 2025

    Cited By

    • (2025) A Review of Human Mesh Reconstruction: Beyond 2D Video Object Segmentation. Social Robotics, 167-176. DOI: 10.1007/978-981-96-1151-5_17. Online publication date: 7-Feb-2025.
    • (2024) ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos. Proceedings of the 32nd ACM International Conference on Multimedia, 1514-1523. DOI: 10.1145/3664647.3680881. Online publication date: 28-Oct-2024.
    • (2024) Towards Practical Human Motion Prediction with LiDAR Point Clouds. Proceedings of the 32nd ACM International Conference on Multimedia, 7629-7638. DOI: 10.1145/3664647.3680720. Online publication date: 28-Oct-2024.
    • (2024) Extending Implicit Neural Representations for Text-to-Image Generation. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3650-3654. DOI: 10.1109/ICASSP48485.2024.10446171. Online publication date: 14-Apr-2024.
    • (2024) Self-supervised spatial–temporal feature enhancement for one-shot video object detection. Neurocomputing, Vol. 600, Issue C. DOI: 10.1016/j.neucom.2024.128219. Online publication date: 1-Oct-2024.
    • (2024) SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification. Multimedia Tools and Applications, Vol. 83, Issue 39, 86457-86478. DOI: 10.1007/s11042-024-19615-9. Online publication date: 19-Jun-2024.
    • (2024) Differential motion attention network for efficient action recognition. The Visual Computer, Vol. 41, Issue 3, 1719-1731. DOI: 10.1007/s00371-024-03478-0. Online publication date: 13-Jun-2024.
    • (2024) SimpliFusion: a simplified infrared and visible image fusion network. The Visual Computer, Vol. 41, Issue 2, 1335-1350. DOI: 10.1007/s00371-024-03423-1. Online publication date: 29-May-2024.
