Abstract
Point cloud frame interpolation is a challenging task that requires accurate scene flow estimation across frames while preserving geometric structure. Prevailing techniques often rely on pre-trained motion estimators or intensive test-time optimization, resulting in compromised interpolation accuracy or prolonged inference. This work presents FastPCI, which introduces a Pyramid Convolution-Transformer architecture for point cloud frame interpolation. Our hybrid Convolution-Transformer improves both local and long-range feature learning, while the pyramid network offers multilevel features and reduces computation. In addition, FastPCI proposes a unique Dual-Direction Motion-Structure block for more accurate scene flow estimation. Our design is motivated by two facts: (1) accurate scene flow preserves 3D structure, and (2) the point cloud at the previous timestep should be reconstructable using the reverse motion from the future timestep. Extensive experiments show that FastPCI significantly outperforms the state-of-the-art PointINet and NeuralPCI with notable gains (e.g., 26.6% and 18.3% reduction in Chamfer Distance on KITTI), while being more than 10× and 600× faster, respectively. Code is available at https://github.com/genuszty/FastPCI.
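To make the dual-direction intuition in facts (1) and (2) concrete, the sketch below shows one way such a consistency objective could be written. This is not the FastPCI implementation; it is a minimal PyTorch illustration that assumes per-point forward and backward flows are already available, and uses simple linear warping plus a symmetric Chamfer Distance. The names `chamfer_distance` and `dual_direction_loss` are illustrative, not taken from the paper.

```python
import torch

def chamfer_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3)."""
    d = torch.cdist(a, b)                                  # (N, M) pairwise Euclidean distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def dual_direction_loss(p0, p1, flow_fwd, flow_bwd, t=0.5):
    """Toy dual-direction consistency: forward-warped p0 should match p1,
    backward-warped p1 should reconstruct p0, and the intermediate frame
    estimated from either side should agree."""
    loss_fwd = chamfer_distance(p0 + flow_fwd, p1)         # fact (1): accurate flow maps p0 onto p1's structure
    loss_bwd = chamfer_distance(p1 + flow_bwd, p0)         # fact (2): reverse motion from the future reconstructs p0
    pt_from_p0 = p0 + t * flow_fwd                         # interpolate from the past frame
    pt_from_p1 = p1 + (1.0 - t) * flow_bwd                 # interpolate from the future frame
    return loss_fwd + loss_bwd + chamfer_distance(pt_from_p0, pt_from_p1)

# Stand-in usage with random points (real inputs would be consecutive LiDAR frames):
p0, p1 = torch.randn(1024, 3), torch.randn(1024, 3)
loss = dual_direction_loss(p0, p1, torch.zeros_like(p0), torch.zeros_like(p1))
print(loss.item())
```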
References
Bao, W., Lai, W.S., Ma, C., Zhang, X., Gao, Z., Yang, M.H.: Depth-aware video frame interpolation. In: CVPR (2019)
Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. In: CVPR, pp. 11621–11631 (2020)
Chang, M.F., et al.: Argoverse: 3D tracking and forecasting with rich maps. In: CVPR, pp. 8748–8757 (2019)
Garrido, D., Rodrigues, R., Augusto Sousa, A., Jacob, J., Castro Silva, D.: Point cloud interaction and manipulation in virtual reality. In: 2021 5th International Conference on Artificial Intelligence and Virtual Reality (AIVR), pp. 15–20 (2021)
Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: CVPR, pp. 3354–3361 (2012)
Huang, Z., Zhang, T., Heng, W., Shi, B., Zhou, S.: Real-time intermediate flow estimation for video frame interpolation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. ECCV 2022. LNCS, vol. 13674. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19781-9_36
Kalluri, T., Pathak, D., Chandraker, M., Tran, D.: FLAVR: flow-agnostic video representations for fast frame interpolation. In: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 2070–2081 (2023)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kong, L., et al.: IFRNet: intermediate feature refine network for efficient frame interpolation. In: CVPR (2022)
Li, G., et al.: DeepGCNs: making GCNs go as deep as CNNs. IEEE TPAMI (2021)
Li, X., Kaesemodel Pontes, J., Lucey, S.: Neural scene flow prior. NeurIPS 34 (2021)
Liu, H., Liao, K., Lin, C., Zhao, Y., Guo, Y.: Pseudo-LiDAR point cloud interpolation based on 3D motion representation and spatial supervision. IEEE Trans. Intell. Transp. Syst. 23(7), 6379–6389 (2021)
Liu, H., Liao, K., Lin, C., Zhao, Y., Liu, M.: PLIN: a network for pseudo-LiDAR point cloud interpolation. Sensors 20(6), 1573 (2020)
Lu, F., Chen, G., Qu, S., Li, Z., Liu, Y., Knoll, A.: PointINet: point cloud frame interpolation network. In: AAAI, pp. 2251–2259 (2021)
Lu, L., Wu, R., Lin, H., Lu, J., Jia, J.: Video frame interpolation with transformer. In: CVPR (2022)
Luo, W., Yang, B., Urtasun, R.: Fast and furious: real-time end-to-end 3D detection, tracking and motion forecasting with a single convolutional net. In: CVPR, pp. 3569–3577 (2018)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 405–421. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_24
Niklaus, S., Liu, F.: Context-aware synthesis for video frame interpolation. In: CVPR (2018)
Park, J., Ko, K., Lee, C., Kim, C.-S.: BMBC: bilateral motion estimation with bilateral cost volume for video interpolation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12359, pp. 109–125. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58568-6_7
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. NeurIPS 32 (2019)
Pumarola, A., Corona, E., Pons-Moll, G., Moreno-Noguer, F.: D-NeRF: neural radiance fields for dynamic scenes. In: CVPR, pp. 10318–10327 (2021)
Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)
Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS (2017)
Qian, G., Hamdi, A., Zhang, X., Ghanem, B.: Pix4Point: image pretrained standard transformers for 3D point cloud understanding. In: 2024 International Conference on 3D Vision (3DV), pp. 1280–1290 (2024)
Qian, G., Hammoud, H., Li, G., Thabet, A., Ghanem, B.: ASSANet: an anisotropic separable set abstraction for efficient point cloud representation learning. NeurIPS 34 (2021)
Qian, G., et al.: PointNeXt: revisiting PointNet++ with improved training and scaling strategies. In: NeurIPS (2022)
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 234–241 (2015)
Sim, H., Oh, J., Kim, M.: XVFI: extreme video frame interpolation. In: ICCV (2021)
Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: KPConv: flexible and deformable convolution for point clouds. In: ICCV (2019)
Wang, Y., Sun, Y., Liu, Z., Sarma, S.E., Bronstein, M.M., Solomon, J.M.: Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. (TOG) (2019)
Wu, W., Qi, Z., Fuxin, L.: PointConv: deep convolutional networks on 3D point clouds. In: CVPR (2019)
Wu, W., Wang, Z.Y., Li, Z., Liu, W., Fuxin, L.: PointPWC-Net: cost volume on point clouds for (Self-)supervised scene flow estimation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12350, pp. 88–107. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58558-7_6
Wu, X., Lao, Y., Jiang, L., Liu, X., Zhao, H.: Point transformer v2: grouped vector attention and partition-based pooling. Adv. Neural Inf. Process. Syst. 35, 33330–33342 (2022)
Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. CoRR (2015)
Yin, T., Zhou, X., Krähenbühl, P.: Center-based 3D object detection and tracking. In: CVPR (2021)
Zeng, Y., Qian, Y., Zhang, Q., Hou, J., Yuan, Y., He, Y.: IDEA-Net: dynamic 3D point cloud interpolation via deep embedding alignment. In: CVPR (2022)
Zhang, G., Zhu, Y., Wang, H., Chen, Y., Wu, G., Wang, L.: Extracting motion and appearance via inter-frame attention for efficient video frame interpolation. In: CVPR, pp. 5682–5692 (2023)
Zhang, Z., Hu, L., Deng, X., Xia, S.: Sequential 3D human pose estimation using adaptive point cloud sampling strategy. In: IJCAI, pp. 1330–1337 (2021)
Zhao, H., Jiang, L., Jia, J., Torr, P.H., Koltun, V.: Point transformer. In: ICCV, pp. 16259–16268 (2021)
Zheng, Z., Wu, D., Lu, R., Lu, F., Chen, G., Jiang, C.: NeuralPCI: spatio-temporal neural field for 3D point cloud multi-frame non-linear interpolation. In: CVPR (2023)
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable feedback. This work is supported in part by the NSFC under Grant No. 62276144.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, T., Qian, G., Xie, J., Yang, J. (2025). FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15132. Springer, Cham. https://doi.org/10.1007/978-3-031-72904-1_15