Abstract
Scene representation networks, or implicit neural representations (INRs), have seen success in numerous image and video applications. However, as universal function approximators, they fit all variations in a video without any selectivity. This is particularly problematic for tasks such as remote plethysmography, the extraction of heart rate information from face videos. Because the native signal-to-noise ratio is low, classical signal processing techniques perform poorly, while learning-based methods improve performance but are prone to hallucinations that limit generalizability. Directly applying prior INRs cannot remedy this signal-strength deficit, since they fit the signal and the interfering factors alike. In this work, we introduce an INR framework that increases the plethysmograph signal strength. Specifically, we design architectures with selective representation capabilities that decompose a face video into a blood plethysmograph component and a face appearance component. By inferring the plethysmograph signal from the blood component, we achieve state-of-the-art performance on out-of-distribution samples without sacrificing performance on in-distribution samples. We implement our framework on a custom multiresolution hash encoding backbone, enabling practical dataset-scale representations through a 50x speed-up over traditional INRs. We also present a dataset of optically challenging out-of-distribution scenes to test generalization to real-world scenarios. Code and data may be found at https://implicitppg.github.io/.
P. Chari and A. B. Harish—Contributed Equally.
A. B. Harish and A. Armouti—Work done when at UCLA.
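To make the abstract's two ideas concrete, here is a minimal, illustrative PyTorch sketch (not the authors' released code) of an INR over (x, y, t) built on a multiresolution hash encoding in the spirit of Müller et al. [44], with the output decomposed into a time-independent face-appearance component and a small time-conditioned residual intended to absorb the blood-volume (plethysmograph) variation. All class names, layer sizes, and the nearest-cell hash lookup are simplifying assumptions for exposition.

```python
import torch
import torch.nn as nn


class HashEncoding(nn.Module):
    """Multiresolution hash encoding (simplified): per-level learned feature
    tables indexed by spatially hashed grid cells. The full method of [44]
    interpolates corner features; nearest-cell lookup is used here for brevity."""

    def __init__(self, n_levels=8, table_size=2 ** 14, feat_dim=2,
                 base_res=16, growth=1.5):
        super().__init__()
        self.tables = nn.ModuleList(
            nn.Embedding(table_size, feat_dim) for _ in range(n_levels))
        self.res = [int(base_res * growth ** i) for i in range(n_levels)]
        # Large primes for spatial hashing, as in Teschner et al. [59].
        self.register_buffer("primes", torch.tensor([1, 2654435761]))

    def forward(self, xy):  # xy: (N, 2) coordinates in [0, 1]
        feats = []
        for table, res in zip(self.tables, self.res):
            cell = (xy * res).long()                       # grid cell per level
            h = (cell * self.primes).sum(-1) % table.num_embeddings
            feats.append(table(h))
        return torch.cat(feats, dim=-1)                    # (N, n_levels * feat_dim)


class DecomposedVideoINR(nn.Module):
    """Two-headed INR: the appearance head sees only the spatial encoding,
    while a small time-conditioned head produces a residual meant to carry
    the plethysmograph signal."""

    def __init__(self, enc_dim=16):
        super().__init__()
        self.encoding = HashEncoding()
        self.appearance = nn.Sequential(
            nn.Linear(enc_dim, 64), nn.ReLU(), nn.Linear(64, 3))
        self.ppg = nn.Sequential(
            nn.Linear(enc_dim + 1, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, xyt):  # xyt: (N, 3) = (x, y, t), each in [0, 1]
        f = self.encoding(xyt[:, :2])
        static_rgb = self.appearance(f)                    # time-independent face
        residual = self.ppg(torch.cat([f, xyt[:, 2:]], dim=-1))
        return static_rgb + residual, residual             # fit video; read PPG from residual
```

In such a sketch, training would minimize a reconstruction loss over sampled (x, y, t) pixels, and the heart rate would be read from the dominant frequency of the spatially pooled residual. The paper's actual selectivity mechanisms, losses, and backbone configuration differ; see the project page for details.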
Notes
1. Baselines are configured, where possible, using the toolbox from [23].
References
Al Masri, A., Jasra, S.K.: The forensic biometric analysis of emotions from facial expressions, and physiological processes from the heart and skin. J. Emerg. Forensic Sci. Res. 1(1), 61–77 (2016)
Consumer Technology Association: Physical activity monitoring for heart rate, ANSI/CTA-2065 (2018)
Ba, Y., Wang, Z., Karinca, K.D., Bozkurt, O.D., Kadambi, A.: Overcoming difficulty in obtaining dark-skinned subjects for remote-PPG by synthetic augmentation. arXiv preprint arXiv:2106.06007 (2021)
Balakrishnan, G., Durand, F., Guttag, J.: Detecting pulse from head motions in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3430–3437 (2013)
Chari, P., Ba, Y., Athreya, S., Kadambi, A.: MIME: minority inclusion for majority group enhancement of AI performance. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13673, pp. 326–343. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19778-9_19
Chari, P., et al.: Diverse R-PPG: camera-based heart rate estimation for diverse subject skin-tones and scenes. arXiv preprint arXiv:2010.12769 (2020)
Chen, H., He, B., Wang, H., Ren, Y., Lim, S.N., Shrivastava, A.: NeRV: neural representations for videos. In: Advances in Neural Information Processing Systems, vol. 34, pp. 21557–21568 (2021)
Chen, W., McDuff, D.: DeepPhys: video-based physiological measurement using convolutional attention networks. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 349–365 (2018)
Chen, Z., et al.: VideoINR: learning video implicit neural representation for continuous space-time super-resolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2047–2057 (2022)
Chen, Z., Zheng, T., Cai, C., Luo, J.: MoVi-Fi: motion-robust vital signs waveform recovery via deep interpreted RF sensing. In: Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, pp. 392–405 (2021)
De Haan, G., Jeanne, V.: Robust pulse rate from chrominance-based rPPG. IEEE Trans. Biomed. Eng. 60(10), 2878–2886 (2013)
Del Regno, K., et al.: Thermal imaging and radar for remote sleep monitoring of breathing and apnea. arXiv preprint arXiv:2407.11936 (2024)
Gao, C., Saraf, A., Kopf, J., Huang, J.B.: Dynamic view synthesis from dynamic monocular video. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5712–5721 (2021)
Hurter, C., McDuff, D.: Cardiolens: remote physiological monitoring in a mixed reality environment. In: ACM SIGGRAPH 2017 Emerging Technologies, pp. 1–2 (2017)
Jiang, C., et al.: Local implicit grid representations for 3D scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6001–6010 (2020)
Kadambi, A.: Achieving fairness in medical devices. Science 372(6537), 30–31 (2021)
Lee, E., Chen, E., Lee, C.-Y.: Meta-rPPG: remote heart rate estimation using a transductive meta-learner. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12372, pp. 392–409. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58583-9_24
Li, R., Tancik, M., Kanazawa, A.: NerfAcc: a general nerf acceleration toolbox. arXiv preprint arXiv:2210.04847 (2022)
Li, Z., Niklaus, S., Snavely, N., Wang, O.: Neural scene flow fields for space-time view synthesis of dynamic scenes. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6498–6508 (2021)
Lindell, D.B., Van Veen, D., Park, J.J., Wetzstein, G.: BACON: band-limited coordinate networks for multiscale scene representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16252–16262 (2022)
Liu, X., Fromm, J., Patel, S., McDuff, D.: Multi-task temporal shift attention networks for on-device contactless vitals measurement. In: Advances in Neural Information Processing Systems, vol. 33, pp. 19400–19411 (2020)
Liu, X., Hill, B., Jiang, Z., Patel, S., McDuff, D.: EfficientPhys: enabling simple, fast and accurate camera-based cardiac measurement. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 5008–5017 (2023)
Liu, X., et al.: Deep physiological sensing toolbox. arXiv preprint arXiv:2210.00716 (2022)
Magdalena Nowara, E., Marks, T.K., Mansour, H., Veeraraghavan, A.: SparsePPG: towards driver monitoring using camera-based vital signs estimation in near-infrared. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 1272–1281 (2018)
Mai, L., Liu, F.: Motion-adjustable neural implicit video representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10738–10747 (2022)
Maity, A.K., Wang, J., Sabharwal, A., Nayar, S.K.: RobustPPG: camera-based robust heart rate estimation using motion cancellation. Biomed. Opt. Express 13(10), 5447–5467 (2022)
Martel, J.N., Lindell, D.B., Lin, C.Z., Chan, E.R., Monteiro, M., Wetzstein, G.: ACORN: adaptive coordinate networks for neural scene representation. arXiv preprint arXiv:2105.02788 (2021)
Mehta, I., Gharbi, M., Barnes, C., Shechtman, E., Ramamoorthi, R., Chandraker, M.: Modulated periodic activations for generalizable local functional representations. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 14214–14223 (2021)
Mildenhall, B., Hedman, P., Martin-Brualla, R., Srinivasan, P.P., Barron, J.T.: NeRF in the dark: high dynamic range view synthesis from noisy raw images. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16190–16199 (2022)
Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: NeRF: representing scenes as neural radiance fields for view synthesis. Commun. ACM 65(1), 99–106 (2021)
Association for the Advancement of Medical Instrumentation: Cardiac monitors, heart rate meters, and alarms. ANSI/AAMI Standard EC13 (2002)
Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 41(4), 102:1–102:15 (2022). https://doi.org/10.1145/3528223.3530127
Nelson, B.W., Allen, N.B.: Accuracy of consumer wearable heart rate measurement during an ecologically valid 24-hour period: intraindividual validation study. JMIR Mhealth Uhealth 7(3), e10828 (2019)
Nießner, M., Zollhöfer, M., Izadi, S., Stamminger, M.: Real-time 3D reconstruction at scale using voxel hashing. ACM Trans. Graph. (ToG) 32(6), 1–11 (2013)
Niu, X., Shan, S., Han, H., Chen, X.: RhythmNet: end-to-end heart rate estimation from face via spatial-temporal representation. IEEE Trans. Image Process. 29, 2409–2423 (2019)
Nowara, E.M., McDuff, D., Veeraraghavan, A.: A meta-analysis of the impact of skin tone and gender on non-contact photoplethysmography measurements. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 284–285 (2020)
Nowara, E.M., Sabharwal, A., Veeraraghavan, A.: PPGSecure: biometric presentation attack detection using photopletysmograms. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp. 56–62. IEEE (2017)
Owhadi, H., Scovel, C., Sullivan, T.J., McKerns, M., Ortiz, M.: Optimal uncertainty quantification. SIAM Rev. 55(2), 271–345 (2013)
Park, J.J., Florence, P., Straub, J., Newcombe, R., Lovegrove, S.: DeepSDF: learning continuous signed distance functions for shape representation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 165–174 (2019)
Peters, H., Ba, Y., Kadambi, A.: pCON: polarimetric coordinate networks for neural scene representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2023)
Poh, M.Z., McDuff, D.J., Picard, R.W.: Non-contact, automated cardiac pulse measurements using video imaging and blind source separation. Opt. Express 18(10), 10762–10774 (2010)
Ramaswamy, V.V., Kim, S.S., Russakovsky, O.: Fair attribute classification through latent space de-biasing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9301–9310 (2021)
Schulz, P., Scheuvens, L., Fettweis, G.: A new perspective on maximal-ratio combining. In: 2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 1–7. IEEE (2023)
Sitzmann, V., Martel, J., Bergman, A., Lindell, D., Wetzstein, G.: Implicit neural representations with periodic activation functions. In: Advances in Neural Information Processing Systems, vol. 33, pp. 7462–7473 (2020)
Song, R., Chen, H., Cheng, J., Li, C., Liu, Y., Chen, X.: PulseGAN: learning to generate realistic pulse waveforms in remote photoplethysmography. IEEE J. Biomed. Health Inform. 25(5), 1373–1384 (2021)
Tancik, M., et al.: Fourier features let networks learn high frequency functions in low dimensional domains. In: Advances in Neural Information Processing Systems, vol. 33, pp. 7537–7547 (2020)
Teschner, M., Heidelberger, B., Müller, M., Pomerantes, D., Gross, M.H.: Optimized spatial hashing for collision detection of deformable objects. In: VMV, vol. 3, pp. 47–54 (2003)
Verkruysse, W., Svaasand, L.O., Nelson, J.S.: Remote plethysmographic imaging using ambient light. Opt. Express 16(26), 21434–21445 (2008)
Vilesov, A., et al.: Blending camera and 77 GHz radar sensing for equitable, robust plethysmography. ACM Trans. Graph. (TOG) 41(4), 1–14 (2022)
Wadhwa, N., Rubinstein, M., Durand, F., Freeman, W.T.: Phase-based video motion processing. ACM Trans. Graph. (TOG) 32(4), 1–10 (2013)
Wang, W., Den Brinker, A.C., Stuijk, S., De Haan, G.: Algorithmic principles of remote PPG. IEEE Trans. Biomed. Eng. 64(7), 1479–1491 (2016)
Wang, Z., et al.: Towards fairness in visual recognition: effective strategies for bias mitigation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8919–8928 (2020)
Wang, Z., et al.: Synthetic generation of face videos with plethysmograph physiology. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 20587–20596 (2022)
Wang, Z., et al.: ALTO: alternating latent topologies for implicit 3D reconstruction. arXiv preprint arXiv:2212.04096 (2022)
Wu, H.Y., Rubinstein, M., Shih, E., Guttag, J., Durand, F., Freeman, W.: Eulerian video magnification for revealing subtle changes in the world. ACM Trans. Graph. (TOG) 31(4), 1–8 (2012)
Xu, T., White, J., Kalkan, S., Gunes, H.: Investigating bias and fairness in facial expression recognition. In: Bartoli, A., Fusiello, A. (eds.) ECCV 2020. LNCS, vol. 12540, pp. 506–523. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-65414-6_35
Yu, Z., Li, X., Zhao, G.: Remote photoplethysmograph signal measurement from facial videos using spatio-temporal networks. arXiv preprint arXiv:1905.02419 (2019)
Yu, Z., Shen, Y., Shi, J., Zhao, H., Torr, P.H., Zhao, G.: PhysFormer: facial video-based physiological measurement with temporal difference transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4186–4196 (2022)
Zhao, E.Q., et al.: Making thermal imaging more equitable and accurate: resolving solar loading biases. arXiv preprint arXiv:2304.08832 (2023)
Zheng, T., Chen, Z., Zhang, S., Cai, C., Luo, J.: MoRe-Fi: motion-robust and fine-grained respiration monitoring via deep-learning UWB radar. In: Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, pp. 111–124 (2021)
Zhi, S., Laidlow, T., Leutenegger, S., Davison, A.J.: In-place scene labelling and understanding with implicit scene representation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 15838–15847 (2021)
Acknowledgements
We thank the Visual Machines Group (VMG) at UCLA for feedback and support. A.K. was supported by a National Science Foundation (NSF) CAREER award (IIS-2046737), an Army Young Investigator Program Award, a Defense Advanced Research Projects Agency (DARPA) Young Faculty Award, and a Cisco Research Award. P.C. was partially supported by a Cisco Research Award.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chari, P. et al. (2025). Implicit Neural Models to Extract Heart Rate from Video. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15141. Springer, Cham. https://doi.org/10.1007/978-3-031-73010-8_10
DOI: https://doi.org/10.1007/978-3-031-73010-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73009-2
Online ISBN: 978-3-031-73010-8