Abstract
In the field of EEG-based gaze prediction, the application of deep learning to interpret complex neural data poses significant challenges. This study evaluates the effectiveness of pre-processing techniques and the effect of adding depthwise separable convolutions to EEG vision transformers (ViTs) within a pretrained model architecture. We introduce a novel method, the EEG Deeper Clustered Vision Transformer (EEG-DCViT), which combines depthwise separable convolutional neural networks (CNNs) with vision transformers, enriched by a pre-processing strategy based on data clustering. The new approach demonstrates superior performance, establishing a new benchmark with a Root Mean Square Error (RMSE) of 51.6 mm. This result underscores the impact of pre-processing and model refinement in enhancing EEG-based applications.
M. L. Key and T. Mehtiyev contributed equally to this work.
Full source code is available at https://github.com/GWU-CS/EEG-DCViT.
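The abstract names two concrete ingredients: a depthwise separable convolution front end ahead of the ViT, and an RMSE score reported in millimetres. The following is a minimal PyTorch sketch of those two ingredients only, not the authors' implementation (see the repository linked above for that). The layer sizes, tensor shapes, and the names DepthwiseSeparableConv and rmse_mm are illustrative assumptions, and the RMSE shown is one common formulation for 2-D gaze targets.

# Minimal sketch, assuming illustrative shapes -- not the EEG-DCViT code.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one spatial filter per channel) followed by a 1x1
    pointwise conv that mixes channels: the MobileNets-style factorization
    that cuts parameters relative to a standard convolution."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # groups=in_ch makes the convolution depthwise (per-channel).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

def rmse_mm(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """RMSE over (x, y) gaze coordinates: the square root of the mean
    squared Euclidean distance, in the units of the inputs (mm here)."""
    return torch.sqrt(torch.mean(torch.sum((pred - target) ** 2, dim=-1)))

# Usage: a batch of 8 single-channel "EEG images" through the block.
# (electrodes x time samples is one plausible 2-D layout for EEG input.)
x = torch.randn(8, 1, 129, 500)            # (batch, channels, electrodes, time)
feats = DepthwiseSeparableConv(1, 16)(x)   # -> (8, 16, 129, 500)
print(feats.shape)

In a hybrid architecture of this kind, a convolutional stem such as the block above would extract local features before the transformer's patch embedding and self-attention layers operate on them; the exact placement and depth in EEG-DCViT are defined in the released source code.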
Ethics declarations
Disclosure of Interests
The authors declare no competing interests.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Key, M.L., Mehtiyev, T., Qu, X. (2024). Advancing EEG-Based Gaze Prediction Using Depthwise Separable Convolution and Enhanced Pre-processing. In: Schmorrow, D.D., Fidopiastis, C.M. (eds.) Augmented Cognition. HCII 2024. Lecture Notes in Computer Science, vol. 14695. Springer, Cham. https://doi.org/10.1007/978-3-031-61572-6_1
DOI: https://doi.org/10.1007/978-3-031-61572-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-61571-9
Online ISBN: 978-3-031-61572-6
eBook Packages: Computer Science, Computer Science (R0)