Abstract
In the field of EEG-based gaze prediction, the application of deep learning to interpret complex neural data poses significant challenges. This study evaluates the effectiveness of pre-processing techniques and the effect of adding depthwise separable convolutions to EEG vision transformers (ViTs) within a pretrained model architecture. We introduce a novel method, the EEG Deeper Clustered Vision Transformer (EEG-DCViT), which combines depthwise separable convolutional neural networks (CNNs) with vision transformers, enriched by a pre-processing strategy based on data clustering. The new approach demonstrates superior performance, establishing a new benchmark with a Root Mean Square Error (RMSE) of 51.6 mm. This result underscores the impact of pre-processing and model refinement in enhancing EEG-based applications.
M. L. Key and T. Mehtiyev contributed equally to this work.
Full source code is available at https://github.com/GWU-CS/EEG-DCViT.
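The abstract names two concrete ingredients: a depthwise separable convolution front end ahead of the ViT, and an RMSE score reported in millimetres. The following is a minimal PyTorch sketch of those two ingredients only, not the authors' implementation (see the repository linked above for that). The layer sizes, tensor shapes, and the names DepthwiseSeparableConv and rmse_mm are illustrative assumptions, and the RMSE shown is one common formulation for 2-D gaze targets.

# Minimal sketch, assuming illustrative shapes -- not the EEG-DCViT code.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (one spatial filter per channel) followed by a 1x1
    pointwise conv that mixes channels: the MobileNets-style factorization
    that cuts parameters relative to a standard convolution."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # groups=in_ch makes the convolution depthwise (per-channel).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

def rmse_mm(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """RMSE over (x, y) gaze coordinates: the square root of the mean
    squared Euclidean distance, in the units of the inputs (mm here)."""
    return torch.sqrt(torch.mean(torch.sum((pred - target) ** 2, dim=-1)))

# Usage: a batch of 8 single-channel "EEG images" through the block.
# (electrodes x time samples is one plausible 2-D layout for EEG input.)
x = torch.randn(8, 1, 129, 500)            # (batch, channels, electrodes, time)
feats = DepthwiseSeparableConv(1, 16)(x)   # -> (8, 16, 129, 500)
print(feats.shape)

In a hybrid architecture of this kind, a convolutional stem such as the block above would extract local features before the transformer's patch embedding and self-attention layers operate on them; the exact placement and depth in EEG-DCViT are defined in the released source code.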
Ethics declarations
Disclosure of Interests
The authors declare no competing interests.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Key, M.L., Mehtiyev, T., Qu, X. (2024). Advancing EEG-Based Gaze Prediction Using Depthwise Separable Convolution and Enhanced Pre-processing. In: Schmorrow, D.D., Fidopiastis, C.M. (eds.) Augmented Cognition. HCII 2024. Lecture Notes in Computer Science, vol. 14695. Springer, Cham. https://doi.org/10.1007/978-3-031-61572-6_1
DOI: https://doi.org/10.1007/978-3-031-61572-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-61571-9
Online ISBN: 978-3-031-61572-6
eBook Packages: Computer Science, Computer Science (R0)