
Incremental expectation maximization principal component analysis for missing value imputation for coevolving EEG data

Journal of Zhejiang University SCIENCE C

Abstract

Missing values arise in bio-signal processing for various reasons, including technical problems and biological characteristics. They are then either excluded or replaced with estimated values before further processing. Electroencephalography (EEG) signals arrive quickly and in succession, so estimating their missing values requires rapid processing of high-speed data to support immediate decision making. In this study, we propose an incremental expectation maximization principal component analysis (iEMPCA) method that automatically estimates missing values in multivariate EEG time series data without requiring the whole, complete data set. The proposed method avoids the biased model that inevitably results from removing incomplete data rather than estimating them, and thus reduces the loss of information by imputing missing values in real time. Because it is incremental, the method also minimizes the memory usage and processing time needed for continuously arriving data. Experimental results show that the proposed method estimates missing values more accurately than previous methods.
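
The general idea behind PCA-based imputation can be illustrated with a short sketch. The code below is not the iEMPCA algorithm of the paper, whose update rules are not given in this abstract. It is a generic EM-style scheme that alternates between fitting a low-rank PCA model and refilling the missing entries from the reconstruction, applied block by block as a simple stand-in for incremental processing. The function names `em_pca_impute` and `impute_stream`, the parameters `n_components`, `n_iter`, and `tol`, and the assumption that every channel has at least one observed value per block are all illustrative choices, not taken from the source.

```python
import numpy as np


def em_pca_impute(X, n_components=1, n_iter=50, tol=1e-6):
    """Fill NaNs in X (samples x channels) by alternating PCA fit and refill.

    Assumes every channel (column) has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: per-channel mean of the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        # Fit step: rank-k subspace of the currently filled data via SVD.
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        # Refill step: overwrite only the missing entries; observed values stay.
        updated = np.where(missing, recon, filled)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled


def impute_stream(blocks, n_components=1):
    """Impute each arriving block on its own, so only one block is in memory.

    This is a simplification: no model state is carried between blocks.
    """
    for block in blocks:
        yield em_pca_impute(block, n_components=n_components)
```

Processing each block independently keeps memory bounded by the block size, at the cost of discarding what earlier blocks revealed about the channel correlations. An incremental method would instead carry the model forward between blocks.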



Author information

Corresponding author

Correspondence to Hyung Jeong Yang.

Additional information

Project supported by the Ministry of Knowledge Economy, Korea, under the Information Technology Research Center support program supervised by the National IT Industry Promotion Agency (No. NIPA-2011-C1090-1111-0008), the Special Research Program of Chonnam National University, 2009, and the LG Yonam Culture Foundation.

About this article

Cite this article

Kim, S.H., Yang, H.J. & Ng, K.S. Incremental expectation maximization principal component analysis for missing value imputation for coevolving EEG data. J. Zhejiang Univ. - Sci. C 12, 687–697 (2011). https://doi.org/10.1631/jzus.C10b0359

  • DOI: https://doi.org/10.1631/jzus.C10b0359
