Abstract
The accurate estimation of missing values is important for the efficient use of DNA microarray data, since most analysis and clustering algorithms require a complete data matrix. Several imputation algorithms have already been proposed in the biological literature. Most of these approaches identify, in one way or another, a fixed number of neighbouring genes for the estimation of each missing value. This increases the chance that the estimate draws on gene expression profiles that are rather distant from the profile of the target gene, which may significantly degrade the performance of the imputation algorithm. In this article we propose a novel adaptive multiple imputation algorithm, which uses a varying number of neighbouring genes for the estimation of each missing value. For each missing value, the algorithm generates a list of multiple candidate estimates and then selects the most suitable one, according to well-defined criteria, to replace the missing entry. The similarity between expression profiles can be measured with either the Euclidean metric or the Dynamic Time Warping (DTW) distance. The proposed algorithm can therefore be applied to the imputation of missing values in both non-time-series and time-series data.
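The idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the neighbourhood size is varied or which criteria select the final value, so everything specific here is an assumption made for illustration. In particular, the function names, the `factors` parameter (neighbours are taken within a multiple of the smallest observed distance), the inverse-distance weighting, and the median-based selection rule are all hypothetical. The two distance functions are the standard Euclidean metric and the classic dynamic-programming DTW recurrence.

```python
import math
from statistics import median


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Classic DTW distance with absolute difference as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def candidate_estimates(target, col, genes, dist=euclidean, factors=(1.0, 1.5, 2.0)):
    """Return one (neighbour_count, estimate) pair per entry of `factors`.

    target : profile with None at the missing positions (including `col`)
    genes  : complete profiles of the same length as `target`
    factors: ASSUMPTION - a gene counts as a neighbour when its distance is
             at most factor * (smallest distance), so the number of
             neighbours varies with the data instead of being fixed.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    t = [target[i] for i in observed]
    # Distance of every complete gene to the target, on observed positions only.
    scored = sorted((dist(t, [g[i] for i in observed]), g[col]) for g in genes)
    d_min = scored[0][0]
    candidates = []
    for f in factors:
        near = [(d, v) for d, v in scored if d <= f * d_min]
        # ASSUMPTION: inverse-distance weighted mean of the neighbours' values.
        weights = [1.0 / (d + 1e-9) for d, _ in near]
        est = sum(w * v for w, (_, v) in zip(weights, near)) / sum(weights)
        candidates.append((len(near), est))
    return candidates


def select_candidate(candidates):
    """ASSUMPTION: placeholder rule, pick the estimate nearest the median."""
    mid = median(est for _, est in candidates)
    return min(candidates, key=lambda c: abs(c[1] - mid))[1]
```

Passing `dist=dtw` instead of the default switches the similarity measure to DTW, which mirrors the abstract's distinction between non-time-series and time-series data. The selection rule is kept in a separate function so that a different criterion can be substituted without touching the candidate generation.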
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boeva, V., Tsiporkova, E. (2008). A Novel Adaptive Multiple Imputation Algorithm. In: Elloumi, M., Küng, J., Linial, M., Murphy, R.F., Schneider, K., Toma, C. (eds) Bioinformatics Research and Development. BIRD 2008. Communications in Computer and Information Science, vol 13. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70600-7_15
DOI: https://doi.org/10.1007/978-3-540-70600-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70598-7
Online ISBN: 978-3-540-70600-7
eBook Packages: Computer Science (R0)