Abstract
This paper proposes a grey-based nearest neighbor approach for accurately predicting missing attribute values. First, grey relational analysis is used to determine the nearest neighbors of an instance with missing attribute values. The known attribute values of these nearest neighbors are then used to infer the missing values. Two datasets were used to evaluate the prediction performance of the proposed method, and the experimental results show that it outperforms both multiple imputation and mean substitution. The proposed method was also evaluated on five classification problems with incomplete data. These results indicate that classification accuracy is maintained or even increased when the proposed method is applied to predict the missing attribute values.
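The two steps described above (rank neighbors by grey relational analysis, then infer the missing values from them) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formulation, so the code assumes numeric attributes already scaled to a common range, Deng's standard grey relational coefficient with distinguishing coefficient `zeta = 0.5`, a pool of complete instances as candidate neighbors, and a plain mean over the `k` nearest neighbors. The function names, `k`, and `zeta` are all hypothetical choices made for illustration.

```python
def grey_relational_grades(reference, candidates, zeta=0.5):
    """Grey relational grade of each candidate with respect to the reference.

    Attributes that are missing (None) in the reference are skipped.
    Assumption: all attributes are numeric and scaled to a common range.
    """
    known = [j for j, v in enumerate(reference) if v is not None]
    if not known or not candidates:
        raise ValueError("need at least one known attribute and one candidate")

    # Absolute differences between the reference and each candidate.
    deltas = [[abs(reference[j] - c[j]) for j in known] for c in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate matches the reference on all known attributes.
        return [1.0] * len(candidates)

    grades = []
    for row in deltas:
        # Grey relational coefficient per attribute, averaged into a grade.
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def impute(instance, complete_rows, k=3, zeta=0.5):
    """Fill None entries of `instance` with the mean of the k rows that have
    the highest grey relational grade (an assumed aggregation rule)."""
    grades = grey_relational_grades(instance, complete_rows, zeta)
    ranked = sorted(range(len(complete_rows)), key=lambda i: grades[i], reverse=True)
    nearest = ranked[:k]
    filled = list(instance)
    for j, v in enumerate(instance):
        if v is None:
            filled[j] = sum(complete_rows[i][j] for i in nearest) / len(nearest)
    return filled


rows = [[0.1, 0.2, 0.3], [0.1, 0.25, 0.5], [0.9, 0.8, 0.7]]
instance = [0.1, 0.2, None]
grades = grey_relational_grades(instance, rows)
filled = impute(instance, rows, k=2)
```

Here the first two rows receive the highest grades, so with `k=2` the missing third attribute is filled with the mean of their third attributes. Nominal attributes or a grade-weighted average would need a different difference measure and aggregation rule than the ones assumed here.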
Cite this article
Huang, CC., Lee, HM. A Grey-Based Nearest Neighbor Approach for Missing Attribute Value Prediction. Applied Intelligence 20, 239–252 (2004). https://doi.org/10.1023/B:APIN.0000021416.41043.0f