A Grey-Based Nearest Neighbor Approach for Missing Attribute Value Prediction

Applied Intelligence

Abstract

This paper proposes a grey-based nearest neighbor approach for accurately predicting missing attribute values. First, grey relational analysis is used to determine the nearest neighbors of an instance with missing attribute values. The known attribute values of these nearest neighbors are then used to infer the missing values. Two datasets were used to demonstrate the performance of the proposed method, and the experimental results show that it outperforms both multiple imputation and mean substitution. The method was also evaluated on five classification problems with incomplete data; the results indicate that classification accuracy is maintained or even increased when the method is applied to predict missing attribute values.
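
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes numeric attributes already scaled to a common range, complete candidate instances, Deng's standard grey relational coefficient with a distinguishing coefficient of 0.5, and inference by averaging the neighbors' values. The function names `grey_relational_grades` and `impute` and the parameters `k` and `zeta` are hypothetical.

```python
from typing import List, Optional, Sequence


def grey_relational_grades(
    target: Sequence[Optional[float]],
    candidates: Sequence[Sequence[float]],
    zeta: float = 0.5,
) -> List[float]:
    """Grey relational grade of each candidate with respect to the target.

    Only attributes that are known (not None) in the target are compared.
    zeta is the distinguishing coefficient (assumed 0.5 here).
    """
    known = [j for j, v in enumerate(target) if v is not None]
    if not known or not candidates:
        raise ValueError("need at least one known attribute and one candidate")
    # Absolute difference between target and candidate on each known attribute.
    deltas = [[abs(target[j] - c[j]) for j in known] for c in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate matches the target exactly on the known attributes.
        return [1.0] * len(candidates)
    # Grade = mean of the grey relational coefficients over known attributes.
    return [
        sum((d_min + zeta * d_max) / (d + zeta * d_max) for d in row) / len(row)
        for row in deltas
    ]


def impute(
    target: Sequence[Optional[float]],
    complete_rows: Sequence[Sequence[float]],
    k: int = 3,
    zeta: float = 0.5,
) -> List[float]:
    """Fill missing entries with the mean over the k highest-grade neighbors.

    Averaging is an assumption of this sketch, not taken from the paper.
    """
    grades = grey_relational_grades(target, complete_rows, zeta)
    # A larger grade means a closer neighbor, so sort in descending order.
    nearest = sorted(range(len(complete_rows)), key=lambda i: grades[i], reverse=True)[:k]
    filled = list(target)
    for j, v in enumerate(target):
        if v is None:
            filled[j] = sum(complete_rows[i][j] for i in nearest) / len(nearest)
    return filled


rows = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.5], [0.9, 0.8, 0.1]]
filled = impute([0.1, 0.2, None], rows, k=2)
```

Here the first two rows agree with the target on both known attributes, so they receive the highest grades and the missing third attribute is filled from them. Nominal attributes would need a different inference rule, such as a majority vote among the neighbors, which this sketch does not cover.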

Cite this article

Huang, CC., Lee, HM. A Grey-Based Nearest Neighbor Approach for Missing Attribute Value Prediction. Applied Intelligence 20, 239–252 (2004). https://doi.org/10.1023/B:APIN.0000021416.41043.0f

  • DOI: https://doi.org/10.1023/B:APIN.0000021416.41043.0f
