A Grey-Based Nearest Neighbor Approach for Missing Attribute Value Prediction

Huang, Chi-Chun; Lee, Hahn-Ming

doi:10.1023/B:APIN.0000021416.41043.0f

Published: May 2004

Applied Intelligence, volume 20, pages 239–252 (2004)

Abstract

This paper proposes a grey-based nearest neighbor approach for accurately predicting missing attribute values. First, grey relational analysis is employed to determine the nearest neighbors of an instance with missing attribute values. The known attribute values of these nearest neighbors are then used to infer the missing values. Two datasets were used to demonstrate the performance of the proposed method; experimental results show that it outperforms both multiple imputation and mean substitution. The proposed method was also evaluated on five classification problems with incomplete data. The results indicate that classification accuracy is maintained or even increased when the proposed method is applied for missing attribute value prediction.
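
The two steps described in the abstract (rank neighbors by grey relational analysis, then infer the missing values from them) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes numeric attributes already scaled to [0, 1], uses the standard form of Deng's grey relational coefficient with a distinguishing coefficient `zeta = 0.5`, and fills each missing value with the plain mean of the `k` nearest neighbors. The function names, the choice of `k`, and the mean-based fill rule are all hypothetical.

```python
from statistics import mean


def grey_relational_grades(target, candidates, attrs, zeta=0.5):
    """Grey relational grade between `target` and each candidate,
    computed only over the attribute indices in `attrs`."""
    # Absolute difference per candidate and attribute.
    deltas = [[abs(target[a] - c[a]) for a in attrs] for c in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate matches the target exactly on the known attributes.
        return [1.0] * len(candidates)
    grades = []
    for row in deltas:
        # Grey relational coefficient per attribute, averaged into a grade.
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(mean(coeffs))
    return grades


def impute(instance, complete, k=3, zeta=0.5):
    """Fill the None entries of `instance` from its k nearest complete
    instances (assumption: mean of the neighbors' values)."""
    known = [i for i, v in enumerate(instance) if v is not None]
    missing = [i for i, v in enumerate(instance) if v is None]
    if not known:
        raise ValueError("instance needs at least one known attribute")
    grades = grey_relational_grades(instance, complete, known, zeta)
    # A higher grade means a closer neighbor.
    nearest = sorted(range(len(complete)),
                     key=lambda j: grades[j], reverse=True)[:k]
    filled = list(instance)
    for i in missing:
        filled[i] = mean(complete[j][i] for j in nearest)
    return filled


# Toy data (made up for illustration): four complete instances, one query.
data = [[0.1, 0.2, 0.3], [0.2, 0.2, 0.4], [0.9, 0.8, 0.7], [0.8, 0.9, 0.9]]
query = [0.15, 0.25, None]
print(impute(query, data, k=2))
```

In this toy run the first two rows of `data` are the closest to `query` on the known attributes, so the missing third attribute is filled with the mean of their third values. Symbolic attributes would need a different difference measure and a vote rather than a mean, which this sketch does not cover.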

About this article

Cite this article

Huang, CC., Lee, HM. A Grey-Based Nearest Neighbor Approach for Missing Attribute Value Prediction. Applied Intelligence 20, 239–252 (2004). https://doi.org/10.1023/B:APIN.0000021416.41043.0f
