An efficient method for estimating null values in relational databases

  • Regular Paper
  • Knowledge and Information Systems

Abstract

A database system that contains null values in its attributes generally cannot operate properly. This study proposes an efficient and systematic approach for estimating null values in a relational database. The approach uses clustering algorithms to group the data and a regression coefficient to determine the degree of influence between different attributes. Two databases are used to verify the proposed method: (1) a human resource database and (2) Waugh's database. The mean of absolute error rate (MAER) and the average error serve as evaluation criteria for comparing the proposed method with other methods. The results show that the proposed method is superior to existing methods for estimating null values in relational database systems.
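
The abstract names the ingredients of the approach but gives no algorithmic detail, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a single numeric predictor attribute, a plain k-means with an arbitrary deterministic start, an ordinary least-squares line fitted inside the cluster nearest to the incomplete record, and MAER defined as the mean of |estimate - actual| / |actual| in percent. All function names and the sample rows are hypothetical.

```python
from statistics import mean

def kmeans_1d(values, k, iters=50):
    """Plain k-means on scalars (assumed stand-in for the clustering step)."""
    ordered = sorted(values)
    # Spread the initial centers across the sorted values (arbitrary choice).
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # An empty group keeps its previous center.
        updated = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers

def fit_line(xs, ys):
    """Least squares for y = a + b*x; b is the regression coefficient."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # fall back to the cluster mean if x is constant
    return my - b * mx, b

def estimate_null(known_rows, x_query, k=2):
    """Estimate the missing y of a record from the rows in its nearest cluster."""
    centers = kmeans_1d([x for x, _ in known_rows], k)

    def cluster_of(x):
        return min(range(k), key=lambda c: abs(x - centers[c]))

    target = cluster_of(x_query)
    members = [(x, y) for x, y in known_rows if cluster_of(x) == target]
    a, b = fit_line([x for x, _ in members], [y for _, y in members])
    return a + b * x_query

def maer(estimates, actuals):
    """Mean of absolute error rate in percent (assumed definition)."""
    return 100 * mean(abs(e - a) / abs(a) for e, a in zip(estimates, actuals))

# Hypothetical complete rows: (years of experience, salary in thousands).
rows = [(1, 30), (2, 32), (3, 34), (10, 60), (11, 63), (12, 66)]
low = estimate_null(rows, 2.5)    # record whose salary is null, low-experience cluster
high = estimate_null(rows, 11.5)  # record whose salary is null, high-experience cluster
print(low, high)
```

Fitting the line inside one cluster rather than over the whole table lets the coefficient reflect only records similar to the incomplete one, which is the intuition the abstract suggests for combining clustering with regression.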

Author information

Corresponding author

Correspondence to Jia-Wen Wang.

Additional information

Jia-Wen Wang was born on September 5, 1978, in Taipei, Taiwan, Republic of China. She received the M.S. degree in information management from the National Yunlin University of Science and Technology, Yunlin, Taiwan, in 2003. Since 2003, she has been a Ph.D. student in the Department of Information Management at the National Yunlin University of Science and Technology. Her current research interests include fuzzy systems, database systems, and artificial intelligence.

Ching-Hsue Cheng received the B.S. degree in mathematics from Chinese Military Academy, Taiwan, in 1982, the M.S. degree in applied mathematics from the Chung Yuan Christian University, Taiwan, in 1988, and the Ph.D. degree in system engineering and management from National Defence University, Taiwan, in 1994. He is currently a professor in the Department of Information Management, National Yunlin University of Science and Technology. His research interests are in decision science, soft computing, software reliability, performance evaluation, and fuzzy time series. He has published more than 120 refereed papers in these areas. He has been a principal investigator and project leader on a number of projects with government and other research-sponsoring agencies.

About this article

Cite this article

Wang, JW., Cheng, CH. An efficient method for estimating null values in relational databases. Knowl Inf Syst 12, 379–394 (2007). https://doi.org/10.1007/s10115-006-0028-4

  • DOI: https://doi.org/10.1007/s10115-006-0028-4
