Abstract
Missing data values are a pervasive problem for both researchers and industrial developers. There are two general approaches to dealing with missing values in databases: they can either be ignored (removed) or imputed (filled in) with new values [10]. In some SQL tables a candidate key is not null-free, and this needs to be handled. Possible keys and certain keys were introduced in [17] to deal with this situation. In the present paper we introduce an intermediate concept called strongly possible keys, which is based on a data mining approach that uses only information already contained in the SQL table. A strongly possible key is a key that holds in some possible world obtained by replacing every occurrence of a null with a value already appearing in the corresponding attribute. Implication among strongly possible keys is characterized, and Armstrong tables are constructed. An algorithm that verifies a strongly possible key by means of bipartite matching is given. A connection between the matroid intersection problem and systems of strongly possible keys is established.
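The definition of a strongly possible key and its verification by bipartite matching can be illustrated with a small sketch. The code below is not the paper's algorithm as published, only one way to realize the idea under stated assumptions: a table is a list of Python dicts, `None` stands for an SQL null, and the function name `is_strongly_possible_key` is hypothetical. Each row is connected to every null-free completion of its key projection that uses only values visible in the same column, and the key is accepted when a matching covers all rows. The matching is found with a simple augmenting-path search, and the completions are enumerated explicitly, so this is suitable only for small examples.

```python
from itertools import product

NULL = None  # assumption of this sketch: None stands for an SQL null


def is_strongly_possible_key(rows, key):
    """Return True if the columns in `key` can be made duplicate-free by
    replacing each null with a value already visible in the same column."""
    # Visible domain of each key column: the non-null values occurring in it.
    visible = {
        a: list(dict.fromkeys(r[a] for r in rows if r[a] is not NULL))
        for a in key
    }

    # For every row, list the null-free key projections it can be completed to.
    candidates = []
    for r in rows:
        choices = [[r[a]] if r[a] is not NULL else visible[a] for a in key]
        candidates.append(list(product(*choices)))

    match = {}  # completed key tuple -> index of the row currently using it

    def augment(i, seen):
        # Augmenting-path step: try to give row i a completion of its own,
        # reassigning previously matched rows where necessary.
        for c in candidates[i]:
            if c in seen:
                continue
            seen.add(c)
            if c not in match or augment(match[c], seen):
                match[c] = i
                return True
        return False

    # The key is strongly possible iff the matching covers every row.
    return all(augment(i, set()) for i in range(len(rows)))


# Column b shows only the values 2 and 3, so three rows with a = 1 cannot
# all receive distinct completions from visible values.
crowded = [{"a": 1, "b": None}, {"a": 1, "b": 2}, {"a": 1, "b": 3}]
# Here the null can be replaced by 3, which makes (a, b) duplicate-free.
roomy = [{"a": 1, "b": None}, {"a": 1, "b": 2}, {"a": 2, "b": 3}]

print(is_strongly_possible_key(crowded, ("a", "b")))
print(is_strongly_possible_key(roomy, ("a", "b")))
```

In the `crowded` table, filling the null with a fresh value such as 4 would make the rows distinct, but that value does not appear in column `b`, which is exactly the restriction that separates strongly possible keys from possible keys.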
Research of the second author was partially supported by the National Research, Development and Innovation Office (NKFIH) grant K–116769. This work is also connected to the scientific program of the “Development of quality-oriented and harmonized R+D+I strategy and functional model at BME” project, supported by the New Hungary Development Plan (Project ID: TÁMOP 4.2.1/B-09/1/KMR-2010-0002).
References
Acuña, E., Rodriguez, C.: The treatment of missing values and its effect on classifier accuracy. In: Banks, D., McMorris, F.R., Arabie, P., Gaul, W. (eds.) Classification, Clustering, and Data Mining Applications. Studies in Classification, Data Analysis, and Knowledge Organisation, pp. 639–647. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-642-17103-1_60
Beeri, C., Dowd, M., Fagin, R., Statman, R.: On the structure of Armstrong relations for functional dependencies. J. ACM 31(1), 30–46 (1984)
Sree Dhevi, A.T.: Imputing missing values using Inverse Distance Weighted Interpolation for time series data. In: Sixth International Conference on Advanced Computing (ICoAC), Chennai, pp. 255–259 (2014). https://doi.org/10.1109/ICoAC.2014.7229721
Chang, G., Ge, T.: Comparison of missing data imputation methods for traffic flow. In: Proceedings 2011 International Conference on Transportation, Mechanical, and Electrical Engineering (TMEE), Changchun, pp. 639–642 (2011). https://doi.org/10.1109/TMEE.2011.6199284
Cheng, C., Wei, L., Lin, T.: Improving relational database quality based on adaptive learning method for estimating null value. In: Second International Conference on Innovative Computing, Information and Control (ICICIC 2007), Kumamoto, p. 81 (2007). https://doi.org/10.1109/ICICIC.2007.350
Codd, E.F.: The Relational Model for Database Management, Version 2. Addison-Wesley Publishing Company, Boston (1990)
Date, C.J.: NOT Is Not “Not”! (Notes on Three-Valued Logic and Related Matters) in Relational Database Writings 1985–1989. Addison-Wesley Reading, Boston (1990)
Fagin, R.: Horn clauses and database dependencies. J. ACM 29(4), 952–985 (1982)
Farhangfar, A., Kurgan, L.A., Pedrycz, W.: Experimental analysis of methods for imputation of missing values in databases. In: Proceedings of SPIE 5421, Intelligent Computing: Theory and Applications II, 12 April 2004. https://doi.org/10.1117/12.542509
Farhangfar, A., Kurgan, L.A., Pedrycz, W.: A novel framework for imputation of missing values in databases. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 37(5), 692–709 (2007)
Farhangfar, A., Kurgan, L.A., Dy, J.: Impact of imputation of missing values on classification error for discrete data. Pattern Recogn. 41(12), 3692–3705 (2008)
Ferrarotti, F., Hartmann, S., Le, V.B.T., Link, S.: Codd table representations under weak possible world semantics. In: Hameurlain, A., Liddle, S.W., Schewe, K.-D., Zhou, X. (eds.) DEXA 2011. LNCS, vol. 6860, pp. 125–139. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23088-2_9
Garey, M.R., Johnson, D.S.: Computers and Intractability. A guide to the Theory of NP-Completeness. Freeman, New York (1979)
Hartmann, S., Kirchberg, M., Link, S.: Design by example for SQL table definitions with functional dependencies. VLDB J. 21(1), 121–144 (2012)
Hartmann, S., Leck, U., Link, S.: On Codd families of keys over incomplete relations. Comput. J. 54(7), 1166–1180 (2010)
Grzymala-Busse, J.W., Hu, M.: A comparison of several approaches to missing attribute values in data mining. In: Ziarko, W., Yao, Y. (eds.) RSCTC 2000. LNCS (LNAI), vol. 2005, pp. 378–385. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45554-X_46
Köhler, H., Leck, U., Link, S., Zhou, X.: Possible and certain keys for SQL. VLDB J. 25(4), 571–596 (2016)
Köhler, H., Link, S.: SQL schema design: foundations, normal forms, and normalization. Inf. Syst. 76, 88–113 (2018)
Lawler, E.L.: Matroid intersection algorithms. Math. Program. 9, 31–56 (1975)
Levene, M., Loizou, G.: Axiomatisation of functional dependencies in incomplete relations. J. Theor. Comput. Sci. 206(1–2), 283–300 (1998)
Mannila, H., Räihä, K.-J.: Design of Relational Databases. Addison-Wesley, Boston (1992)
Sali, A., Schewe, K.-D.: Keys and Armstrong databases in trees with restructuring. Acta Cybernetica 18(3), 529–556 (2008)
Welsh, D.J.A.: Matroid Theory. Academic Press, New York (1976)
Zhang, S., Qin, Z., Ling, C.X., Sheng, S.: “Missing is Useful”: missing values in cost-sensitive decision trees. IEEE Trans. Knowl. Data Eng. 17(12), 1689–1693 (2005)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Alattar, M., Sali, A. (2019). Keys in Relational Databases with Nulls and Bounded Domains. In: Welzer, T., Eder, J., Podgorelec, V., Kamišalić Latifić, A. (eds) Advances in Databases and Information Systems. ADBIS 2019. Lecture Notes in Computer Science(), vol 11695. Springer, Cham. https://doi.org/10.1007/978-3-030-28730-6_3
DOI: https://doi.org/10.1007/978-3-030-28730-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28729-0
Online ISBN: 978-3-030-28730-6
eBook Packages: Computer Science (R0)