Abstract
This paper describes a novel approach to estimating the size of database query results using neural networks. In the proposed approach, three-layer neural networks are constructed and trained to learn the cumulative distribution functions of attribute values in relations. With a trained network, an estimate of the query result size can be obtained instantly by computing the network output for the given query predicates. The paper describes the basic computational model, which uses a cumulative distribution function to compute the query result size, and discusses network construction and training. Comprehensive experiments were conducted to study the effectiveness of the proposed approach. The results indicate that the approach produces estimates whose accuracy is comparable to or higher than that reported in the literature.
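The computational model mentioned in the abstract can be illustrated with a minimal sketch. It assumes a range predicate of the form low < A <= high on a single attribute A, so the estimated result size is N * (F(high) - F(low)), where N is the number of tuples and F approximates the cumulative distribution function of A. The function names and the sample data are hypothetical, and an empirical CDF stands in for the trained network here; it is not the authors' implementation.

```python
from bisect import bisect_right


def make_empirical_cdf(values):
    """Return F(x) = fraction of values <= x.

    Assumption: this empirical CDF is only a stand-in for the
    trained three-layer network described in the abstract.
    """
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many stored values are <= x.
        return bisect_right(ordered, x) / n

    return cdf


def estimate_range_size(cdf, n_tuples, low, high):
    """Estimate the number of tuples with low < value <= high."""
    return n_tuples * (cdf(high) - cdf(low))


# Hypothetical attribute values for a ten-tuple relation.
ages = [21, 25, 25, 30, 34, 34, 34, 41, 47, 52]
cdf = make_empirical_cdf(ages)

# Estimated size of the query: 25 < age <= 41.
estimate = estimate_range_size(cdf, len(ages), 25, 41)
print(round(estimate))  # prints 5
```

In this sketch, swapping `cdf` for a function that evaluates a trained network would leave `estimate_range_size` unchanged, which is why the estimate costs only two function evaluations per range predicate.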
Cite this article
Lu, H., Setiono, R. Effective Query Size Estimation Using Neural Networks. Applied Intelligence 16, 173–183 (2002). https://doi.org/10.1023/A:1014333932021