
Effective Query Size Estimation Using Neural Networks

Applied Intelligence

Abstract

This paper describes a novel approach to estimating the size of database query results using neural networks. In the proposed approach, three-layer neural networks are constructed and trained to learn the cumulative distribution functions of attribute values in relations. With a trained network, an estimate of the query result size can be obtained instantly by computing the network output for the given query predicates. The basic computational model, which uses a cumulative distribution function to compute the query result size, is described, and network construction and training are discussed. Comprehensive experiments were conducted to study the effectiveness of the proposed approach. The results indicate that the approach produces estimates with accuracies comparable to or higher than those reported in the literature.
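As a rough illustration of the idea summarized in the abstract, the sketch below trains a small three-layer network (input, hidden, output) to approximate the cumulative distribution function F of one attribute, then estimates the size of a range query lo < A <= hi as N * (F(hi) - F(lo)). Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic uniform data, the tanh hidden layer and sigmoid output, the plain stochastic gradient descent training, and all names such as `CdfNetwork` and `estimate_range_size`.

```python
import math
import random


def empirical_cdf(values, x):
    """Fraction of attribute values that are <= x."""
    return sum(1 for v in values if v <= x) / len(values)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class CdfNetwork:
    """Hypothetical three-layer network: 1 input, tanh hidden layer, sigmoid output."""

    def __init__(self, hidden=6, seed=0):
        rng = random.Random(seed)
        self.w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
        self.b1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden biases
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
        self.b2 = 0.0                                          # output bias

    def _hidden(self, x):
        return [math.tanh(w * x + b) for w, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sigmoid(sum(w * a for w, a in zip(self.w2, h)) + self.b2)

    def train(self, samples, epochs=3000, lr=0.5):
        """Plain per-sample gradient descent on squared error (an assumed choice)."""
        for _ in range(epochs):
            for x, y in samples:
                h = self._hidden(x)
                out = sigmoid(sum(w * a for w, a in zip(self.w2, h)) + self.b2)
                # Gradient of 0.5 * (out - y)^2 w.r.t. the output pre-activation.
                d_out = (out - y) * out * (1.0 - out)
                for j in range(len(h)):
                    # Back-propagate through hidden unit j before updating w2[j].
                    d_h = d_out * self.w2[j] * (1.0 - h[j] ** 2)
                    self.w2[j] -= lr * d_out * h[j]
                    self.w1[j] -= lr * d_h * x
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out


def estimate_range_size(net, lo, hi, n_tuples, vmin, vmax):
    """Estimate |{t : lo < t.A <= hi}| as N * (F(hi) - F(lo))."""
    def scale(v):
        return (v - vmin) / (vmax - vmin)  # map the attribute domain to [0, 1]
    selectivity = net.predict(scale(hi)) - net.predict(scale(lo))
    return max(0.0, selectivity) * n_tuples


# Synthetic relation: 2000 tuples, attribute A uniform on [0, 100] (illustrative data only).
data_rng = random.Random(42)
values = [data_rng.uniform(0.0, 100.0) for _ in range(2000)]
vmin, vmax = 0.0, 100.0

# Training set: the empirical CDF sampled at 21 evenly spaced points.
samples = [(x / 100.0, empirical_cdf(values, x)) for x in range(0, 101, 5)]

net = CdfNetwork()
net.train(samples)

lo, hi = 20.0, 70.0
estimate = estimate_range_size(net, lo, hi, len(values), vmin, vmax)
true_size = sum(1 for v in values if lo < v <= hi)
print(f"estimated size: {estimate:.0f}, true size: {true_size}")
```

Once trained, the network answers a range query with two forward passes and no scan of the relation, which is the sense in which an estimate is available instantly. The accuracy of this toy version depends on the assumed architecture and training settings and says nothing about the results reported in the paper.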




Cite this article

Lu, H., Setiono, R. Effective Query Size Estimation Using Neural Networks. Applied Intelligence 16, 173–183 (2002). https://doi.org/10.1023/A:1014333932021


  • DOI: https://doi.org/10.1023/A:1014333932021
