Use and Maintenance of Histograms for Large Scientific Database Access Planning: A Case Study of a Pharmaceutical Data Repository

Miled, Zina Ben; Liu, Jin; Bukhres, Omran; Li, Huian; Martin, Jesse; Balagopalakrishna, Chavali; Oppelt, Robert

doi:10.1023/B:JIIS.0000039533.13569.8e

Published: September 2004

Volume 23, pages 145–178, (2004)

Journal of Intelligent Information Systems

Zina Ben Miled¹,
Jin Liu²,
Omran Bukhres²,
Huian Li²,
Jesse Martin³,
Chavali Balagopalakrishna³ &
Robert Oppelt³

Abstract

Scientific databases, and in particular chemical and biological databases, have reached massive sizes in recent years due to improvements in the bench-side high-throughput screening tools used by scientists. This rapid increase has shifted the bottleneck in discovery and product development from the bench side to the computational side, creating a need for new computational tools that can facilitate the access and interpretation of such massive data.

This paper discusses the design and implementation of a computation histogram to speed up access to large pharmaceutical databases. As opposed to traditional histograms, in which approximate value distributions are obtained by grouping attribute values into buckets, the computation histogram proposed in this paper records the retrieval time and the calculation time of descriptors in a pharmaceutical drug candidate database. Both on-line and off-line update techniques are proposed to update the computation histogram so that an efficient query plan can be generated.

The efficiency of the proposed computation histogram is demonstrated using a drug candidate database that is used in the pharmaceutical drug discovery process. The histogram allows the result of a query to be either computed using a computational algorithm or retrieved from the database. In addition to the pharmaceutical drug candidate database, the proposed approach is applicable to other scientific databases such as biological and agroscience databases.
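
The choice between computing and retrieving a descriptor can be illustrated with a minimal sketch. The abstract does not give the paper's data structure or update rule, so everything here is an assumption made for illustration: the class name `ComputationHistogram`, the exponential smoothing used for the on-line update, the `alpha` weight, the default plan for unseen descriptors, and the descriptor name `logP`.

```python
class ComputationHistogram:
    """Hypothetical per-descriptor cost table (not the paper's implementation).

    For each descriptor it keeps one time estimate, in seconds, for
    retrieving the value from the database and one for calculating it.
    """

    METHODS = ("retrieve", "compute")

    def __init__(self, alpha=0.5):
        # alpha is an assumed smoothing weight: higher values trust the
        # newest measurement more.
        self.alpha = alpha
        self.costs = {}

    def record(self, descriptor, method, seconds):
        """On-line update: blend a newly measured time into the estimate."""
        if method not in self.METHODS:
            raise ValueError(f"unknown method: {method!r}")
        entry = self.costs.setdefault(descriptor, {})
        previous = entry.get(method)
        if previous is None:
            entry[method] = seconds
        else:
            entry[method] = (1 - self.alpha) * previous + self.alpha * seconds

    def plan(self, descriptor, default="retrieve"):
        """Return the method with the lower estimated time.

        Falls back to `default` (an assumption) when nothing is recorded.
        """
        entry = self.costs.get(descriptor)
        if not entry:
            return default
        return min(entry, key=entry.get)


hist = ComputationHistogram(alpha=0.5)
hist.record("logP", "retrieve", 0.8)
hist.record("logP", "compute", 0.2)
first_plan = hist.plan("logP")    # calculation is currently cheaper

hist.record("logP", "compute", 2.0)
second_plan = hist.plan("logP")   # a slow calculation shifts the plan
```

In this sketch a single slow calculation raises the smoothed estimate enough to switch the plan back to retrieval. An off-line variant could rebuild the same table periodically from a log of measured times instead of updating it after every query.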

Author information

Authors and Affiliations

Electrical & Computer Engineering, School of Eng. & Tech., Indiana University Purdue University, Indianapolis, IN, 46202, USA
Zina Ben Miled
Computer & Information Science, School of Science, Indiana University Purdue University, Indianapolis, IN, 46202, USA
Jin Liu, Omran Bukhres & Huian Li
Eli Lilly & Company, Indianapolis, IN, 46202, USA
Jesse Martin, Chavali Balagopalakrishna & Robert Oppelt

About this article

Cite this article

Miled, Z.B., Liu, J., Bukhres, O. et al. Use and Maintenance of Histograms for Large Scientific Database Access Planning: A Case Study of a Pharmaceutical Data Repository. Journal of Intelligent Information Systems 23, 145–178 (2004). https://doi.org/10.1023/B:JIIS.0000039533.13569.8e

Issue Date: September 2004
DOI: https://doi.org/10.1023/B:JIIS.0000039533.13569.8e
