Effective and complete discovery of bidirectional order dependencies via set-based axioms

  • Regular Paper
  • The VLDB Journal

Abstract

Integrity constraints (ICs) are useful for expressing and enforcing application semantics. Formulating ICs manually, however, requires domain expertise, is prone to human error, and can be exceedingly time-consuming. Thus, methods for automatic discovery have been developed for some classes of ICs, such as functional dependencies (FDs), and recently, order dependencies (ODs). ODs properly subsume FDs and can express business rules involving order; e.g., an employee who pays higher taxes has a higher salary than another employee. Bidirectional ODs further allow different ordering directions, ascending and descending, as in SQL’s order-by; e.g., a student with an alphabetically lower letter grade has a higher percentage grade than another student. We address the limitations of prior work on automatic OD discovery, which has factorial complexity, is incomplete, and is not concise. We present an efficient bidirectional OD discovery algorithm enabled by a novel polynomial mapping to a canonical form, and a sound and complete set of axioms for canonical bidirectional ODs to prune the search space. Our algorithm has exponential worst-case time complexity in the number of attributes and linear complexity in the number of tuples. We prove that it produces a complete and minimal set of bidirectional ODs, and we experimentally show orders of magnitude performance improvements over the prior state-of-the-art methodologies.
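
To make the notion of a bidirectional OD concrete, the following is a minimal sketch of a naive validity check on a small in-memory table. It is not the discovery algorithm described in the abstract: it only tests one given candidate by comparing every pair of tuples, which is quadratic in the number of tuples. The function names `compare` and `od_holds`, the `(attribute, direction)` encoding, and the sample data are assumptions made for illustration.

```python
from itertools import permutations


def compare(s, t, spec):
    """Compare rows s and t lexicographically under a directed attribute list.

    spec is a list of (attribute, direction) pairs, where direction is
    "asc" or "desc" (hypothetical encoding, mirroring SQL's order-by).
    Returns -1 if s precedes t, 0 if they tie, and 1 if s follows t.
    """
    for attr, direction in spec:
        if s[attr] == t[attr]:
            continue  # tie on this attribute, look at the next one
        less = s[attr] < t[attr]
        if direction == "desc":
            less = not less
        return -1 if less else 1
    return 0


def od_holds(rows, lhs, rhs):
    """Naive pairwise check that ordering by lhs also orders by rhs.

    The candidate holds if, for every ordered pair (s, t), s preceding or
    tying t under lhs implies s preceding or tying t under rhs.
    """
    return all(
        compare(s, t, rhs) <= 0
        for s, t in permutations(rows, 2)
        if compare(s, t, lhs) <= 0
    )


# Illustrative data only: percentage grades and letter grades.
rows = [
    {"pct": 95, "letter": "A"},
    {"pct": 88, "letter": "A"},
    {"pct": 76, "letter": "B"},
    {"pct": 61, "letter": "C"},
]

# Descending percentage orders ascending letter grade.
print(od_holds(rows, [("pct", "desc")], [("letter", "asc")]))  # True
# With both directions ascending, the candidate is violated.
print(od_holds(rows, [("pct", "asc")], [("letter", "asc")]))   # False
```

In this sketch the candidate is stated with the percentage on the left-hand side, because two students may share a letter grade while having different percentages; an OD with the letter grade on the left would then fail, since a valid OD also implies the corresponding FD.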

Notes

  1. FASTOD-BID and TANE differ in many details, even for FD discovery; e.g., a pruning rule for removing nodes from the lattice (Sect. 4.5) and the key pruning rule (Sect. 4.6). Additionally, FASTOD-BID includes new OD-specific pruning rules.

  2. http://archive.ics.uci.edu/ml.

  3. http://metanome.de.

  4. www.uniprot.org.

  5. http://sourceforge.net/projects/dbtesma.

References

  1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.: Fast discovery of association rules. In: Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R. (eds.) Advances in Knowledge Discovery and Data Mining, pp. 307–328. AAAI Press, Menlo Park (1996)

  2. Bläsius, T., Friedrich, T., Schirneck, M.: The parameterized complexity of dependency detection in relational databases. In: IPEC, pp. 6:1–6:13 (2016)

  3. Chu, X., Ilyas, I., Papotti, P.: Discovering denial constraints. PVLDB 6(13), 1498–1509 (2013)

  4. Chu, X., Ilyas, I., Papotti, P.: Holistic data cleaning: putting violations into context. In: ICDE, pp. 458–469 (2013)

  5. Cong, G., Fan, W., Geerts, F., Jia, X., Ma, S.: Improving data quality: consistency and accuracy. In: VLDB, pp. 315–326 (2007)

  6. Dong, J., Hull, R.: Applying approximate order dependency to reduce indexing space. In: SIGMOD, pp. 119–127 (1982)

  7. Ginsburg, S., Hull, R.: Order dependency in the relational model. TCS 26(1), 149–195 (1983)

  8. Golab, L., Karloff, H., Korn, F., Srivastava, D.: Sequential dependencies. PVLDB 2(1), 574–585 (2009)

  9. Guravannavar, R., Ramanujam, H., Sudarshan, S.: Optimizing nested queries with parameter sort orders. In: VLDB, pp. 481–492 (2005)

  10. Huhtala, Y., Kärkkäinen, J., Porkka, P., Toivonen, H.: Efficient discovery of functional and approximate dependencies using partitions. In: ICDE, pp. 392–401 (1998)

  11. Ilyas, I., Markl, V., Haas, P., Brown, P., Aboulnaga, A.: CORDS: automatic discovery of correlations and soft functional dependencies. In: SIGMOD, pp. 647–658 (2004)

  12. Langer, P., Naumann, F.: Efficient order dependency detection. VLDB J. 25(2), 223–241 (2016)

  13. Malkemus, T., Padmanabhan, S., Bhattacharjee, B., Cranston, L.: Predicate derivation and monotonicity detection in DB2 UDB. In: ICDE, pp. 939–947 (2005)

  14. Mihaylov, A., Godfrey, P., Golab, L., Kargar, M., Srivastava, D., Szlichta, J.: FastOD: bringing order to data. In: ICDE, System demonstration (2018, to appear)

  15. Ng, W.: An extension of the relational data model to incorporate ordered domains. TODS 26(3), 344–383 (2001)

  16. Papenbrock, T., Ehrlich, J., Marten, J., Neubert, T., Rudolph, J., Schönberg, M., Zwiener, J., Naumann, F.: Functional dependency discovery: an experimental evaluation of seven algorithms. PVLDB 8(10), 1082–1093 (2015)

  17. Papenbrock, T., Naumann, F.: A hybrid approach to functional dependency discovery. In: SIGMOD, pp. 821–833 (2016)

  18. Prokoshyna, N., Szlichta, J., Chiang, F., Miller, R., Srivastava, D.: Combining quantitative and logical data cleaning. PVLDB 9(4), 300–311 (2015)

  19. Selinger, P., Astrahan, M., Chamberlin, D., Lorie, R., Price, T.: Access path selection in a relational database management system. In: SIGMOD, pp. 23–34 (1979)

  20. Simmen, D., Shekita, E., Malkemus, T.: Fundamental techniques for order optimization. In: SIGMOD, pp. 57–67 (1996)

  21. Sismanis, Y., Brown, P., Haas, P., Reinwald, B.: GORDIAN: efficient and scalable discovery of composite keys. In: VLDB, pp. 691–702 (2006)

  22. Szlichta, J., Godfrey, P., Golab, L., Kargar, M., Srivastava, D.: Effective and complete discovery of order dependencies via set-based axiomatization. PVLDB 10(7), 721–732 (2017)

  23. Szlichta, J., Godfrey, P., Gryz, J.: Fundamentals of order dependencies. PVLDB 5(11), 1220–1231 (2012)

  24. Szlichta, J., Godfrey, P., Gryz, J., Ma, W., Pawluk, P., Zuzarte, C.: Queries on dates: fast yet not blind. In: EDBT, pp. 497–502 (2011)

  25. Szlichta, J., Godfrey, P., Gryz, J., Ma, W., Qiu, W., Zuzarte, C.: Business-intelligence queries with order dependencies in DB2. In: EDBT, pp. 750–761 (2014)

  26. Szlichta, J., Godfrey, P., Gryz, J., Zuzarte, C.: Expressiveness and complexity of order dependencies. PVLDB 6(14), 1858–1869 (2013)

Author information

Correspondence to Jaroslaw Szlichta.

About this article

Cite this article

Szlichta, J., Godfrey, P., Golab, L. et al. Effective and complete discovery of bidirectional order dependencies via set-based axioms. The VLDB Journal 27, 573–591 (2018). https://doi.org/10.1007/s00778-018-0510-0

  • DOI: https://doi.org/10.1007/s00778-018-0510-0
