
Interactivity, Scalability and Resource Control for Efficient KDD Support in DBMS

  • Chapter
Database Support for Data Mining Applications

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2682)


Abstract

The conflict between resource consumption and query performance in the data mining context often has no satisfactory solution. This is in sharp contrast to the analysts' need for interactive response times, and it has made the seamless integration of data mining operators into common multiuser database systems a difficult and (so far) not very successful task. This paper describes an approach that makes it possible to combine preprocessing and data mining operators into one common KDD-aware implementation algebra such that interactivity, scalability and resource efficiency can be achieved simultaneously. The basic idea of our framework is pipelining. However, since pipelines are in danger of blocking, we introduce controlled ordering, cardinality and special-value properties of the data stream across the whole query tree, up to the complex data mining operators. The framework builds on a specialized index that is essentially an extension of the UB-Tree and efficiently provides various data orderings. The KDD-algebra operators then exploit these orderings and the remaining properties to release results and internal data structures early enough to allow pipelined, resource-efficient query processing with interactive response times. We describe the framework and demonstrate its benefits in preprocessing and in the parallel and interactive detection of outliers.
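As a rough illustration of why an ordering property keeps a pipeline from blocking, the following Python sketch shows a grouping operator that relies on its input being sorted by key. It is not the authors' KDD algebra: the function name `pipelined_group_sum` and the choice of a per-key sum as the aggregate are hypothetical. Because no tuple with a smaller key can arrive later, each group's result can be released, and its state freed, as soon as the next key appears, instead of waiting for the whole input as a hash-based aggregation would.

```python
from typing import Iterable, Iterator, Optional, Tuple


def pipelined_group_sum(
    stream: Iterable[Tuple[int, float]],
) -> Iterator[Tuple[int, float]]:
    """Sum values per key over a stream assumed to be sorted by key.

    Hypothetical example operator. The ordering property means a group is
    complete once a larger key arrives, so its result is yielded right away
    and only one group's state is held in memory at any time.
    """
    current_key: Optional[int] = None
    total = 0.0
    for key, value in stream:
        # Check the declared ordering property instead of silently
        # producing wrong (split) groups on unsorted input.
        if current_key is not None and key < current_key:
            raise ValueError("input violates the declared ordering property")
        if key != current_key:
            if current_key is not None:
                # Group closed: release its result and drop its state early.
                yield current_key, total
            current_key, total = key, 0.0
        total += value
    if current_key is not None:
        # End of stream: flush the last open group.
        yield current_key, total


if __name__ == "__main__":
    rows = [(1, 2.0), (1, 3.0), (2, 1.5), (4, 0.5), (4, 0.5)]
    for group in pipelined_group_sum(rows):
        print(group)
```

Since the operator is a generator, a consumer receives the result for key 1 after only three input tuples have been read. Without the ordering guarantee, the same operator would have to buffer every group until the input is exhausted; in the framework described above, such orderings are supplied by the UB-Tree-based index.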




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gimbel, M., Klein, M., Lockemann, P.C. (2004). Interactivity, Scalability and Resource Control for Efficient KDD Support in DBMS. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_9


  • DOI: https://doi.org/10.1007/978-3-540-44497-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22479-2

  • Online ISBN: 978-3-540-44497-8

  • eBook Packages: Springer Book Archive
