Abstract
In data mining, the conflict between resource consumption and query performance often has no satisfactory solution. This stands in sharp contrast to analysts' need for interactive response times, and it has made the seamless integration of data mining operators into common multiuser database systems a difficult and (so far) not very successful task. This paper describes an approach that combines preprocessing and data mining operators into one common KDD-aware implementation algebra, so that interactivity, scalability and resource efficiency can be achieved simultaneously. The basic idea of our framework is pipelining. However, since pipelines are in danger of blocking, we introduce controlled ordering, cardinality and special-value properties of the data stream across the whole query tree, up to the complex data mining operators. The framework builds on a specialized index that is basically an extension of the UB-Tree and efficiently provides various data orderings. The KDD-algebra operators then exploit these orderings and the remaining properties to release results and internal data structures early enough to allow pipelined, resource-efficient query processing with interactive response times. This paper describes the framework and demonstrates its benefits in preprocessing and in the parallel and interactive detection of outliers.
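The role that an ordering property plays in keeping a pipeline from blocking can be illustrated with a small sketch. It is not the chapter's KDD algebra; the function names `blocking_group_count` and `pipelined_group_count` and the `(key, value)` row layout are assumptions made for this example only. The sketch contrasts a grouping operator that must consume its whole input with one that, given input ordered on the grouping key, emits each group and discards its state as soon as the key changes.

```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple

Row = Tuple[Hashable, object]  # hypothetical (key, value) row layout


def blocking_group_count(rows: Iterable[Row]) -> Iterator[Tuple[Hashable, int]]:
    """No ordering known: all groups stay in memory until the input ends."""
    counts: Dict[Hashable, int] = {}
    for key, _value in rows:
        counts[key] = counts.get(key, 0) + 1
    # Nothing can be released before this point, so the operator blocks.
    yield from counts.items()


def pipelined_group_count(sorted_rows: Iterable[Row]) -> Iterator[Tuple[Hashable, int]]:
    """Input ordered on key (assumed): a key change closes the current group."""
    started = False
    current_key: Hashable = None
    count = 0
    for key, _value in sorted_rows:
        if started and key != current_key:
            # The ordering guarantees current_key never reappears,
            # so its result is released and its state is dropped.
            yield current_key, count
            count = 0
        current_key, started = key, True
        count += 1
    if started:
        yield current_key, count  # last group, closed by end of input
```

With input ordered on the key, the pipelined variant holds a single counter at a time and hands its first result downstream after reading only the first row of the second group, which is the kind of early release the abstract describes.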
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gimbel, M., Klein, M., Lockemann, P.C. (2004). Interactivity, Scalability and Resource Control for Efficient KDD Support in DBMS. In: Meo, R., Lanzi, P.L., Klemettinen, M. (eds) Database Support for Data Mining Applications. Lecture Notes in Computer Science, vol 2682. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-44497-8_9
DOI: https://doi.org/10.1007/978-3-540-44497-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22479-2
Online ISBN: 978-3-540-44497-8
eBook Packages: Springer Book Archive