Abstract
MKL-tree is a hierarchical, height-balanced structure for indexing high-dimensional data. It represents data in a lower-dimensional space by means of the MKL transform, a multi-space generalization of the KL transform. A local dimensionality reduction is performed at each node of the tree, which allows more selective features to be extracted and thus increases the discriminating power of the index. The dynamic version of MKL-tree has two main drawbacks: first, incremental loading of data points can produce very different structures, and consequently different query performance, depending on the insertion order; second, creating the index can be very expensive because of the large number of updates required. Since in real applications a large dataset is usually available at tree creation time, we propose a new bulk loading technique for MKL-tree, based on a recursive clustering of data objects. The new algorithm searches for an optimal partitioning of the data points in order to compute the most suitable KL subspaces to represent the dataset.
Experimental results show that bulk loading can significantly improve the index performance with respect to the incremental insertion procedure, both in terms of effectiveness of similarity searches and of efficiency of the loading procedure.
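The general idea of bulk loading by recursive clustering with a local KL subspace per node can be illustrated with a minimal sketch. This is not the algorithm of the paper: it assumes a plain 2-means split at each level (the paper searches for an optimal partitioning, which is not reproduced here), it makes no attempt to keep the tree height-balanced, and the names `kl_subspace`, `two_means`, `bulk_load`, `k` and `leaf_size` are hypothetical.

```python
import numpy as np

def kl_subspace(points, k):
    """Return (mean, basis) of a k-dimensional KL (PCA) subspace of the points."""
    mean = points.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(points - mean, full_matrices=False)
    return mean, vt[:k]

def two_means(points, iters=20, seed=0):
    """Plain 2-means (Lloyd's iterations); returns a boolean mask for cluster 1."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=2, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(2):
            if np.any(labels == c):  # leave the center of an empty cluster unchanged
                centers[c] = points[labels == c].mean(axis=0)
    return labels == 1

def bulk_load(points, k=2, leaf_size=8):
    """Build a tree top-down; every node stores its own local KL subspace."""
    points = np.asarray(points, dtype=float)
    mean, basis = kl_subspace(points, min(k, points.shape[1]))
    node = {"mean": mean, "basis": basis, "size": len(points), "children": []}
    if len(points) <= leaf_size:
        node["points"] = points
        return node
    mask = two_means(points)
    if mask.all() or not mask.any():
        # Assumption: a degenerate split (e.g. duplicate points) becomes a leaf.
        node["points"] = points
        return node
    node["children"] = [
        bulk_load(points[~mask], k, leaf_size),
        bulk_load(points[mask], k, leaf_size),
    ]
    return node
```

For example, `bulk_load(np.random.default_rng(1).normal(size=(100, 5)), k=2, leaf_size=10)` returns a nested dictionary whose root holds a 2 x 5 orthonormal basis. Because the whole dataset is partitioned up front, the resulting structure does not depend on an insertion order, which is the property the abstract contrasts with incremental loading.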
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Franco, A., Lumini, A., Maio, D. (2003). Bulk Loading the MKL-Tree. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_13
DOI: https://doi.org/10.1007/978-3-540-45227-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40806-2
Online ISBN: 978-3-540-45227-0
eBook Packages: Springer Book Archive