Bulk Loading the MKL-Tree

Franco, Annalisa; Lumini, Alessandra; Maio, Dario

doi:10.1007/978-3-540-45227-0_13


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2736)

Included in the following conference series:

International Conference on Database and Expert Systems Applications


Abstract

The MKL-tree is a hierarchical, height-balanced structure for indexing high-dimensional data. It represents data in a lower-dimensional space by means of the MKL transform, a multi-space generalization of the KL (Karhunen–Loève) transform. A local dimensionality reduction is performed at each node of the tree. This allows more selective features to be extracted and thus increases the discriminating power of the index. The dynamic version of the MKL-tree has two main drawbacks. First, incremental loading of data points can produce very different structures, and consequently different query performance, depending on the insertion order. Second, building the index can be very expensive because of the large number of updates required. In real applications a large dataset is usually available when the tree is created. We therefore propose a new bulk loading technique for the MKL-tree based on recursive clustering of the data objects. The new algorithm searches for an optimal partitioning of the data points in order to compute the most suitable KL subspaces to represent the dataset.

Experimental results show that bulk loading can significantly improve index performance compared with the incremental insertion procedure. The improvement covers both the effectiveness of similarity searches and the efficiency of the loading procedure.
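The abstract describes the technique only at a high level. The tree is built top-down by recursively clustering the whole dataset, and each node gets its own KL (PCA-style) subspace. The Python sketch below illustrates that general shape under explicit assumptions. It is not the authors' algorithm:

- Plain k-means is swapped in for the paper's search for an optimal partitioning.
- Each node's KL basis is simply the top-k eigenvectors of its points' covariance. These are computed by power iteration so the code needs only the standard library.
- The names `bulk_load`, `kl_basis`, `project`, `leaf_capacity`, `fanout` and `k` are hypothetical.
- The sketch does not enforce the height balance of the real MKL-tree.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    return [sum(col) / len(points) for col in zip(*points)]


def kl_basis(points: List[Vector], k: int, iters: int = 200) -> Tuple[Vector, List[Vector]]:
    """Mean and up to k orthonormal principal directions (a KL/PCA basis), found one
    at a time by power iteration on the covariance, orthogonalized against earlier ones."""
    mu = mean(points)
    d = len(mu)
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / len(points) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    basis: List[Vector] = []

    def orthonormalize(w: Vector) -> Vector:
        for b in basis:  # Gram-Schmidt: remove components along directions already found
            proj = dot(w, b)
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        return [x / norm for x in w] if norm > 1e-12 else []

    for _ in range(min(k, d)):
        v = orthonormalize([rng.uniform(-1.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            if not v:
                break
            v = orthonormalize([dot(row, v) for row in cov])
        if not v:  # no (numerically) non-zero variance left: stop early
            break
        basis.append(v)
    return mu, basis


def project(p: Vector, mu: Vector, basis: List[Vector]) -> Vector:
    """KL projection: coordinates of p - mu in the node's low-dimensional subspace."""
    diff = [x - m for x, m in zip(p, mu)]
    return [dot(diff, b) for b in basis]


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[List[Vector]]:
    """Plain k-means (Lloyd iterations): an assumed stand-in for the paper's partitioning."""
    centers = random.Random(seed).sample(points, k)
    groups: List[List[Vector]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            groups[nearest].append(p)
        centers = [mean(g) if g else centers[c] for c, g in enumerate(groups)]
    return [g for g in groups if g]


@dataclass
class Node:
    centroid: Vector
    basis: List[Vector]                                   # KL subspace of this node
    points: List[Vector] = field(default_factory=list)    # filled only in leaves
    children: List["Node"] = field(default_factory=list)


def bulk_load(points: List[Vector], leaf_capacity: int = 8, fanout: int = 3,
              k: int = 2, seed: int = 0) -> Node:
    """Top-down bulk load: KL basis per node, split by clustering, recurse.
    Unlike the real MKL-tree, this sketch does NOT keep the tree height-balanced."""
    mu, basis = kl_basis(points, k)
    if len(points) <= leaf_capacity:
        return Node(mu, basis, points=list(points))
    groups = kmeans(points, min(fanout, len(points)), seed=seed)
    if len(groups) < 2:  # degenerate split (e.g. identical points): make a leaf
        return Node(mu, basis, points=list(points))
    return Node(mu, basis, children=[bulk_load(g, leaf_capacity, fanout, k, seed)
                                     for g in groups])


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic demo data: three well-separated 4-D blobs of 20 points each.
    data = [[rng.gauss(c, 0.3) for _ in range(4)] for c in (0.0, 5.0, 10.0) for _ in range(20)]
    root = bulk_load(data, leaf_capacity=10, fanout=3, k=2)
    print([len(leaf.points) for leaf in leaves(root)])
    print(project(data[0], root.centroid, root.basis))
```

The whole dataset is partitioned in one top-down pass instead of being inserted point by point. As a result, the structure does not depend on insertion order, and for a fixed seed it is fully deterministic. This mirrors the first drawback of the dynamic version that the abstract mentions. Query processing over the per-node projections is left out of the sketch.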



Author information

Authors and Affiliations

DEIS Università di Bologna, viale Risorgimento 2, 40136, Bologna, Italy
Annalisa Franco, Alessandra Lumini & Dario Maio


Editor information

Editors and Affiliations

Gerstner Laboratory, Czech Technical University in Prague, Technická 2, 166 27, Prague 6, Czech Republic
Vladimír Mařík
Johannes Kepler University Linz, Altenberger Str. 69, 4040, Linz, Austria
Werner Retschitzegger
Faculty of Electrical Engineering, The Gerstner Laboratory, Czech Technical University in Prague, Technická 2, 166 27, Prague 6, Czech Republic
Olga Štěpánková


About this paper

Cite this paper

Franco, A., Lumini, A., Maio, D. (2003). Bulk Loading the MKL-Tree. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_13


DOI: https://doi.org/10.1007/978-3-540-45227-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40806-2
Online ISBN: 978-3-540-45227-0
eBook Packages: Springer Book Archive
