Bulk Loading the MKL-Tree

  • Conference paper
Database and Expert Systems Applications (DEXA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2736)

Abstract

The MKL-tree is a hierarchical, height-balanced structure for indexing high-dimensional data. It represents data in a lower-dimensional space by means of the MKL transform, a multi-space generalization of the KL transform. A local dimensionality reduction is performed at each node of the tree, allowing more selective features to be extracted and thus increasing the discriminating power of the index. The dynamic version of the MKL-tree has two main drawbacks: first, incremental loading of data points can produce very different structures, and consequently different query performance, depending on the insertion order; second, creating the index can be very expensive because of the large number of updates required. Since in real applications a large dataset is usually available at tree creation time, we propose a new bulk loading technique for the MKL-tree, based on a recursive clustering of data objects. The new algorithm searches for an optimal partitioning of the data points in order to calculate the most suitable KL subspaces to represent the dataset.

Experimental results show that bulk loading can significantly improve index performance relative to the incremental insertion procedure, in terms of both the effectiveness of similarity searches and the efficiency of the loading procedure.
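As a rough illustration of the bulk loading idea in the abstract, the following Python sketch builds a tree top-down from a complete dataset and fits a local KL subspace at each node. It is not the authors' algorithm: the recursive clustering step is replaced here by a simple median split along the first KL axis, each node keeps only a one-dimensional subspace estimated by power iteration, and all names (`Node`, `bulk_load`, `first_kl_axis`, `leaves`, `leaf_capacity`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


def centroid(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def first_kl_axis(points: List[Vector], mu: Vector, iters: int = 50) -> Vector:
    """Approximate the leading covariance eigenvector by power iteration."""
    d = len(mu)
    centered = [[p[j] - mu[j] for j in range(d)] for p in points]
    v = [1.0 / d ** 0.5] * d  # fixed start vector, chosen for reproducibility
    for _ in range(iters):
        w = [0.0] * d
        for c in centered:
            dot = sum(c[j] * v[j] for j in range(d))
            for j in range(d):
                w[j] += dot * c[j]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:  # no variance along v; keep the current vector
            break
        v = [x / norm for x in w]
    return v


@dataclass
class Node:
    mean: Vector
    axis: Vector  # 1-D local KL subspace (a simplification made for this sketch)
    points: List[Vector] = field(default_factory=list)  # filled in leaves only
    children: List["Node"] = field(default_factory=list)


def project(p: Vector, mu: Vector, axis: Vector) -> float:
    """Coordinate of p along the node's local KL axis."""
    return sum((p[j] - mu[j]) * axis[j] for j in range(len(mu)))


def bulk_load(points: List[Vector], leaf_capacity: int = 4) -> Node:
    """Build the tree top-down; assumes leaf_capacity >= 1 and points non-empty."""
    mu = centroid(points)
    axis = first_kl_axis(points, mu)
    if len(points) <= leaf_capacity:
        return Node(mu, axis, points=list(points))
    # Stand-in for the clustering step: split at the median projection.
    ordered = sorted(points, key=lambda p: project(p, mu, axis))
    half = len(ordered) // 2
    return Node(mu, axis, children=[bulk_load(ordered[:half], leaf_capacity),
                                    bulk_load(ordered[half:], leaf_capacity)])


def leaves(node: Node) -> List[Node]:
    """Collect all leaf nodes, left to right."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


if __name__ == "__main__":
    # 16 points lying close to a line in 3-D space.
    data = [[float(i), 2.0 * i + (i % 3) * 0.1, 0.0] for i in range(16)]
    root = bulk_load(data, leaf_capacity=4)
    print(len(leaves(root)))  # prints 4
```

Because the whole dataset is partitioned at once, the structure in this sketch depends on the input order only through ties in the projection and floating-point rounding, which is the kind of order independence the abstract contrasts with incremental insertion. The median split yields a roughly balanced tree, but leaf depths can differ by one when the point count is not a power of two times the leaf capacity.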

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Franco, A., Lumini, A., Maio, D. (2003). Bulk Loading the MKL-Tree. In: Mařík, V., Retschitzegger, W., Štěpánková, O. (eds) Database and Expert Systems Applications. DEXA 2003. Lecture Notes in Computer Science, vol 2736. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45227-0_13

  • DOI: https://doi.org/10.1007/978-3-540-45227-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40806-2

  • Online ISBN: 978-3-540-45227-0

  • eBook Packages: Springer Book Archive
