Abstract
We propose a methodology based on a structure called neighborhood graphs for indexing and retrieving multi-dimensional data. As the quantity of data grows, processing multi-dimensional data becomes increasingly important. Such processing includes various tasks, for instance mining, classification, and clustering. However, effective processing of multi-dimensional data often requires locating each data item precisely in the multi-dimensional space where the data reside, so that each item can be retrieved effectively for processing. This amounts to solving the point location problem (neighborhood search) in a multi-dimensional space. In this paper, in order to use neighborhood graphs as an indexing structure for multi-dimensional data, we propose: i) a local insertion and deletion method, and ii) an incremental neighborhood graph construction method. The first method addresses the problem of updating the graph. The second method builds a neighborhood graph quickly from scratch by applying the first method recursively. Several experiments are conducted to evaluate the proposed approach, and the results indicate its effectiveness.
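The abstract does not fix the graph type or the update procedure. The following is a minimal Python sketch, assuming the neighborhood graph is a relative neighborhood graph (RNG) under Euclidean distance: two points a and b are linked when no third point lies strictly inside their lune, that is, closer to both a and b than they are to each other. The names `RelativeNeighborhoodGraph`, `insert`, `delete`, `query`, and `build` are hypothetical. Every update here rescans all points instead of restricting work to a local region, so the sketch shows what an insertion or deletion must change, not the authors' local algorithm.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, ...]


class RelativeNeighborhoodGraph:
    """Naive relative neighborhood graph (RNG) with incremental updates.

    Illustrative sketch: each operation scans every point, so it does not
    reproduce a localized update strategy.
    """

    def __init__(self) -> None:
        self.points: Dict[int, Point] = {}
        self.adj: Dict[int, Set[int]] = {}
        self._next_id = 0

    @staticmethod
    def _in_lune(c: Point, a: Point, b: Point) -> bool:
        # c lies strictly inside the lune of (a, b) when it is closer to
        # both endpoints than the endpoints are to each other.
        return max(math.dist(a, c), math.dist(b, c)) < math.dist(a, b)

    def _lune_empty(self, a: Point, b: Point, skip: Set[int]) -> bool:
        # True when no stored point (other than those in skip) is in the lune.
        return not any(self._in_lune(c, a, b)
                       for k, c in self.points.items() if k not in skip)

    def _link(self, i: int, j: int) -> None:
        self.adj[i].add(j)
        self.adj[j].add(i)

    def insert(self, p: Point) -> int:
        pid = self._next_id
        self._next_id += 1
        # 1) A new point can only invalidate edges: drop those whose lune holds p.
        for i in self.adj:
            for j in [j for j in self.adj[i] if j > i]:
                if self._in_lune(p, self.points[i], self.points[j]):
                    self.adj[i].discard(j)
                    self.adj[j].discard(i)
        # 2) Connect p to every existing point whose lune with p is empty.
        self.points[pid] = p
        self.adj[pid] = set()
        for q, b in list(self.points.items()):
            if q != pid and self._lune_empty(p, b, skip={pid, q}):
                self._link(pid, q)
        return pid

    def delete(self, pid: int) -> None:
        p = self.points.pop(pid)
        for q in self.adj.pop(pid):
            self.adj[q].discard(pid)
        # Removing p can only create edges, and only between pairs whose
        # lune contained p, so only those non-adjacent pairs are rechecked.
        ids = sorted(self.points)
        for x, i in enumerate(ids):
            for j in ids[x + 1:]:
                a, b = self.points[i], self.points[j]
                if (j not in self.adj[i] and self._in_lune(p, a, b)
                        and self._lune_empty(a, b, skip={i, j})):
                    self._link(i, j)

    def query(self, q: Point) -> List[int]:
        # Neighborhood search: ids that would become adjacent to q if it
        # were inserted, computed without modifying the graph.
        return sorted(j for j, b in self.points.items()
                      if self._lune_empty(q, b, skip={j}))

    def edges(self) -> Set[Tuple[int, int]]:
        return {(i, j) for i, ns in self.adj.items() for j in ns if i < j}


def build(points: Iterable[Point]) -> RelativeNeighborhoodGraph:
    # Construction from scratch as repeated application of insert().
    g = RelativeNeighborhoodGraph()
    for p in points:
        g.insert(p)
    return g


if __name__ == "__main__":
    g = build([(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)])
    print(sorted(g.edges()))    # [(0, 2), (1, 2)]
    g.delete(2)
    print(sorted(g.edges()))    # [(0, 1)]
    print(g.query((1.0, 0.0)))  # [0, 1]
```

Two observations keep the sketch correct: inserting a point can only remove existing edges, because a new point can only fill lunes, and deleting a point can only add edges between pairs whose lune contained it. Incremental construction is then plain repeated insertion, and `query` answers a neighborhood search by returning the points that a query point would be linked to.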
Notes
This structure is the foundation of the other multi-dimensional topological models.
Relations are calculated in the original space, and the illustration is given in two-dimensional space.
Acknowledgements
This work was partially supported by Région Rhône Alpes under grant EMERGENCE 2004 and the grant-in-aid for scientific research (No. 20500123) funded by MEXT, Japan. The authors would like to thank the anonymous reviewers for their valuable comments that improved the paper.
Additional information
This work was mainly done while Hakim Hacid was a PhD student at the University of Lyon, France.
About this article
Cite this article
Hacid, H., Yoshida, T. Neighborhood graphs for indexing and retrieving multi-dimensional data. J Intell Inf Syst 34, 93–111 (2010). https://doi.org/10.1007/s10844-009-0081-z
DOI: https://doi.org/10.1007/s10844-009-0081-z