Abstract
Large amounts of spatio-temporal data are continuously collected by location devices and sensor technologies. Clustering is one of the techniques commonly used to gain a first insight into such data. The Shared Nearest Neighbour (SNN) algorithm finds clusters with different densities, shapes and sizes, and also identifies noise in the data, which makes it a good candidate for spatial data. However, its worst-case time complexity is O(n²), which compromises its scalability. This paper presents the use of a metric data structure, the kd-Tree, to index spatial data and support the SNN in querying for the k-nearest neighbours. For low-dimensional data, this improves the average-case time complexity of the algorithm to at most O(n log n). The proposed algorithm, the kd-SNN, was evaluated in terms of performance and showed large improvements over existing approaches, allowing the main traffic routes to be identified by completely clustering a maritime data set.
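The idea of indexing points in a kd-tree and using it to answer the k-nearest-neighbour queries that SNN relies on can be illustrated with a minimal sketch. This is not the authors' kd-SNN implementation: the names `build_kdtree`, `knn`, `all_knn` and `snn_similarity` are hypothetical, the distance is assumed to be Euclidean, the tree construction is unoptimised, and the similarity is simplified to a plain count of shared neighbours.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, ...]


class Node:
    """One kd-tree node: stores the index of a point and its splitting axis."""

    def __init__(self, index: int, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.index, self.axis, self.left, self.right = index, axis, left, right


def build_kdtree(points: List[Point], indices: Optional[List[int]] = None,
                 depth: int = 0) -> Optional[Node]:
    """Build a kd-tree by splitting at the median, cycling through the axes."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return Node(indices[mid], axis,
                build_kdtree(points, indices[:mid], depth + 1),
                build_kdtree(points, indices[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance (an assumption; other metrics are possible)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(root: Optional[Node], points: List[Point], query_index: int,
        k: int) -> List[int]:
    """Indices of the k nearest neighbours of a point, excluding the point itself."""
    query = points[query_index]
    heap: List[Tuple[float, int]] = []  # max-heap via negated squared distance

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        if node.index != query_index:
            d = sq_dist(points[node.index], query)
            if len(heap) < k:
                heapq.heappush(heap, (-d, node.index))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, node.index))
        diff = query[node.axis] - points[node.index][node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the
        # current k-th best candidate (or fewer than k candidates were found).
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [i for _, i in sorted(heap, reverse=True)]  # nearest first


def all_knn(points: List[Point], k: int) -> Dict[int, Set[int]]:
    """k-nearest-neighbour set of every point, answered through the kd-tree."""
    root = build_kdtree(points)
    return {i: set(knn(root, points, i, k)) for i in range(len(points))}


def snn_similarity(neighbours: Dict[int, Set[int]], p: int, q: int) -> int:
    """Simplified SNN similarity: number of neighbours that p and q share."""
    return len(neighbours[p] & neighbours[q])


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    nbrs = all_knn(pts, k=3)
    print(snn_similarity(nbrs, 0, 1), snn_similarity(nbrs, 0, 4))
```

In this sketch, points in the same dense group share neighbours while points in different groups share none, which is the signal SNN-style clustering builds on. The pruning test in `knn` is what lets a query skip whole subtrees instead of comparing against every point, as a brute-force neighbour search would.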
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Faustino, B.F., Moura-Pires, J., Santos, M.Y., Moreira, G. (2014). kd-SNN: A Metric Data Structure Seconding the Clustering of Spatial Data. In: Murgante, B., et al. Computational Science and Its Applications – ICCSA 2014. ICCSA 2014. Lecture Notes in Computer Science, vol 8579. Springer, Cham. https://doi.org/10.1007/978-3-319-09144-0_22
DOI: https://doi.org/10.1007/978-3-319-09144-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09143-3
Online ISBN: 978-3-319-09144-0
eBook Packages: Computer Science (R0)