Abstract
OPTICS is a fundamental data clustering technique that has been widely applied in many fields. However, it suffers from performance degradation on large datasets and with expensive distance measures because its complexity is quadratic in both runtime and the number of distance function calls. In this paper, we introduce a novel anytime approach to tackle these problems. The general idea is to use a sequence of lower-bounding (LB) distances of the true distance measure to produce successive approximations of the true reachability plot of OPTICS. The algorithm quickly produces an approximate result using the first LB distance. It then continuously refines this result using subsequent LB distances together with the results of previous computations. Users can suspend and resume the algorithm at any time to examine the results, and can stop it as soon as they are satisfied, thereby saving computational cost. Our proposed algorithms, called Any-OPTICS and Any-OPTICS-XS, are built upon this anytime scheme and can be applied to many complex datasets. Our experiments show that Any-OPTICS obtains very good clustering results at early stages of execution, leading to speedups of orders of magnitude. Even when run through to the final distance measure, the cumulative runtime of Any-OPTICS is lower than that of OPTICS and its extensions.
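To make the idea of a sequence of LB distances concrete, the following is a minimal sketch, not the authors' implementation. It assumes the true distance is Euclidean and builds the LB distances from prefixes of orthonormal Haar wavelet coefficients; the function names `haar_transform` and `lb_distance` and the sample series are hypothetical. Because the orthonormal Haar transform preserves Euclidean distance, the distance over the first k coefficients never exceeds the true distance and can only grow as k increases.

```python
import math


def haar_transform(x):
    """Orthonormal Haar transform; len(x) must be a power of two.

    Returns coefficients ordered from coarse to fine.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    s = math.sqrt(2.0)
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        avg = [(approx[i] + approx[i + 1]) / s for i in pairs]
        det = [(approx[i] - approx[i + 1]) / s for i in pairs]
        details = det + details  # coarser details go in front of finer ones
        approx = avg
    return approx + details


def lb_distance(hx, hy, k):
    """Euclidean distance restricted to the first k Haar coefficients."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(hx[:k], hy[:k])))


# Hypothetical sample series (illustration only).
x = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
y = [2.0, 1.0, 4.0, 4.0, 3.0, 7.0, 1.0, 1.0]
hx, hy = haar_transform(x), haar_transform(y)

# A sequence of increasingly tight lower bounds; the last uses all coefficients.
levels = [1, 2, 4, 8]
lbs = [lb_distance(hx, hy, k) for k in levels]
true_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

for k, lb in zip(levels, lbs):
    print(f"k={k}: LB={lb:.4f} (true={true_dist:.4f})")
```

In an anytime setting, each level of this sequence would yield one approximation of the reachability plot, with cheap coarse levels computed first and the full distance only needed if the user lets the algorithm run to the end.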
Notes
- 1.
Xseedlist consists of a list of objects called the object list (OL). Each object is associated with a so-called predecessor list (PL). Each item of the PL is a tuple (Id, Flag, Predist), where Id is an object id, Flag indicates whether Predist is a lower-bound or a true distance, and Predist(p, q) is the reachability distance from q to p. The PL is sorted in ascending order of Predist. The OL is sorted in ascending order of the Predist of the first item in each object's PL.
- 2.
http://www.cs.ucr.edu/~eamonn/time_series_data/. Note that these datasets are re-interpolated to length \(2^{\lfloor \log (m) \rfloor + 3}\) (where m is the dimension of each object) so that they can be used with the Haar wavelet transform.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
Due to space limitations, we only summarize the result here without showing the figure.
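The Xseedlist layout described in note 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' data structure: the class names `PredEntry` and `Xseedlist`, the method names, and the sample values are hypothetical, and the OL is recomputed on demand by sorting, whereas a real implementation would presumably maintain the order incrementally.

```python
from bisect import insort
from dataclasses import dataclass, field


@dataclass(order=True)
class PredEntry:
    """One PL item: the tuple (Id, Flag, Predist) from note 1."""
    predist: float                         # Predist: reachability distance
    pred_id: int = field(compare=False)    # Id of the predecessor object
    is_true: bool = field(compare=False)   # Flag: True = true distance, False = lower bound


class Xseedlist:
    def __init__(self):
        # Maps each object id to its predecessor list (PL), kept sorted by Predist.
        self.pl = {}

    def add(self, obj_id, pred_id, predist, is_true):
        """Insert an entry into obj_id's PL, preserving ascending Predist order."""
        insort(self.pl.setdefault(obj_id, []), PredEntry(predist, pred_id, is_true))

    def object_list(self):
        """Object list (OL): ids sorted by the Predist of the first item in each PL."""
        return sorted(self.pl, key=lambda obj_id: self.pl[obj_id][0].predist)


# Hypothetical usage with made-up distances.
xs = Xseedlist()
xs.add(obj_id=1, pred_id=0, predist=0.8, is_true=False)  # lower bound only
xs.add(obj_id=1, pred_id=2, predist=0.5, is_true=True)   # true distance
xs.add(obj_id=3, pred_id=0, predist=0.6, is_true=False)

print(xs.object_list())                   # objects ordered by their smallest Predist
print([e.predist for e in xs.pl[1]])      # PL of object 1 in ascending order
```

Ordering `PredEntry` by `predist` alone keeps the sort key identical to the one named in note 1; the flag travels with each entry so that a consumer can tell whether the smallest Predist is still only a lower bound.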
Acknowledgement
We would like to thank Sean Chester for his help during the preparation of the paper. We especially thank the anonymous reviewers for their very helpful and constructive comments. Part of this research was funded by a Villum postdoc fellowship.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Mai, S.T., Assent, I., Le, A. (2016). Anytime OPTICS: An Efficient Approach for Hierarchical Density-Based Clustering. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, X., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9642. Springer, Cham. https://doi.org/10.1007/978-3-319-32025-0_11
DOI: https://doi.org/10.1007/978-3-319-32025-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32024-3
Online ISBN: 978-3-319-32025-0
eBook Packages: Computer Science, Computer Science (R0)