Anytime OPTICS: An Efficient Approach for Hierarchical Density-Based Clustering

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9642)

Abstract

OPTICS is a fundamental data clustering technique that has been widely applied in many fields. However, its performance degrades on large datasets and with expensive distance measures because its complexity is quadratic in both runtime and the number of distance function calls. In this paper, we introduce a novel anytime approach to tackle these problems. The general idea is to use a sequence of lower-bounding (LB) distances of the true distance measure to produce multiple approximations of the true reachability plot of OPTICS. The algorithm quickly produces an approximate result using the first LB distance. It then continuously refines this result using subsequent LB distances together with the results of previous computations. Users can suspend and resume the algorithm at any time to examine the results and can stop it as soon as they are satisfied, thereby saving computational cost. Our proposed algorithms, called Any-OPTICS and Any-OPTICS-XS, are built upon this anytime scheme and can be applied to many complex datasets. Our experiments show that Any-OPTICS obtains very good clustering results at early stages of execution, leading to speedups of orders of magnitude. Even when run to the final distance measure, the cumulative runtime of Any-OPTICS is lower than that of OPTICS and its extensions.
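The anytime scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Any-OPTICS: it recomputes a plain quadratic OPTICS (with an unbounded radius) at every level and does not reuse results from previous levels. The LB family used here (Euclidean distance over the first k coordinates) and the names `prefix_distance`, `reachability_plot`, and `anytime_plots` are assumptions made for illustration. The generator yields one approximate reachability plot per LB distance, so a caller can suspend, resume, or stop between levels.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]
Plot = List[Tuple[int, float]]  # (object index, reachability) in visiting order


def prefix_distance(k: int) -> Distance:
    """Euclidean distance over the first k coordinates (assumed LB family).

    Dropping coordinates can only shrink the sum of squares, so this
    lower-bounds the full Euclidean distance.
    """
    def dist(a: Point, b: Point) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:k], b[:k])))
    return dist


def reachability_plot(data: Sequence[Point], dist: Distance, min_pts: int) -> Plot:
    """Quadratic OPTICS with eps = infinity; requires len(data) >= min_pts."""
    n = len(data)
    # Core distance: distance to the min_pts-th nearest object, counting itself.
    core = [sorted(dist(p, q) for q in data)[min_pts - 1] for p in data]
    reach = [math.inf] * n
    done = [False] * n
    plot: Plot = []
    current = 0
    for _ in range(n):
        done[current] = True
        plot.append((current, reach[current]))
        # Relax the reachability of every unprocessed object via `current`.
        for j in range(n):
            if not done[j]:
                r = max(core[current], dist(data[current], data[j]))
                reach[j] = min(reach[j], r)
        pending = [j for j in range(n) if not done[j]]
        if pending:
            # Visit the unprocessed object with the smallest reachability next.
            current = min(pending, key=lambda j: (reach[j], j))
    return plot


def anytime_plots(data: Sequence[Point], min_pts: int) -> Iterator[Tuple[int, Plot]]:
    """Yield (k, plot) for k = 1, 2, 4, ... and finally the full dimension."""
    dim = len(data[0])
    k = 1
    while k < dim:
        yield k, reachability_plot(data, prefix_distance(k), min_pts)
        k *= 2
    # The last level uses all coordinates, i.e. the true distance.
    yield dim, reachability_plot(data, prefix_distance(dim), min_pts)


data = [
    (0.0, 0.0, 0.0, 0.0), (0.1, 0.0, 0.2, 0.0), (0.0, 0.1, 0.0, 0.1),
    (5.0, 5.0, 5.0, 5.0), (5.1, 5.0, 5.0, 5.2), (5.0, 5.2, 5.1, 5.0),
]
for k, plot in anytime_plots(data, min_pts=2):
    print(k, [round(r, 2) for _, r in plot])
    # A caller could inspect the plot here and break out early if satisfied.
```

Because `anytime_plots` is a generator, stopping after the first level costs only one cheap pass, which mirrors the suspend-and-resume behaviour the abstract describes.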

Notes

  1. Xseedlist consists of a list of objects called the object list (OL). Each object is associated with a so-called predecessor list (PL). Each item of the PL is a tuple (Id, Flag, Predist), where Id is an object id, Flag indicates whether Predist is a lower-bound or a true distance, and Predist(p, q) is the reachability distance from q to p. The PL is sorted in ascending order of Predist. The OL is sorted in ascending order of the Predist of the first item in the PL of each object.

  2. http://www.cs.ucr.edu/~eamonn/time_series_data/. Note that these datasets are re-interpolated to a length of \(2^{\lfloor \log (m) \rfloor + 3}\) (where m is the dimension of each object) for use with the Haar wavelet transform.

  3. http://archive.ics.uci.edu/ml/.

  4. http://cvrr.ucsd.edu/bmorris/datasets/dataset_trajectory_clustering.html.

  5. http://www.fhwa.dot.gov/publications/research/operations/07029/index.cfm.

  6. http://www.cs.ucr.edu/~eamonn/time_series_data/.

  7. http://www1.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  8. Due to space limitations, we only summarize the result here without showing the figure.
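The Xseedlist layout described in Note 1 could be sketched in Python as follows. Only the structure (OL, PL, and the (Id, Flag, Predist) tuple with the two sort orders) comes from the note. The class and method names, the encoding of Flag as a boolean, and the `update` operation are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PredecessorItem:
    pred_id: int    # Id: the predecessor object q
    is_true: bool   # Flag (assumed encoding): True = true distance, False = lower bound
    predist: float  # Predist(p, q): reachability distance from q to p


@dataclass
class SeedObject:
    obj_id: int
    pl: List[PredecessorItem] = field(default_factory=list)

    def add(self, item: PredecessorItem) -> None:
        # Keep the PL sorted in ascending order of Predist.
        self.pl.append(item)
        self.pl.sort(key=lambda it: it.predist)


class Xseedlist:
    """Hypothetical container: objects keyed by id, each with its own PL."""

    def __init__(self) -> None:
        self._objects: Dict[int, SeedObject] = {}

    def update(self, obj_id: int, pred_id: int, is_true: bool, predist: float) -> None:
        # Create the object's entry on first use, then record the predecessor.
        obj = self._objects.setdefault(obj_id, SeedObject(obj_id))
        obj.add(PredecessorItem(pred_id, is_true, predist))

    def object_list(self) -> List[SeedObject]:
        # OL: ascending order of the Predist of the first item in each PL.
        return sorted(self._objects.values(), key=lambda o: o.pl[0].predist)


xs = Xseedlist()
xs.update(obj_id=7, pred_id=1, is_true=False, predist=0.9)
xs.update(obj_id=7, pred_id=2, is_true=True, predist=0.4)
xs.update(obj_id=3, pred_id=1, is_true=False, predist=0.6)
for obj in xs.object_list():
    print(obj.obj_id, [(it.pred_id, it.is_true, it.predist) for it in obj.pl])
```

Sorting on every call keeps the sketch short. A real implementation would more likely maintain the OL incrementally, for example with a heap.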

Acknowledgement

We would like to thank Sean Chester for his help during the preparation of the paper. We especially thank the anonymous reviewers for their very helpful and constructive comments. Part of this research was funded by a Villum postdoc fellowship.

Author information

Corresponding author

Correspondence to Son T. Mai.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Mai, S.T., Assent, I., Le, A. (2016). Anytime OPTICS: An Efficient Approach for Hierarchical Density-Based Clustering. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, X., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9642. Springer, Cham. https://doi.org/10.1007/978-3-319-32025-0_11

  • DOI: https://doi.org/10.1007/978-3-319-32025-0_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32024-3

  • Online ISBN: 978-3-319-32025-0

  • eBook Packages: Computer Science (R0)
