Active Learning Strategies for Semi-Supervised DBSCAN

  • Conference paper
Advances in Artificial Intelligence (Canadian AI 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8436)

Abstract

The semi-supervised, density-based clustering algorithm SSDBSCAN extracts clusters of a given dataset from different density levels by using a small set of labeled objects. A critical assumption of SSDBSCAN, however, is that at least one labeled object is provided for each natural cluster in the dataset. This assumption may be unrealistic when only very few labeled objects can be provided, for instance because of the cost of determining the class label of an object. In this paper, we introduce a novel active learning strategy that selects the “most representative” objects, whose class labels are then determined and used as input for SSDBSCAN. By incorporating a Laplacian Graph Regularizer into a Local Linear Reconstruction method, our proposed algorithm selects objects that represent the whole data space well. Experiments on synthetic and real datasets show that, with the proposed active learning strategy, SSDBSCAN extracts more meaningful clusters even when only very few labeled objects are provided.
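The abstract names the ingredients of the selection step but not its exact objective, so the following is only a minimal sketch of how a Laplacian-regularized local linear reconstruction could drive a greedy selection. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the function names, the parameters `k`, `mu`, `gamma` and `ridge`, the form of the objective (reconstruct all points Q from the selected ones by solving (Λ + μM + γL)Q = ΛX, where M comes from locally linear reconstruction weights and L is a k-nearest-neighbor graph Laplacian), and the greedy search itself.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (self excluded)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def lle_weights(X, nbrs, reg=1e-3):
    """Locally linear reconstruction weights; each row sums to 1."""
    n, k = nbrs.shape
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                 # neighbors centered on x_i
        G = Z @ Z.T                           # local Gram matrix (k x k)
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)  # keep G well conditioned
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()
    return W

def knn_laplacian(nbrs, n):
    """Unnormalized Laplacian D - A of the symmetrized k-NN graph."""
    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), nbrs.shape[1])
    A[rows, nbrs.ravel()] = 1.0
    A = np.maximum(A, A.T)
    return np.diag(A.sum(axis=1)) - A

def select_representatives(X, budget, k=5, mu=1.0, gamma=0.1, ridge=1e-8):
    """Greedily pick `budget` points that best reconstruct the whole dataset.

    Assumed objective (hypothetical, for illustration only): with Lambda the
    0/1 diagonal indicator of selected points, solve
        (Lambda + mu*M + gamma*L) Q = Lambda X
    and score a candidate by the total squared error ||X - Q||^2.
    """
    n = X.shape[0]
    nbrs = knn_indices(X, k)
    W = lle_weights(X, nbrs)
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    L = knn_laplacian(nbrs, n)
    base = mu * M + gamma * L + ridge * I     # small ridge avoids a singular system
    chosen = []
    for _ in range(budget):
        best, best_err = None, np.inf
        for c in range(n):
            if c in chosen:
                continue
            lam = np.zeros(n)
            lam[chosen + [c]] = 1.0
            Q = np.linalg.solve(base + np.diag(lam), lam[:, None] * X)
            err = ((X - Q) ** 2).sum()
            if err < best_err:
                best, best_err = c, err
        chosen.append(best)
    return chosen
```

The returned indices would be the objects sent to an oracle for labeling before running a semi-supervised clustering algorithm. The exhaustive greedy loop solves one dense linear system per candidate, so it is only practical for small illustrative datasets.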

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Li, J., Sander, J., Campello, R., Zimek, A. (2014). Active Learning Strategies for Semi-Supervised DBSCAN. In: Sokolova, M., van Beek, P. (eds) Advances in Artificial Intelligence. Canadian AI 2014. Lecture Notes in Computer Science, vol 8436. Springer, Cham. https://doi.org/10.1007/978-3-319-06483-3_16

  • DOI: https://doi.org/10.1007/978-3-319-06483-3_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06482-6

  • Online ISBN: 978-3-319-06483-3

  • eBook Packages: Computer Science (R0)
