
Interactive Exploration of Subspace Clusters on Multicore Processors


Part of the book series: Lecture Notes in Computer Science (TLDKS, volume 11310)

Abstract

The PreDeCon clustering algorithm finds arbitrarily shaped clusters in high-dimensional feature spaces, a task that remains an active research topic with many potential applications. However, PreDeCon suffers from poor runtime performance and offers no user interaction. Our new method, AnyPDC, addresses these problems by casting PreDeCon into an anytime algorithm. In this anytime scheme, AnyPDC quickly produces an approximate result and iteratively refines it until, at the end, it reaches the result of PreDeCon. AnyPDC not only significantly speeds up PreDeCon clustering but also allows users to interact with the algorithm during its execution. Moreover, by maintaining an underlying cluster structure consisting of so-called primitive clusters and by processing neighborhood queries in blocks, AnyPDC can be executed efficiently in parallel on shared-memory architectures such as multi-core processors. Experiments on large real-world datasets show that AnyPDC achieves high-quality approximate results early on, leading to speedups of orders of magnitude compared to PreDeCon. Although anytime techniques are usually slower than batch ones, the algorithmic solution in AnyPDC is faster than PreDeCon even when run to completion. AnyPDC also scales well with the number of threads on multi-core CPUs.
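For readers unfamiliar with PreDeCon, the following minimal Python sketch illustrates its core-point test as we understand it from the original PreDeCon paper (Böhm et al., ICDM 2004). It covers four pieces: the per-attribute variance of a point's ε-neighborhood measured around that point, the preference weight κ given to low-variance attributes (variance at most δ), the general preference-weighted distance (the maximum of both points' weighted views), and the core condition (preference dimensionality at most λ and at least μ weighted neighbors). The function names and toy data are hypothetical. This is a brute-force illustration in which every test scans the whole dataset. It is not AnyPDC and not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Pref = Tuple[List[float], int]  # (per-attribute weights, preference dimensionality)


def eps_neighborhood(data: List[Point], p: int, eps: float) -> List[int]:
    """Indices of all points within Euclidean distance eps of data[p] (p included)."""
    return [j for j, q in enumerate(data) if math.dist(data[p], q) <= eps]


def subspace_preference(data: List[Point], p: int, eps: float,
                        delta: float, kappa: float) -> Pref:
    """Weight an attribute with kappa if the eps-neighborhood's variance along it,
    measured around data[p], is at most delta; otherwise weight it with 1."""
    nbrs = eps_neighborhood(data, p, eps)
    weights: List[float] = []
    pdim = 0
    for i in range(len(data[p])):
        var = sum((data[p][i] - data[j][i]) ** 2 for j in nbrs) / len(nbrs)
        if var <= delta:
            weights.append(kappa)
            pdim += 1
        else:
            weights.append(1.0)
    return weights, pdim


def weighted_dist(w: List[float], a: Point, b: Point) -> float:
    """Weighted Euclidean distance using the weight vector w."""
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))


def pref_dist(data: List[Point], prefs: List[Pref], p: int, q: int) -> float:
    """General preference-weighted distance: the larger of p's and q's views."""
    return max(weighted_dist(prefs[p][0], data[p], data[q]),
               weighted_dist(prefs[q][0], data[q], data[p]))


def is_core(data: List[Point], prefs: List[Pref], p: int,
            eps: float, mu: int, lam: int) -> bool:
    """Core test: preference dimensionality <= lam and >= mu weighted neighbors."""
    if prefs[p][1] > lam:
        return False
    count = sum(1 for q in range(len(data)) if pref_dist(data, prefs, p, q) <= eps)
    return count >= mu


if __name__ == "__main__":
    # Toy data: five points on a horizontal line plus one isolated point.
    data = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (2, 5)]
    eps, delta, kappa, mu, lam = 1.5, 0.25, 100.0, 3, 1
    prefs = [subspace_preference(data, i, eps, delta, kappa) for i in range(len(data))]
    print([is_core(data, prefs, i, eps, mu, lam) for i in range(len(data))])
    # → [False, True, True, True, False, False]
```

On the toy data, the line points prefer the y attribute (zero variance), so distances along y are heavily penalized. The isolated point has two preferred attributes and exceeds λ = 1. Because each test here is a full linear scan, the sketch also hints at why exhaustive range queries make PreDeCon expensive on large data. How AnyPDC avoids them (primitive clusters, block-processed queries) is described in the chapter itself and is not modeled here.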


Notes

  1. http://openmp.org/wp/

  2. http://www.cs.ucr.edu/~eamonn/time_series_data/

  3. http://www.bbci.de/competition/iii/

  4. http://www.cru.uea.ac.uk/data/

  5. Since Ideal ignores the cluster expansion process of PreDeCon, its runtime is obviously lower than that of PreDeCon itself.


Acknowledgments

We thank the anonymous reviewers for their helpful comments. Part of this research was funded by a Villum postdoc fellowship, the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.05-2015.10, and the CDP Life Project.

Author information


Correspondence to Son T. Mai.


Copyright information

© 2018 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter


Cite this chapter

Pham, T.H. et al. (2018). Interactive Exploration of Subspace Clusters on Multicore Processors. In: Hameurlain, A., Wagner, R., Benslimane, D., Damiani, E., Grosky, W. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXIX. Lecture Notes in Computer Science, vol. 11310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58415-6_6


  • DOI: https://doi.org/10.1007/978-3-662-58415-6_6


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-58414-9

  • Online ISBN: 978-3-662-58415-6

  • eBook Packages: Computer Science (R0)
