A Cognitively Inspired Approach to Two-Way Cluster Extraction from One-Way Clustered Data

Published in Cognitive Computation

Abstract

Cluster extraction is a vital part of data mining; however, humans and computers perform it very differently. Humans tend to estimate, perceive, or visualize clusters cognitively, while digital computers either perform an exact extraction, follow a fuzzy approach, or organize the clusters in a hierarchical tree. In real data sets, the clusters not only have different densities but also contain embedded noise and are nested, making their extraction more challenging. In this paper, we propose a density-based technique for extracting connected rectangular clusters that may go undetected by traditional cluster extraction techniques. The proposed technique is inspired by the human cognitive approach of appropriately scaling the level of detail, moving from a low level of detail, i.e., one-way clustering, to a high level of detail, i.e., biclustering, in the dimension of interest, as in online analytical processing. Experiments were performed using simulated and real data sets, and the proposed technique was compared with four popular cluster extraction techniques (DBSCAN, CLIQUE, k-medoids, and k-means), with promising results.
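
The algorithm itself is not detailed in this abstract, so the short Python sketch below is only an illustration of the general idea described above, under stated assumptions: it extracts connected rectangular dense regions from a data matrix whose rows are assumed to have already been ordered by a one-way clustering. The grid size, density threshold, and the helper name extract_rectangular_clusters are hypothetical choices for the example and are not taken from the paper.

    # Hypothetical sketch, not the authors' algorithm: overlay a grid on the
    # (already row-ordered) matrix, keep cells whose point density exceeds a
    # threshold, and report connected groups of dense cells as rectangular blocks.
    import numpy as np
    from scipy.ndimage import label, find_objects

    def extract_rectangular_clusters(matrix, cell_rows=4, cell_cols=4, threshold=0.6):
        """Return (row range, column range) pairs of connected dense regions."""
        n_rows, n_cols = matrix.shape
        gr, gc = n_rows // cell_rows, n_cols // cell_cols
        # Density of each grid cell = fraction of nonzero entries it contains.
        density = (matrix[:gr * cell_rows, :gc * cell_cols]
                   .reshape(gr, cell_rows, gc, cell_cols)
                   .mean(axis=(1, 3)))
        labeled, _ = label(density >= threshold)   # connected components of dense cells
        blocks = []
        for sl in find_objects(labeled):
            # Map grid-cell slices back to row/column ranges of the original matrix.
            rows = (sl[0].start * cell_rows, sl[0].stop * cell_rows)
            cols = (sl[1].start * cell_cols, sl[1].stop * cell_cols)
            blocks.append((rows, cols))
        return blocks

    # Toy example: sparse background noise with one embedded dense rectangle.
    rng = np.random.default_rng(0)
    m = (rng.random((40, 40)) < 0.05).astype(int)
    m[8:20, 4:16] = (rng.random((12, 12)) < 0.9).astype(int)
    print(extract_rectangular_clusters(m))   # expected: one block, rows 8-20, cols 4-16

Connected components of dense grid cells are used here as a stand-in for however the paper actually joins adjacent dense regions; the point is only that rectangular blocks emerge from cell densities rather than from a prespecified number of clusters.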

Acknowledgments

This project was supported by the NSTIP Strategic Technologies Program in the Kingdom of Saudi Arabia (Project No. 12-AGR2709-3). The authors also acknowledge with thanks the Science and Technology Unit, King Abdulaziz University, for technical support.

Author information

Corresponding author

Correspondence to Ahsan Abdullah.

About this article

Cite this article

Abdullah, A., Hussain, A. A Cognitively Inspired Approach to Two-Way Cluster Extraction from One-Way Clustered Data. Cogn Comput 7, 161–182 (2015). https://doi.org/10.1007/s12559-014-9281-0
