A Cognitively Inspired Approach to Two-Way Cluster Extraction from One-Way Clustered Data

Abdullah, Ahsan; Hussain, Amir

doi:10.1007/s12559-014-9281-0

Published: 18 June 2014

Volume 7, pages 161–182 (2015)

Cognitive Computation

Ahsan Abdullah¹ &
Amir Hussain²

Abstract

Cluster extraction is a vital part of data mining; however, humans and computers perform it very differently. Humans tend to estimate, perceive or visualize clusters cognitively, while digital computers either perform an exact extraction, follow a fuzzy approach, or organize the clusters in a hierarchical tree. In real data sets, clusters not only differ in density but also contain embedded noise and are nested, which makes their extraction more challenging. In this paper, we propose a density-based technique for extracting connected rectangular clusters that may go undetected by traditional cluster extraction techniques. The proposed technique is inspired by the human cognitive approach of appropriately scaling the level of detail, going from a low level of detail (one-way clustering) to a high level of detail (biclustering) in the dimension of interest, as in online analytical processing. A number of experiments were performed using simulated and real data sets, and the proposed technique was compared with four popular cluster extraction techniques (DBSCAN, CLIQUE, k-medoids and k-means), with promising results.

Acknowledgments

This project was supported by the NSTIP strategic technologies program in the Kingdom of Saudi Arabia, Project No. 12-AGR2709-3. The authors also acknowledge with thanks the Science and Technology Unit, King Abdulaziz University, for technical support.

Author information

Authors and Affiliations

King Abdulaziz University, Jeddah, Saudi Arabia
Ahsan Abdullah
University of Stirling, Stirling, Scotland, UK
Amir Hussain

Corresponding author

Correspondence to Ahsan Abdullah.

About this article

Cite this article

Abdullah, A., Hussain, A. A Cognitively Inspired Approach to Two-Way Cluster Extraction from One-Way Clustered Data. Cogn Comput 7, 161–182 (2015). https://doi.org/10.1007/s12559-014-9281-0

Received: 23 August 2013
Accepted: 02 June 2014
Published: 18 June 2014
Issue Date: February 2015
DOI: https://doi.org/10.1007/s12559-014-9281-0
