The MST-kNN with Paracliques

Arefin, Ahmed Shamsul; Riveros, Carlos; Berretta, Regina; Moscato, Pablo

doi:10.1007/978-3-319-14803-8_29

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8955)

Abstract

In this work, we incorporate new edges from a paraclique-identification approach into the output of the MST-kNN graph partitioning method. We present a statistical analysis of the results on a dataset originating from a computational linguistic study of 84 Indo-European languages. We also present results from a computational stylistic study of 168 plays of the Shakespearean era. For the latter, the Kruskal-Wallis test 1 (observed vs. all permutations) gave a p-value of 1.62E-11 and a Wilcoxon test gave a p-value of 8.1E-12. Overall, our results show in both cases that the modified approach provides statistically more significant results than the MST-kNN alone, thus providing a highly scalable and statistically sound alternative for data clustering.
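
The abstract does not spell out the partitioning step. As a rough illustration only, the following is a minimal sketch that assumes the commonly described MST-kNN rule: keep the minimum spanning tree edges that also appear in the k-nearest-neighbour graph, then read clusters off as connected components. It performs a single pass on Euclidean points; any recursive refinement, the choice of k, and the paraclique edge addition studied in the paper are not shown. All function names here are hypothetical and are not taken from the paper.

```python
from math import dist


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph (assumed metric)."""
    n = len(points)
    # best[v] = (distance to the tree, tree vertex achieving it)
    best = {j: (dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = set()
    while best:
        j = min(best, key=lambda v: best[v][0])
        _, parent = best.pop(j)
        edges.add(frozenset((parent, j)))
        for v in best:
            d = dist(points[j], points[v])
            if d < best[v][0]:
                best[v] = (d, j)
    return edges


def knn_edges(points, k):
    """Undirected kNN graph: edge {i, j} if j is among the k nearest of i."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add(frozenset((i, j)))
    return edges


def components(n, edges):
    """Connected components via union-find, returned as sorted index lists."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for edge in edges:
        a, b = tuple(edge)
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def mst_knn_partition(points, k):
    """Single-pass sketch: clusters are components of MST ∩ kNN."""
    kept = mst_edges(points) & knn_edges(points, k)
    return components(len(points), kept)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mst_knn_partition(pts, k=1))
```

On two well-separated groups of points, the long MST edge that bridges the groups is absent from the kNN graph, so the intersection splits the data into one component per group.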

Author information

Authors and Affiliations

The Priority Research Centre for Bioinformatics Biomarker Discovery and Information-based Medicine, Information-based Medicine Program, Hunter Medical Research Institute, School of Electrical Engineering and Computer Science, The University of Newcastle, Australia
Ahmed Shamsul Arefin, Carlos Riveros, Regina Berretta & Pablo Moscato

Editor information

Editors and Affiliations

School of Electrical Engineering and Computer Science, The University of Newcastle, 2308, Callaghan, NSW, Australia
Stephan K. Chalup
School of Computer Science and Engineering, The University of New South Wales, 2052, Sydney, NSW, Australia
Alan D. Blair
Faculty of Business, Bond University, 4226, Robina, QLD, Australia
Marcus Randall

About this paper

Cite this paper

Arefin, A.S., Riveros, C., Berretta, R., Moscato, P. (2015). The MST-kNN with Paracliques. In: Chalup, S.K., Blair, A.D., Randall, M. (eds) Artificial Life and Computational Intelligence. ACALCI 2015. Lecture Notes in Computer Science, vol 8955. Springer, Cham. https://doi.org/10.1007/978-3-319-14803-8_29

DOI: https://doi.org/10.1007/978-3-319-14803-8_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14802-1
Online ISBN: 978-3-319-14803-8
eBook Packages: Computer Science, Computer Science (R0)
