Abstract
Data clustering is regarded as one of the most important techniques for unsupervised learning in diverse applications. Gene clustering aims to find groups of similarly expressed genes in large microarray datasets. Meanwhile, recent developments in microarray technology generate very large volumes of microarray data at low cost and can handle more than 10,000 genes simultaneously on one chip. Thus, high-performance computing for gene clustering has become increasingly important in microarray data analysis. In this paper, we propose a scalable parallel gene clustering method using the MapReduce programming model. The proposed method uses the k-means algorithm to identify groups of similar genes. Experimental results show that the proposed method scales well as the data size increases and across different numbers of nodes, and that it also produces effective clustering results on real microarray data.
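As an illustration of how k-means maps onto the MapReduce model, the following is a minimal single-process sketch, not the authors' implementation. It assumes Euclidean distance and gene records of the form (gene id, expression profile); the function names `map_assign`, `reduce_mean`, `kmeans_iteration`, and `kmeans` are hypothetical. The map step assigns each gene to its nearest centroid, an in-memory dictionary stands in for the shuffle phase, and the reduce step averages the profiles in each group.

```python
from collections import defaultdict
from math import dist  # Euclidean distance (Python 3.8+); the metric is an assumption


def map_assign(gene, centroids):
    """Map step: emit (index of nearest centroid, expression profile)."""
    _, profile = gene
    nearest = min(range(len(centroids)), key=lambda i: dist(profile, centroids[i]))
    return nearest, profile


def reduce_mean(profiles):
    """Reduce step: average all profiles assigned to one centroid."""
    n = len(profiles)
    return tuple(sum(column) / n for column in zip(*profiles))


def kmeans_iteration(genes, centroids):
    """One map/shuffle/reduce round; returns the updated centroids."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for gene in genes:
        key, profile = map_assign(gene, centroids)
        groups[key].append(profile)
    # A centroid with no assigned genes keeps its previous position.
    return [
        reduce_mean(groups[i]) if groups[i] else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(genes, centroids, max_iter=100, tol=1e-9):
    """Repeat rounds until no centroid moves by more than tol."""
    for _ in range(max_iter):
        updated = kmeans_iteration(genes, centroids)
        shift = max(dist(a, b) for a, b in zip(updated, centroids))
        centroids = updated
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    # Toy expression profiles, invented for illustration only.
    genes = [
        ("g1", (0.0, 0.0)),
        ("g2", (0.0, 2.0)),
        ("g3", (10.0, 10.0)),
        ("g4", (10.0, 12.0)),
    ]
    print(kmeans(genes, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce deployment, each iteration would typically run as a separate job, with the current centroids distributed to every mapper and the reducers writing the updated centroids for the next round.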
Acknowledgment
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (NRF-2013R1A1A2006236).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Islam, A.K.M.T., Lim, CG., Jeong, BS. (2014). Parallel Gene Clustering Using MapReduce. In: Chen, Y., et al. Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8597. Springer, Cham. https://doi.org/10.1007/978-3-319-11538-2_34
DOI: https://doi.org/10.1007/978-3-319-11538-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11537-5
Online ISBN: 978-3-319-11538-2
eBook Packages: Computer Science, Computer Science (R0)