Abstract
Grid computing has attracted attention as a way to solve the complex problems of large-scale bioinformatics applications, and it helps improve data accuracy and processing speed across multiple computation platforms. Outlier detection helps keep the classification success rate high and reduces processing time. This paper focuses on a data clustering and classification method with outlier detection, an important bioinformatics application in a grid environment. It proposes grid-based and outlier detection-based data clustering and classification (GODDCC), which uses grid computational resources with geographically distributed bioinformatics data sets. GODDCC can run large-scale bioinformatics applications while guaranteeing high bio-data accuracy with reasonable grid resources. The paper evaluates the performance of GODDCC by comparing it with data clustering and classification (DCC) without outlier detection. The GODDCC model records the lowest average processing time and the highest resource utilization among the DCC models compared. The outlier detection method reduces processing time for DCC models while maintaining a high classification success rate, and grid computing shows great promise for high-performance processing of geographically distributed, large-scale bio-data sets in bioinformatics applications.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cho, K.C., Lee, J.S. (2011). Grid-Based and Outlier Detection-Based Data Clustering and Classification. In: Kim, Th., Adeli, H., Robles, R.J., Balitanas, M. (eds) Ubiquitous Computing and Multimedia Applications. UCMA 2011. Communications in Computer and Information Science, vol 150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20975-8_14
DOI: https://doi.org/10.1007/978-3-642-20975-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20974-1
Online ISBN: 978-3-642-20975-8
eBook Packages: Computer Science, Computer Science (R0)