ABSTRACT
Visualization of large-scale, high-dimensional data is highly valuable for scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data sets on commodity clients. As in GIS browsers for Earth and environmental data, chemical compounds with similar properties appear near one another in the browser. PubChemBrowse is built around in-house, high-performance parallel MDS (Multidimensional Scaling) and GTM (Generative Topographic Mapping) services and supports fast interaction with an external property database. These properties can be overlaid on the 3D mapped compound space or queried for individual points. We prototype its use with the Chem2Bio2RDF system, which uses the SPARQL query language to access over 20 publicly accessible bioinformatics databases. We describe the design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies can be used to develop similar high-dimensional browsers in other scientific areas.
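To illustrate the kind of mapping an MDS service performs, the following is a minimal serial sketch, assuming unit weights and the SMACOF (Guttman transform) iteration. It is not the parallel in-house implementation described in the abstract. The function names (`pairwise_distances`, `stress`, `smacof_step`, `mds_3d`) and the toy descriptor vectors are hypothetical and chosen only for illustration.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length coordinate lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(delta, points):
    """Raw stress: squared mismatch between target and embedded distances."""
    d = pairwise_distances(points)
    n = len(points)
    return sum((delta[i][j] - d[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def smacof_step(delta, points):
    """One Guttman transform with unit weights: X_new = (1/n) * B(X) * X."""
    n, dim = len(points), len(points[0])
    d = pairwise_distances(points)
    new_points = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if i == j or d[i][j] == 0.0:
                continue  # coincident points contribute nothing
            ratio = delta[i][j] / d[i][j]
            for k in range(dim):
                row[k] += ratio * (points[i][k] - points[j][k])
        new_points.append([v / n for v in row])
    return new_points


def mds_3d(delta, iterations=100, seed=0):
    """Embed items with target dissimilarities `delta` into 3D coordinates."""
    rng = random.Random(seed)
    points = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in delta]
    for _ in range(iterations):
        points = smacof_step(delta, points)
    return points


# Toy stand-ins for high-dimensional compound descriptors (illustrative only).
descriptors = [
    [0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 0, 0],
]
delta = pairwise_distances(descriptors)
coords = mds_3d(delta)
print("stress after embedding:", stress(delta, coords))
```

Each SMACOF step does not increase the stress, so compounds with similar descriptors end up close together in the 3D layout, which is the property the browser relies on. A production service would replace the O(n^2) loops with parallel matrix operations.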