DOI: 10.1145/1851476.1851549
research-article

Browsing large scale cheminformatics data with dimension reduction

Published: 21 June 2010

ABSTRACT

Tools for visualizing large-scale, high-dimensional data are highly valuable for scientific discovery in many fields. We present PubChemBrowse, a customized visualization tool for cheminformatics research. It provides a novel 3D data point browser that displays complex properties of massive data on commodity clients. As in GIS browsers for Earth and environmental data, chemical compounds with similar properties appear near one another in the browser. PubChemBrowse is built around in-house high-performance parallel MDS (Multidimensional Scaling) and GTM (Generative Topographic Mapping) services and supports fast interaction with an external property database. These properties can be overlaid on the 3D-mapped compound space or queried for individual points. We prototype this use with the Chem2Bio2RDF system, using the SPARQL query language to access over 20 publicly accessible bioinformatics databases. We describe the design and implementation of the integrated PubChemBrowse application and outline its use in drug discovery. The same core technologies can be used to develop similar high-dimensional browsers in other scientific areas.
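The abstract does not say which MDS algorithm the in-house parallel service uses. As an illustration only, the following is a minimal serial sketch of SMACOF (stress majorization, due to de Leeuw), one standard MDS method, written in plain Python with unit weights. It assumes, hypothetically, that each compound is a binary fingerprint stored as a set of "on" bit indices and that dissimilarity is Tanimoto distance, a common cheminformatics choice that the abstract does not confirm. The names `tanimoto_distance`, `raw_stress`, and `smacof` are illustrative and are not part of PubChemBrowse.

```python
import math
import random


def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B|.

    Assumption: a fingerprint is a set of 'on' bit indices (hypothetical
    representation; the abstract does not specify one).
    """
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def raw_stress(X, D):
    """Sum over pairs i < j of (d_ij(X) - delta_ij)^2, all weights equal to 1."""
    n = len(X)
    return sum((euclidean(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def smacof(D, dim=3, iters=300, seed=0):
    """Unweighted SMACOF: repeat the Guttman transform X <- B(X) X / n."""
    n = len(D)
    rng = random.Random(seed)
    # Random initial configuration in the target (e.g. 3D browser) space.
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        B = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    d = euclidean(X[i], X[j])
                    # Coincident points contribute 0, the usual convention.
                    if d > 1e-12:
                        B[i][j] = -D[i][j] / d
            # B[i][i] is still 0 here, so this negates the off-diagonal sum.
            B[i][i] = -sum(B[i])
        # Guttman transform: each new coordinate is a row of B(X) X, divided by n.
        X = [[sum(B[i][k] * X[k][c] for k in range(n)) / n for c in range(dim)]
             for i in range(n)]
    return X


if __name__ == "__main__":
    # Toy, made-up fingerprints: the first two overlap, as do the last two.
    fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}, {7, 8, 9}]
    D = [[tanimoto_distance(a, b) for b in fps] for a in fps]
    coords = smacof(D, dim=3)
    print(raw_stress(coords, D))
    for fp, xyz in zip(fps, coords):
        print(sorted(fp), [round(v, 3) for v in xyz])
```

Each Guttman transform never increases the raw stress, so a production loop would normally stop once the stress improvement falls below a tolerance instead of running a fixed number of iterations. Building B(X) costs O(n^2) time and memory per iteration, which is why collections with many thousands of compounds call for parallel implementations such as the services described in the abstract.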


Published in

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
ISBN: 9781605589428
DOI: 10.1145/1851476

            Copyright © 2010 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Acceptance Rates

Overall Acceptance Rate: 166 of 966 submissions, 17%
