Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples

  • Conference paper
Data Integration in the Life Sciences (DILS 2005)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3615)

Abstract

Integrating data involving chemical structures is simplified when unique identifiers (UIDs) can be associated with chemical structures. For example, these identifiers can be used as database keys. One common approach is to use the Unique SMILES notation introduced in [2]. Unique SMILES views a chemical structure as a graph with atoms as nodes and bonds as edges, and uses a depth-first traversal of the graph to generate the SMILES string. The algorithm establishes a node ordering by using certain symmetry properties of the graph. In this paper, we present certain molecular graphs for which the algorithm fails to generate UIDs. Indeed, we show that different graphs in the same symmetry class employed by the Unique SMILES algorithm have different Unique SMILES IDs. We tested the algorithm on the National Cancer Institute (NCI) database [7] and found several molecular structures for which the algorithm also failed. We have also written a Python script that generates molecular graphs for which the algorithm fails.
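To make the graph view concrete, the sketch below loads the sample adjacency list given in reference [8] as a Python dictionary and checks that it describes a well-formed undirected graph. The field layout is inferred from the data rather than stated in the source, so it should be read as an assumption: each atom is taken to carry its element symbol, the sum of its bond orders, its atomic number, its charge, and its hydrogen count, followed by a list of [bond order, neighbor id] pairs. The names `MOLECULE`, `bonds`, and `is_consistent` are hypothetical.

```python
# Sample adjacency list from reference [8], with typographic quotes
# replaced by straight quotes so that it parses as a Python dict.
# Assumed layout per atom (inferred from the data, not stated in the source):
#   id: [[symbol, bond_order_sum, atomic_number, charge, hydrogen_count],
#        [[bond_order, neighbor_id], ...]]
MOLECULE = {
    1: [['C', 1, 6, '0', 3], [[1, 2]]],
    2: [['C', 2, 6, '0', 2], [[1, 1], [1, 3]]],
    3: [['C', 4, 6, '0', 0], [[1, 2], [2, 4], [1, 11]]],
    4: [['C', 3, 6, '0', 1], [[2, 3], [1, 5]]],
    5: [['C', 4, 6, '0', 0], [[1, 4], [1, 6], [2, 8]]],
    6: [['C', 2, 6, '0', 2], [[1, 5], [1, 7]]],
    7: [['C', 1, 6, '0', 3], [[1, 6]]],
    8: [['C', 3, 6, '0', 1], [[2, 5], [1, 9]]],
    9: [['C', 4, 6, '0', 0], [[1, 8], [1, 10], [2, 11]]],
    10: [['C', 1, 6, '0', 3], [[1, 9]]],
    11: [['C', 3, 6, '0', 1], [[1, 3], [2, 9]]],
}


def bonds(molecule):
    """Return the undirected bonds as a set of (low_id, high_id, order)."""
    found = set()
    for atom_id, (_, bond_list) in molecule.items():
        for order, neighbor in bond_list:
            found.add((min(atom_id, neighbor), max(atom_id, neighbor), order))
    return found


def is_consistent(molecule):
    """Check that every bond is listed from both end atoms with the same order."""
    for atom_id, (_, bond_list) in molecule.items():
        for order, neighbor in bond_list:
            if [order, atom_id] not in molecule[neighbor][1]:
                return False
    return True


if __name__ == "__main__":
    print(len(MOLECULE), "atoms,", len(bonds(MOLECULE)), "bonds")
    print("consistent:", is_consistent(MOLECULE))
```

Under the assumed layout, the second field of every atom equals the sum of its listed bond orders, which is what suggests that reading of the data.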

References

  1. Weininger, D.: SMILES, a Chemical Language and Information System 1: Introduction to Methodology and Encoding Rules, Medicinal Chemistry Project, Pomona College (1988)

  2. Weininger, D., Weininger, A., Weininger, J.L.: SMILES 2: Algorithm for Generation of Unique SMILES Notation, Daylight Chemical Information Systems, Irvine, California 92714 (1989). Note that although the Unique SMILES implementation has since been changed by Daylight Chemical Information Systems, this appears to be the most recent publication describing the algorithm.

  3. Weininger, D.: SMILES 3: DEPICT. Graphical Depiction of Chemical Structures, Daylight Chemical Information Systems, New Orleans, Louisiana

  4. A SMILES to graph translation can be found at: http://www.daylight.com/daycgi/depict

  5. A SMILES to Unique SMILES translation can be found at: http://cactus.nci.nih.gov/services/translate/

  6. More counter examples can be found at: http://ncdm171.lac.uic.edu/neglur/USMILES/USMILES.html

  7. NCI database, http://129.43.27.140/ncidb2/ (retrieved on March 2, 2005)

  8. Sample adjacency list used: {1:[['C',1,6,'0',3],[[1,2]]], 2:[['C',2,6,'0',2],[[1,1],[1,3]]], 3:[['C',4,6,'0',0],[[1,2],[2,4],[1,11]]], 4:[['C',3,6,'0',1],[[2,3],[1,5]]], 5:[['C',4,6,'0',0],[[1,4],[1,6],[2,8]]], 6:[['C',2,6,'0',2],[[1,5],[1,7]]], 7:[['C',1,6,'0',3],[[1,6]]], 8:[['C',3,6,'0',1],[[2,5],[1,9]]], 9:[['C',4,6,'0',0],[[1,8],[1,10],[2,11]]], 10:[['C',1,6,'0',3],[[1,9]]], 11:[['C',3,6,'0',1],[[1,3],[2,9]]]}

  9. CANON algorithm (extract from reference [2]): (1) Set the atomic vector to initial invariants. Go to step 3. (2) Set vector to product of primes corresponding to neighbors' ranks. (3) Sort vector, maintaining stability over previous ranks. (4) Rank atomic vector. (5) If not invariant partitioning, go to step 2. (6) On first pass, save partitioning as symmetry classes. (7) If highest rank is smaller than number of nodes, break ties, go to step 2. (8)... else done

  10. See http://bioweb.dataspaceweb.org/chemicalKeys (retrieved on March 2, 2005)

  11. Beyer, T., Proskurowski, A.: Symmetries in graph coding. In: Proceedings of Northwest 1976 ACM–CIPS Pacific Regional Symposium, pp. 198–203 (1976)

  12. Böhm, C., Santolini, A.: A quasi-decision algorithm for the p-equivalence of two matrices. ICC Bull. 8(1), 57–69 (1964)

  13. IUPAC, Nomenclature of Organic Chemistry. Pergamon Press, Oxford (1979)

  14. Klin, M.H., Lebedev, O.V., Pivina, T.S., Zefirov, N.S.: Nonisomorphic cycles of maximum length in a series of chemical graphs and the problem of application of IUPAC nomenclature rules. MATCH 27, 133–151 (1992)

  15. See http://www.iupac.org/projects/2000/2000-025-1-800.html (retrieved on March 2, 2005)

  16. Randic, M., Brissey, G.M., Wilkins, C.L.: Computer perception of topological symmetry via canonical numbering of atoms. Journal of Chemical Information and Computer Sciences 21(1), 52–59 (1981)

  17. McKay, B.: Practical Graph Isomorphism. Congr. Numer. 30, 45–87 (1981)

  18. Morgan, H.L.: The Generation of a Unique Machine Description for Chemical Structures – A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113 (1965)

  19. Braun, J., Gugisch, R., Kerber, A., Laue, R., Meringer, M., Rücker, C.: MOLGEN-CID, A Canonizer for Molecules and Graphs Accessible through the Internet. Journal of Chemical Information and Computer Sciences 44, 542–548 (2004)

  20. Grossman, R., Hamelberg, D., Kasturi, P., Liu, B.: Experimental Studies of the Universal Chemical Key (UCK) Algorithm on the NCI Database of Chemical Compounds. In: Proceedings of the 2003 IEEE Computer Society Bioinformatics Conference (CSB 2003), pp. 244–250. IEEE Computer Society, Los Alamitos (2003)
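The CANON steps quoted in reference [9] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Daylight implementation: the initial invariants are supplied by the caller, the "sort, maintaining stability over previous ranks" step is modeled by ranking on the pair (previous rank, prime product), and the tie-breaking rule used here (double all ranks, then lower the first atom of the lowest tied rank by one) is an assumption about how step 7 is carried out. The names `first_primes`, `rank`, `refine`, and `canon` are hypothetical.

```python
from collections import Counter


def first_primes(n):
    """Return the first n primes by trial division (fine for small graphs)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def rank(keys):
    """Steps 3-4: dense 1-based ranks; atoms with equal keys share a rank."""
    order = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}
    return [order[key] for key in keys]


def refine(ranks, neighbors, primes):
    """Steps 2-5: repeat until the ranking no longer changes."""
    while True:
        products = []
        for nbrs in neighbors:
            product = 1
            for j in nbrs:
                product *= primes[ranks[j] - 1]  # prime for the neighbor's rank
            products.append(product)
        # Previous rank is the primary key, so earlier distinctions are kept.
        new_ranks = rank(list(zip(ranks, products)))
        if new_ranks == ranks:
            return ranks
        ranks = new_ranks


def canon(invariants, neighbors):
    """Return (symmetry_classes, final_ranks) for a graph given as
    per-atom initial invariants and 0-based neighbor index lists."""
    n = len(invariants)
    if n == 0:
        return [], []
    primes = first_primes(n)
    ranks = refine(rank(invariants), neighbors, primes)   # steps 1, 3-5
    symmetry_classes = list(ranks)                        # step 6
    while max(ranks) < n:                                 # step 7
        counts = Counter(ranks)
        tied_rank = min(r for r, c in counts.items() if c > 1)
        first_tied = ranks.index(tied_rank)  # assumed rule: first atom in input order
        doubled = [2 * r for r in ranks]
        doubled[first_tied] -= 1
        ranks = refine(rank(doubled), neighbors, primes)
    return symmetry_classes, ranks


if __name__ == "__main__":
    # Three-atom chain with the number of neighbors as the initial invariant.
    symmetry, order = canon([1, 2, 1], [[1], [0, 2], [1]])
    print(symmetry)  # [1, 2, 1]
    print(order)     # [1, 3, 2]
```

In this sketch the two end atoms of the chain fall into the same symmetry class, and the tie between them is resolved by input order, so the final ranking can depend on how the input atoms are numbered.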

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neglur, G., Grossman, R.L., Liu, B. (2005). Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples. In: Ludäscher, B., Raschid, L. (eds) Data Integration in the Life Sciences. DILS 2005. Lecture Notes in Computer Science, vol 3615. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11530084_13

  • DOI: https://doi.org/10.1007/11530084_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27967-9

  • Online ISBN: 978-3-540-31879-8

  • eBook Packages: Computer Science (R0)
