From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions

Journal of Computer-Aided Molecular Design

Abstract

The scientific literature is an important source of experimental and chemical structure data. Very often these data are harvested into data collections of varying size, leaving data quality and curation issues on the shoulders of users. This work presents a systematic and reproducible workflow for collecting series of data points from the scientific literature and assembling a database that is suitable for high-quality modelling and decision support. The quality assurance aspect of the workflow concerns the curation of both chemical structures and associated toxicity values at the level of (1) single data points and (2) collections of data points. The assembly of the database employs a novel “timeline” approach. The workflow is implemented as a software solution, and its applicability is demonstrated on the example of the Tetrahymena pyriformis acute aquatic toxicity endpoint. A literature collection of 86 primary publications for T. pyriformis was found to contain 2,072 chemical compounds and 2,498 unique toxicity values, which divide into 2,440 numerical and 58 textual values. Every chemical compound was assigned a preferred toxicity value. Examples of the most common chemical and toxicological data curation scenarios are discussed.
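The abstract does not spell out how a timeline is built or how the preferred value is chosen, so the following is only a minimal sketch of the general idea, under stated assumptions. The record fields (`compound_id`, `year`, `raw_value`), the sample values, and the selection rule (take the most recent numerical value, falling back to the most recent textual one) are all hypothetical illustrations and should not be read as the authors' actual procedure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataPoint:
    compound_id: str  # hypothetical key for a curated chemical structure
    year: int         # publication year of the source article
    raw_value: str    # toxicity value exactly as printed in the article


def parse_numeric(raw: str) -> Optional[float]:
    """Return the value as a float, or None if the entry is textual."""
    try:
        return float(raw)
    except ValueError:
        return None


def build_timelines(points):
    """Group data points by compound and order each group chronologically."""
    groups = defaultdict(list)
    for point in points:
        groups[point.compound_id].append(point)
    return {cid: sorted(tl, key=lambda p: p.year) for cid, tl in groups.items()}


def preferred_value(timeline):
    """Assumed rule: latest numerical value, else latest textual value."""
    for point in reversed(timeline):
        if parse_numeric(point.raw_value) is not None:
            return point.raw_value
    return timeline[-1].raw_value


# Made-up data points, for illustration only.
points = [
    DataPoint("cmpd-A", 1996, "0.52"),
    DataPoint("cmpd-A", 1986, "0.48"),
    DataPoint("cmpd-B", 1990, "not toxic at saturation"),
]
timelines = build_timelines(points)
for cid, timeline in timelines.items():
    print(cid, [p.year for p in timeline], preferred_value(timeline))
```

Keeping the raw string next to the parsed number mirrors the distinction the abstract draws between numerical and textual toxicity values: a textual entry remains visible in the timeline even though it cannot be used directly in a regression model.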

Notes

  1. Two chemical naming systems are used throughout the manuscript: (i) standardized chemical names, i.e. the Preferred IUPAC Name (PIN), and (ii) original chemical names from the referenced publications. They are distinguished by font: PINs are set in Courier italic and original names in Courier.

Acknowledgments

The authors thank the Estonian Science Foundation (Grant 7709) and the Estonian Ministry for Education and Research (Grant SF0140031Bs09) for financial support. The authors are grateful to Prof. T.W. Schultz (University of Tennessee) for his assistance in resolving selected data points, and to Dr. Sulev Sild (University of Tartu, Estonia) for discussion during the final stages of manuscript preparation.

Author information

Corresponding author

Correspondence to Uko Maran.

Additional information

Disclaimer on fair use conditions. The data set that accompanies the current publication should be regarded as a derivative work of the earlier publications of Prof. T.W. Schultz (University of Tennessee) and his co-workers. Further users are expected to properly credit and reference the most significant of them.

Disclaimer on undiscovered and new data. The authors made their best effort to find all T. pyriformis acute aquatic toxicity primary publications. Despite that, something may have been missed, or new data may become available. In either case, we would be most grateful for references to such primary publications.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10822_2013_9664_MOESM1_ESM.doc

The supplementary material includes tables with examples of data curation, each with a detailed discussion, and the fully curated data set. (DOC 612 kb)

Supplementary material 2 (XLS 737 kb)

About this article

Cite this article

Ruusmann, V., Maran, U. From data point timelines to a well curated data set, data mining of experimental data and chemical structure data from scientific articles, problems and possible solutions. J Comput Aided Mol Des 27, 583–603 (2013). https://doi.org/10.1007/s10822-013-9664-4

  • DOI: https://doi.org/10.1007/s10822-013-9664-4
