The Extraction of Information and Knowledge from Trained Neural Networks

Protocol in Artificial Neural Networks (Methods in Molecular Biology™, vol. 458)

Abstract

Neural networks were long regarded as classification and regression systems whose internal representations could not be interpreted. It is now becoming apparent that algorithms can be designed to extract comprehensible representations from trained neural networks, allowing the networks to be used for data mining and knowledge discovery, that is, the discovery and explanation of previously unknown relationships in data. This chapter reviews existing algorithms for extracting comprehensible representations from neural networks and outlines research to generalize and extend the capabilities of one of them, TREPAN. TREPAN has been generalized for application to bioinformatics data sets, including the prediction of splice-site junctions in human DNA sequences, and to cheminformatics data sets. The results obtained on these data sets are compared with those of a conventional data mining technique, the C5 decision-tree algorithm, and conclusions are drawn from the comparison.
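The core idea behind TREPAN-style extraction can be illustrated with a short Python sketch. This is a minimal, hedged example and not the chapter's implementation: `toy_oracle` is a hypothetical rule standing in for a trained network, and every function name here is invented for illustration. TREPAN treats the trained network as an oracle: it labels both the training inputs and additional synthetic query inputs with the network's own predictions, then induces a decision tree that mimics the network, judged by its fidelity (how often the tree agrees with the network). TREPAN itself expands nodes best-first, uses m-of-n splitting tests, and draws fresh queries at each node subject to the constraints on the path to that node. For brevity, this sketch instead draws all queries up front from the per-feature empirical marginals and grows a depth-limited tree with single-feature threshold splits chosen by Gini impurity, a CART-style criterion swapped in for TREPAN's own.

```python
import random
from collections import Counter


def toy_oracle(x):
    """Hypothetical stand-in for a trained network (not from the chapter).

    Plays the role of the network's classifier: class 1 iff
    x[0] > 0.5 and x[1] > 0.3, otherwise class 0.
    """
    return int(x[0] > 0.5 and x[1] > 0.3)


def sample_queries(data, n, rng):
    """Draw n synthetic inputs, sampling each feature independently
    from its empirical marginal distribution in the training data."""
    columns = list(zip(*data))
    return [[rng.choice(col) for col in columns] for _ in range(n)]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (score, feature, threshold) for the lowest-impurity
    split of the form x[feature] <= threshold, or None if none exists."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow(X, y, depth, max_depth):
    """Grow a depth-limited tree. Leaves are class labels; internal nodes
    are (feature, threshold, left_subtree, right_subtree) tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or gini(y) == 0.0:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, t = split
    lx, ly = zip(*[(r, lab) for r, lab in zip(X, y) if r[f] <= t])
    rx, ry = zip(*[(r, lab) for r, lab in zip(X, y) if r[f] > t])
    return (f, t,
            grow(list(lx), list(ly), depth + 1, max_depth),
            grow(list(rx), list(ry), depth + 1, max_depth))


def predict(tree, x):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


def extract_tree(oracle, train_X, n_queries=500, max_depth=3, seed=0):
    """Simplified TREPAN-like extraction: label real and synthetic inputs
    with the network (oracle), then fit a tree to the network's answers."""
    rng = random.Random(seed)
    X = [list(row) for row in train_X] + sample_queries(train_X, n_queries, rng)
    y = [oracle(x) for x in X]  # targets are the network's outputs, not data labels
    return grow(X, y, 0, max_depth)


def fidelity(tree, oracle, X):
    """Fraction of inputs on which the extracted tree agrees with the network."""
    return sum(predict(tree, x) == oracle(x) for x in X) / len(X)
```

With the toy oracle, the extracted tree should typically recover a split on `x[0]` near 0.5 followed by a split on `x[1]` near 0.3. Measuring fidelity on fresh inputs is what distinguishes this approach from inducing a tree directly from the data labels, as a conventional learner such as C5 does; the tree here explains the network, not the data.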

Acknowledgement

This chapter is dedicated to our dear friend and colleague, Martyn, who passed away after a brave fight with cancer on June 7, 2007.

Author information

Correspondence to Brian D. Hudson BSc, PhD.

Copyright information

© 2008 Humana Press, a part of Springer Science + Business Media, LLC

About this protocol

Cite this protocol

Livingstone, D.J., Browne, A., Crichton, R., Hudson, B.D., Whitley, D., Ford, M.G. (2008). The Extraction of Information and Knowledge from Trained Neural Networks. In: Livingstone, D.J. (ed.) Artificial Neural Networks. Methods in Molecular Biology™, vol. 458. Humana Press. https://doi.org/10.1007/978-1-60327-101-1_12

  • DOI: https://doi.org/10.1007/978-1-60327-101-1_12

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-718-1

  • Online ISBN: 978-1-60327-101-1

  • eBook Packages: Springer Protocols
