Abstract
Neural networks were long viewed as classification and regression systems whose internal representations were incomprehensible. It is now becoming apparent that algorithms can be designed to extract comprehensible representations from trained neural networks, enabling them to be used for data mining and knowledge discovery, that is, the discovery and explanation of previously unknown relationships present in data. This chapter reviews existing algorithms for extracting comprehensible representations from neural networks and outlines research to generalize and extend the capabilities of one of these algorithms, TREPAN. The generalized algorithm has been applied to bioinformatics data sets, including the prediction of splice site junctions in human DNA sequences, and to cheminformatics data sets. The results generated on these data sets are compared with those generated by a conventional data mining technique (C5), and appropriate conclusions are drawn.
Acknowledgement
This chapter is dedicated to our dear friend and colleague, Martyn, who passed away after a brave fight with cancer on June 7, 2007.
Copyright information
© 2008 Humana Press, a part of Springer Science + Business Media, LLC
About this protocol
Cite this protocol
Livingstone, D.J., Browne, A., Crichton, R., Hudson, B.D., Whitley, D., Ford, M.G. (2008). The Extraction of Information and Knowledge from Trained Neural Networks. In: Livingstone, D.J. (eds) Artificial Neural Networks. Methods in Molecular Biology™, vol 458. Humana Press. https://doi.org/10.1007/978-1-60327-101-1_12
DOI: https://doi.org/10.1007/978-1-60327-101-1_12
Publisher Name: Humana Press
Print ISBN: 978-1-58829-718-1
Online ISBN: 978-1-60327-101-1
eBook Packages: Springer Protocols