Statistically Based Pattern Discovery Techniques for Biological Data Analysis

  • Chapter
Applications of Computational Intelligence in Biology

Part of the book series: Studies in Computational Intelligence (SCI, volume 122)

Summary

A statistically based pattern discovery tool is presented that produces a rule-based description of complex data from the set of its statistically significant associations. The resulting rules capture every pattern observable in a data set for which a statistically sound rationale is available. Because each pattern is statistically justified, the rules suit cases where the rationale underlying a decision must be understood. High-risk decision-making systems, common in many biologically related problem domains, are the likely area of application for this technique. The performance of the technique on a series of biologically relevant data distributions is analyzed, and its relative merits and weaknesses are discussed.
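
As a rough illustration of what a "statistically significant association" can mean in practice, the sketch below flags attribute-value pairs whose co-occurrence count departs from what independence would predict. It is a minimal sketch and not the chapter's implementation: it assumes significance is judged by the adjusted residual of a two-way contingency table against a normal threshold of 1.96, and the function name `significant_associations`, its `threshold` parameter, and the sample records are hypothetical.

```python
import math
from collections import Counter


def significant_associations(records, threshold=1.96):
    """Return (x, y, adjusted_residual) for each cell of the x-by-y
    contingency table whose count deviates significantly from the
    count expected if x and y were independent.

    Assumption: adjusted residuals compared against a normal
    threshold; the summary does not state which test is used.
    """
    n = len(records)
    joint = Counter(records)                       # observed cell counts
    x_counts = Counter(x for x, _ in records)      # row totals
    y_counts = Counter(y for _, y in records)      # column totals

    rules = []
    for x in sorted(x_counts):
        for y in sorted(y_counts):
            observed = joint.get((x, y), 0)
            expected = x_counts[x] * y_counts[y] / n
            # Variance term of the adjusted residual.
            variance = expected * (1 - x_counts[x] / n) * (1 - y_counts[y] / n)
            if variance <= 0:
                continue  # degenerate table (a single row or column value)
            residual = (observed - expected) / math.sqrt(variance)
            if abs(residual) > threshold:
                rules.append((x, y, residual))
    return rules


# Hypothetical records: (measurement level, diagnosis).
records = (
    [("high", "disease")] * 30
    + [("high", "healthy")] * 10
    + [("low", "disease")] * 10
    + [("low", "healthy")] * 30
)

for x, y, r in significant_associations(records):
    direction = "more" if r > 0 else "less"
    print(f"{x} & {y}: occurs {direction} often than chance (residual {r:.2f})")
```

A positive residual marks a pair that occurs together more often than chance and a negative residual one that occurs less often. Either kind can be read as a rule with an explicit statistical rationale, which is the property the summary emphasizes.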

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hamilton-Wright, A., Stashuk, D.W. (2008). Statistically Based Pattern Discovery Techniques for Biological Data Analysis. In: Smolinski, T.G., Milanova, M.G., Hassanien, AE. (eds) Applications of Computational Intelligence in Biology. Studies in Computational Intelligence, vol 122. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78534-7_1

  • DOI: https://doi.org/10.1007/978-3-540-78534-7_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78533-0

  • Online ISBN: 978-3-540-78534-7

  • eBook Packages: Engineering, Engineering (R0)
