The Greedy Prepend Algorithm for Decision List Induction

Yuret, Deniz; de la Maza, Michael

doi:10.1007/11902140_6

Deniz Yuret and Michael de la Maza

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4263)

Abstract

We describe a new decision list induction algorithm called the Greedy Prepend Algorithm (GPA). GPA improves on other decision list algorithms by introducing a new objective function for rule selection and a set of novel search algorithms that allow application to large-scale, real-world problems. GPA achieves state-of-the-art classification accuracy on the protein secondary structure prediction problem in bioinformatics and the English part-of-speech tagging problem in computational linguistics. For both domains, GPA produces a rule set that human experts find easy to interpret, a marked advantage in decision support environments. In addition, we compare GPA to other decision list induction algorithms, as well as to support vector machines, C4.5, naive Bayes, and a nearest neighbor method, on a number of standard data sets from the UCI machine learning repository.
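
The abstract does not spell out the objective function or the search algorithms, so the following is only a minimal sketch of the general greedy-prepend idea, not the authors' implementation. It assumes that the list starts with a default rule predicting the majority class, that the objective is the net gain in correctly classified training instances, and that candidates are found by exhaustive search over rules with at most two attribute-value tests. The names greedy_prepend, candidate_rules, and max_conds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def matches(conds, x):
    """conds is a tuple of (attribute_index, value) tests; all must hold for x."""
    return all(x[i] == v for i, v in conds)


def predict(dlist, x):
    """Return the label of the first rule in the decision list that matches x."""
    for conds, label in dlist:
        if matches(conds, x):
            return label
    raise ValueError("decision list has no default rule")


def candidate_rules(X, y, max_conds=2):
    """Enumerate rules built from up to max_conds tests observed in the data.

    Assumption: exhaustive enumeration stands in for the paper's search algorithms.
    """
    tests = sorted({(i, v) for x in X for i, v in enumerate(x)})
    labels = sorted(set(y))
    for k in range(1, max_conds + 1):
        for conds in combinations(tests, k):
            # Skip combinations that test the same attribute twice.
            if len({i for i, _ in conds}) < k:
                continue
            for label in labels:
                yield conds, label


def greedy_prepend(X, y, max_conds=2):
    """Build a decision list by repeatedly prepending the highest-gain rule."""
    default = Counter(y).most_common(1)[0][0]
    dlist = [((), default)]          # empty condition tuple matches everything
    current = [default] * len(X)     # current list's prediction per instance
    while True:
        best, best_gain = None, 0
        for conds, label in candidate_rules(X, y, max_conds):
            # Assumed objective: newly corrected instances minus newly broken ones.
            gain = sum((label == t) - (p == t)
                       for x, t, p in zip(X, y, current) if matches(conds, x))
            if gain > best_gain:
                best, best_gain = (conds, label), gain
        if best is None:             # no rule improves training accuracy
            return dlist
        dlist.insert(0, best)        # prepend: the new rule overrides older ones
        current = [predict(dlist, x) for x in X]


# Toy data (made up for illustration): attributes are (outlook, temperature).
X = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"),
     ("rainy", "cool"), ("sunny", "hot")]
y = ["no", "yes", "yes", "yes", "no"]
dlist = greedy_prepend(X, y)
```

Because each prepended rule must strictly increase the number of correctly classified training instances, the loop terminates. Rules added later sit at the front of the list and act as exceptions to the more general rules behind them, which is one reason such lists tend to be easy to read.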

Author information

Authors and Affiliations

Koç University, Istanbul, Turkey
Deniz Yuret
Park Hudson Finance, Cambridge, MA, 02139, USA
Michael de la Maza

Editor information

Editors and Affiliations

Sabanci University, 34956, Istanbul, Turkey
Albert Levi
Sabancı University, Istanbul, Turkey
Erkay Savaş & Selim Balcısoy
Faculty of Engineering and Natural Sciences, Sabancı University, 34956, Tuzla, Istanbul, Turkey
Hüsnü Yenigün
Faculty of Engineering & Natural Sciences, Sabanci University, Istanbul, Turkey
Yücel Saygın

Cite this paper

Yuret, D., de la Maza, M. (2006). The Greedy Prepend Algorithm for Decision List Induction. In: Levi, A., Savaş, E., Yenigün, H., Balcısoy, S., Saygın, Y. (eds) Computer and Information Sciences – ISCIS 2006. ISCIS 2006. Lecture Notes in Computer Science, vol 4263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11902140_6

DOI: https://doi.org/10.1007/11902140_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47242-1
Online ISBN: 978-3-540-47243-8
eBook Packages: Computer Science, Computer Science (R0)
