Abstract
We describe a new decision list induction algorithm called the Greedy Prepend Algorithm (GPA). GPA improves on other decision list algorithms by introducing a new objective function for rule selection and a set of novel search algorithms that allow application to large scale real world problems. GPA achieves state-of-the-art classification accuracy on the protein secondary structure prediction problem in bioinformatics and the English part of speech tagging problem in computational linguistics. For both domains GPA produces a rule set that human experts find easy to interpret, a marked advantage in decision support environments. In addition, we compare GPA to other decision list induction algorithms as well as support vector machines, C4.5, naive Bayes, and a nearest neighbor method on a number of standard data sets from the UCI machine learning repository.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Rivest, R.L.: Learning decision lists. Machine Learning 2, 229–246 (1987)
Newman, D.J., Hettich, S., Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–283 (1989)
Webb, G.I.: Recent progress in learning decision lists by prepending inferred rules. In: Proceedings of the Second Singapore International Conference on Intelligent Systems (SPICIS 1994), Singapore, pp. B280–B285 (1994)
Newlands, D., Webb, G.I.: Alternative strategies for decision list construction. In: Proceedings of the Fourth Data Mining Conference (DM IV 2003), pp. 265–273 (2004)
Clark, P., Boswell, R.: Rule induction with CN2: Some recent improvements. In: Kodratoff, Y. (ed.) EWSL 1991. LNCS, vol. 482, pp. 151–163. Springer, Heidelberg (1991)
Webb, G.I.: Opus: An efficient admissible algorithm for unordered search. JAIR 3, 431–465 (1995)
Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the Workshop on Massive Datasets, Washington, DC, NRC, Committee on Applied and Theoretical Statistics (1993)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Chou, P.Y., Fasman, G.D.: Conformational parameters for amino acids in helical, beta sheet and random coil regions calculated from proteins. Biochemistry 13(2), 211–222 (1974)
Levin, J.M., Pascarella, S., Argos, P., Garnier, J.: Quantification of secondary structure prediction improvement using multiple alignment. Prot. Engin. 6, 849–854 (1993)
Rost, B., Sander, C.: Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology 232, 584–599 (1993)
Huang, J.T., Wang, M.T.: Secondary structural wobble: The limits of protein prediction accuracy. Biochemical and Biophysical Research Communications 294(3), 621–625 (2002)
Cuff, J.A., Barton, G.J.: Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins: Structure, Function, and Genetics 34, 508–519 (1999)
King, R.D., Sternberg, M.J.E.: Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci 5, 2298–2310 (1996)
Frishman, D., Argos, P.: Seventy-five percent accuracy in protein secondary structure prediction. Proteins: Structure, Function, and Genetics 27, 329–335 (1997)
Salamov, A.A., Solovyev, V.V.: Prediction of protein secondary structure by combining nearest-neighbor algorithms and multiple sequence alignments. Journal of Molecular Biology 247, 11–15 (1995)
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19(2), 313–330 (1993)
Weischedel, R., Meteer, M., Schwartz, R., Ramshaw, L.: Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics 19(2), 359–382 (1993)
Brill, E.: Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Computational Linguistics 21(4), 543–565 (1995)
Ratnaparkhi, A.: A maximum entropy model for part-of-speech tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (1996)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yuret, D., de la Maza, M. (2006). The Greedy Prepend Algorithm for Decision List Induction. In: Levi, A., Savaş, E., Yenigün, H., Balcısoy, S., Saygın, Y. (eds) Computer and Information Sciences – ISCIS 2006. ISCIS 2006. Lecture Notes in Computer Science, vol 4263. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11902140_6
Download citation
DOI: https://doi.org/10.1007/11902140_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-47242-1
Online ISBN: 978-3-540-47243-8
eBook Packages: Computer ScienceComputer Science (R0)