Applying classification algorithms in practice

Brodley, Carla E.; Smyth, Padhraic

doi:10.1023/A:1018557312521

Published: March 1997

Statistics and Computing, volume 7, pages 45–56 (1997)

Carla E. Brodley¹ &
Padhraic Smyth²

Abstract

In this paper we present a perspective on the overall process of developing classifiers for real-world classification problems. Specifically, we identify, categorize, and discuss the problem-specific factors that influence the development process. Illustrative examples demonstrate the iterative nature of applying classification algorithms in practice. In addition, we present a case study of a large-scale classification application using the process framework described, providing an end-to-end example of the iterative nature of the application process. The paper concludes that developing classification applications for operational use involves many factors not normally considered in typical discussions of classification models and algorithms.

Author information

Authors and Affiliations

School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, 47906, USA
Carla E. Brodley
Information and Computer Science, University of California, Irvine, CA, 92717-3425, USA
Padhraic Smyth

About this article

Cite this article

Brodley, C.E., Smyth, P. Applying classification algorithms in practice. Statistics and Computing 7, 45–56 (1997). https://doi.org/10.1023/A:1018557312521

Issue Date: March 1997
DOI: https://doi.org/10.1023/A:1018557312521