Abstract
Much work has been done on test-cost-sensitive learning for data with missing values. Previous strategies face a conflict between efficiency and accuracy. Sequential test strategies achieve high accuracy but low efficiency, because tests must be performed one at a time. Batch strategies are efficient but perform poorly, since they make all decisions at once using only the initial information. In this paper, we propose a new test strategy, the GTD algorithm, to address this problem. The algorithm uses the training data to judge the benefit brought by each unknown attribute and greedily chooses the most useful unknown attribute each time, until no rewarding unknown attribute remains. Judging the utility of an unknown attribute from its real performance on the training data is more reasonable than relying on an estimate. Our strategy is valuable because it is both efficient (it uses only the training data, so it is not sequential) and achieves lower total costs than previous strategies. Experiments also show that our algorithm significantly outperforms previous algorithms, especially when the missing rate is high and test costs fluctuate widely.
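The greedy loop sketched in the abstract (repeatedly acquire the unknown attribute with the largest net benefit, stopping when none is rewarding) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the `utility` function, the cost dictionary, and the stopping rule are illustrative stand-ins, not the paper's actual definitions of training-data benefit.

```python
def greedy_test_selection(unknown_attrs, test_costs, utility):
    """Hedged sketch of a GTD-style greedy loop: repeatedly acquire the
    unknown attribute whose utility (assumed here to be measured on the
    training data) minus its test cost is largest and still positive.

    `utility(attr, chosen)` is a hypothetical callable supplied by the
    caller; it is not the paper's concrete benefit measure.
    """
    chosen = []
    remaining = set(unknown_attrs)
    while remaining:
        # Net benefit of acquiring each still-unknown attribute.
        gains = {a: utility(a, chosen) - test_costs[a] for a in remaining}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:  # no rewarding unknown attribute left: stop
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy illustration with fixed, made-up utilities and costs:
# attribute "a" has net gain 3-1=2 (acquired); "b" (4-5) and "c" (1-2)
# are never rewarding, so the loop stops after one acquisition.
toy_utility = lambda a, chosen: {"a": 3, "b": 4, "c": 1}[a]
selected = greedy_test_selection(["a", "b", "c"],
                                 {"a": 1, "b": 5, "c": 2},
                                 toy_utility)
```

In this toy run, `selected` is `["a"]`: only attribute "a" has a positive net benefit, matching the abstract's stopping criterion of halting when no unknown attribute is rewarding.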
© 2010 Springer-Verlag Berlin Heidelberg
Wan, C. (2010). Test-Cost Sensitive Classification Using Greedy Algorithm on Training Data. In: Cai, Z., Hu, C., Kang, Z., Liu, Y. (eds) Advances in Computation and Intelligence. ISICA 2010. Lecture Notes in Computer Science, vol 6382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16493-4_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16492-7
Online ISBN: 978-3-642-16493-4
eBook Packages: Computer Science (R0)