An empirical study of cost-sensitive learning in cultural modeling

  • Original Article
  • Information Systems and e-Business Management

Abstract

Cultural modeling aims to develop behavioral models of groups and to analyze the impact of cultural factors on group behavior using computational methods. Machine learning methods, and classification in particular, play a central role in such applications. Standard classifiers are expected to perform well on cultural data only under the assumption that all classification errors carry the same cost. In practice this assumption is often violated, which severely degrades the performance of standard classifiers. To address this problem, this paper empirically studies cost-sensitive learning in cultural modeling. We take misclassification costs into account when building the classifiers, with the aim of minimizing the total misclassification cost. We run experiments on four typical cost-sensitive learning methods, combine them with six standard classifiers, and evaluate their performance under various conditions. The empirical study verifies the effectiveness of cost-sensitive learning in cultural modeling. The experimental results give insight into the problem of non-uniform misclassification costs and into the selection of cost-sensitive methods, base classifiers, and method-classifier pairs for this domain. Furthermore, we propose an improved algorithm that outperforms the best method-classifier pair on the benchmark cultural datasets.
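
To make the idea of minimizing total misclassification cost concrete, the following is a minimal sketch of the standard minimum-expected-cost decision rule. It is not the algorithm proposed in the paper; the function names, the cost matrix values, and the class probabilities are all hypothetical and chosen only for illustration.

```python
def min_expected_cost_class(probs, cost):
    """Return the class index with the lowest expected misclassification cost.

    probs[j]   : estimated probability that the true class is j
    cost[i][j] : cost of predicting class i when the true class is j
                 (illustrative values, not taken from the paper)
    """
    expected = [sum(p * c for p, c in zip(probs, row)) for row in cost]
    return min(range(len(expected)), key=expected.__getitem__)


def total_cost(y_true, y_pred, cost):
    """Sum the cost of every (true, predicted) pair over a dataset."""
    return sum(cost[pred][true] for true, pred in zip(y_true, y_pred))


# Hypothetical two-class cost matrix: missing class 1 is five times
# as expensive as a false alarm on class 0.
COST = [
    [0.0, 5.0],  # predict 0: no cost if true class is 0, cost 5 if it is 1
    [1.0, 0.0],  # predict 1: cost 1 if true class is 0, no cost if it is 1
]

# A cost-blind classifier would pick class 0 here (probability 0.7),
# but the expected cost of predicting 0 is higher than that of predicting 1.
probs = [0.7, 0.3]
choice = min_expected_cost_class(probs, COST)
```

With uniform costs this rule reduces to picking the most probable class; with the non-uniform matrix above, the decision shifts toward the class whose errors are more expensive to miss.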

Notes

  1. http://www.cidcm.umd.edu/mar.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 60921061, 61175040, 91024030, 90924302 and 71025001, and the Research Fund of State Key Laboratory of Management and Control for Complex Systems under Grant No. 20110102.

Author information

Corresponding author

Correspondence to Wenji Mao.

About this article

Cite this article

Su, P., Mao, W. & Zeng, D. An empirical study of cost-sensitive learning in cultural modeling. Inf Syst E-Bus Manage 11, 437–455 (2013). https://doi.org/10.1007/s10257-012-0198-4

  • DOI: https://doi.org/10.1007/s10257-012-0198-4
