An empirical study of cost-sensitive learning in cultural modeling

Su, Peng; Mao, Wenji; Zeng, Daniel

doi:10.1007/s10257-012-0198-4

Original Article
Published: 11 August 2012

Volume 11, pages 437–455, (2013)

Information Systems and e-Business Management

Peng Su¹, Wenji Mao² & Daniel Zeng²,³

Abstract

Cultural modeling aims to develop behavioral models of groups and to analyze the impact of cultural factors on group behavior using computational methods. Machine learning methods, and classification in particular, play a central role in such applications. In modeling cultural data, standard classifiers are expected to yield good performance under the assumption that different classification errors have uniform costs. However, this assumption is often violated in practice, and the performance of standard classifiers is then severely hindered. To handle this problem, this paper empirically studies cost-sensitive learning in cultural modeling. We take the cost factor into account when building the classifiers, with the aim of minimizing total misclassification costs. We conduct experiments to investigate four typical cost-sensitive learning methods, combine them with six standard classifiers, and evaluate their performance under various conditions. Our empirical study verifies the effectiveness of cost-sensitive learning in cultural modeling. Based on the experimental results, we gain a thorough insight into the problem of non-uniform misclassification costs, as well as the selection of cost-sensitive methods, base classifiers and method-classifier pairs for this domain. Furthermore, we propose an improved algorithm that outperforms the best method-classifier pair on the benchmark cultural datasets.
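
As an illustration of what minimizing total misclassification cost can mean, the following is a minimal Python sketch and not the method evaluated in the article. The cost matrix values, the function names total_cost and min_expected_cost_class, and the toy labels are all assumptions made for this example. It shows how a total cost is summed over predictions, and how a decision rule based on expected cost can prefer a less probable class when its errors are cheaper.

```python
# Hypothetical 2-class cost matrix (illustrative values, not from the article):
# COST[i][j] is the cost of predicting class j when the true class is i.
COST = [[0.0, 1.0],
        [5.0, 0.0]]

def total_cost(y_true, y_pred, cost=COST):
    """Sum the cost of every (true label, predicted label) pair."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))

def min_expected_cost_class(probs, cost=COST):
    """Return the class with the lowest expected cost given class probabilities."""
    n = len(cost)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)

# Toy labels: one cheap error (true 0, predicted 1) and one costly error
# (true 1, predicted 0).
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 0, 1]
print(total_cost(y_true, y_pred))

# Class 0 is more probable here, but missing class 1 costs five times as much,
# so the expected-cost rule selects class 1.
print(min_expected_cost_class([0.7, 0.3]))
```

Under uniform costs the same rule reduces to picking the most probable class, which is the assumption that standard classifiers make.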

Notes

http://www.cidcm.umd.edu/mar.

Acknowledgments

This work is supported in part by the National Natural Science Foundation of China under Grant Nos. 60921061, 61175040, 91024030, 90924302 and 71025001, and the Research Fund of State Key Laboratory of Management and Control for Complex Systems under Grant No. 20110102.

Author information

Authors and Affiliations

School of Management Engineering, Shandong Jianzhu University, Shandong, 250101, China
Peng Su
State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Wenji Mao & Daniel Zeng
Department of Management Information Systems, University of Arizona, Tucson, AZ, 85721, USA
Daniel Zeng

Corresponding author

Correspondence to Wenji Mao.

About this article

Cite this article

Su, P., Mao, W. & Zeng, D. An empirical study of cost-sensitive learning in cultural modeling. Inf Syst E-Bus Manage 11, 437–455 (2013). https://doi.org/10.1007/s10257-012-0198-4

Received: 01 November 2011
Revised: 16 May 2012
Accepted: 02 August 2012
Published: 11 August 2012
Issue Date: September 2013
DOI: https://doi.org/10.1007/s10257-012-0198-4
