Abstract
Finding relational expressions that hold frequently in one class of data but not in the other class is an interesting task. In this paper, a relational expression of this kind is defined as a contrast inequality. Gene Expression Programming (GEP) is a powerful tool for discovering relations in data and expressing them in mathematical form, so it is a natural candidate for this mining task. The main contributions of this paper are: (1) introducing the concept of contrast inequality mining, (2) designing a two-genome chromosome structure that guarantees each individual in GEP is a valid inequality, (3) proposing a new genetic mutation that improves the efficiency of evolving contrast inequalities, (4) presenting a GEP-based method for discovering contrast inequalities, and (5) giving an extensive performance study on real-world datasets. The experimental results show that the proposed methods are effective: contrast inequalities with high discriminative power are discovered from the real-world datasets. Potential future work on contrast inequality mining is also discussed.
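To make the notion of a contrast inequality concrete, the following is a minimal Python sketch, not the method of the paper. It assumes that "exists frequently" can be read as the fraction of rows in a class that satisfy the inequality, and that discriminative power can be approximated by the difference of those fractions between the two classes. The function names `support` and `contrast`, the toy data, and the candidate inequality are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence

Row = Sequence[float]


def support(inequality: Callable[[Row], bool], rows: Sequence[Row]) -> float:
    """Fraction of rows for which the inequality holds (0.0 for an empty class)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if inequality(row)) / len(rows)


def contrast(inequality: Callable[[Row], bool],
             positive: Sequence[Row],
             negative: Sequence[Row]) -> float:
    """Assumed measure: support in the positive class minus support in the negative class."""
    return support(inequality, positive) - support(inequality, negative)


# Hypothetical toy data: each row is a pair of numeric attributes (x0, x1).
positive = [(1.0, 5.0), (2.0, 7.0), (0.5, 3.0), (4.0, 3.0)]
negative = [(6.0, 2.0), (5.0, 1.0), (3.0, 5.0), (8.0, 8.5)]


def candidate(row: Row) -> bool:
    """Hypothetical candidate inequality: x0 + 1 < x1."""
    return row[0] + 1.0 < row[1]


pos_support = support(candidate, positive)
neg_support = support(candidate, negative)
score = contrast(candidate, positive, negative)
print(pos_support, neg_support, score)
```

In this toy setting the candidate holds for three of the four positive rows and one of the four negative rows, so it separates the classes only partially. An evolutionary search such as GEP would generate and score many such candidates; how the paper encodes and scores them is not stated in the abstract.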
This work was supported by the National Natural Science Foundation of China under grant No. 60773169, the 11th Five Years Key Programs for Sci. & Tech. Development of China under grant No. 2006BAI05A01, and the Young Faculty Foundation of Sichuan University under grant No. 2009SCU11030.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Duan, L., Zuo, J., Zhang, T., Peng, J., Gong, J. (2010). Mining Contrast Inequalities in Numeric Dataset. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_21
DOI: https://doi.org/10.1007/978-3-642-14246-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science (R0)