Abstract
In this paper, we study the problem of mining high-confidence fragment-based classification rules from imbalanced HIV data, whose class distribution is extremely skewed. We propose an efficient approach to mining frequent fragments in different classes of compounds; these fragments provide the best hints about the characteristics of each class and can be used to build associative classification rules. We adopt the pattern-growth paradigm and define an efficient fragment enumeration scheme. Moreover, we introduce an improved instance-centric rule-generation strategy to mine high-confidence fragment-based classification rules, which are insightful and useful for differentiating one class from the others. Experiments show that our algorithm discovers more interesting rules than the previous method and can facilitate the detection of new compounds with the desired anti-HIV activity.
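To make the terms in the abstract concrete, the following is a minimal sketch of how the support and confidence of a fragment-based rule "fragment → class" can be computed, and of one simplified reading of instance-centric rule generation: for each training compound, keep the highest-confidence rule that covers it and predicts its own class. Everything here is an assumption for illustration: the toy compounds, the fragment labels, and the function names `rule_stats` and `instance_centric_rules` are hypothetical, fragments are treated as pre-extracted labels rather than mined subgraphs, and the paper's fragment enumeration scheme and improved strategy are not reproduced.

```python
from collections import Counter

# Toy data (made up): each compound is (set of fragment labels, class label).
# The class distribution is deliberately skewed: 2 active vs. 6 inactive.
COMPOUNDS = [
    ({"N=N", "C-S"}, "active"),
    ({"N=N", "C=O"}, "active"),
    ({"C=O", "C-N"}, "inactive"),
    ({"C=O", "C-S"}, "inactive"),
    ({"C-N"}, "inactive"),
    ({"C-N", "C=O"}, "inactive"),
    ({"C-S", "C-N"}, "inactive"),
    ({"C=O"}, "inactive"),
]


def rule_stats(compounds):
    """Return {(fragment, cls): (support_count, confidence)} for rules fragment -> cls."""
    frag_total = Counter()   # number of compounds containing the fragment
    frag_class = Counter()   # number of compounds of class cls containing the fragment
    for frags, cls in compounds:
        for f in frags:
            frag_total[f] += 1
            frag_class[(f, cls)] += 1
    return {(f, c): (n, n / frag_total[f]) for (f, c), n in frag_class.items()}


def instance_centric_rules(compounds, min_support=1):
    """For each compound, keep the best rule that covers it and predicts its class."""
    stats = rule_stats(compounds)
    kept = {}
    for frags, cls in compounds:
        candidates = [
            ((f, cls), stats[(f, cls)])
            for f in frags
            if stats[(f, cls)][0] >= min_support
        ]
        if not candidates:
            continue
        # Rank by confidence, then support, then fragment name (deterministic ties).
        rule, stat = max(candidates, key=lambda kv: (kv[1][1], kv[1][0], kv[0][0]))
        kept[rule] = stat
    return kept


if __name__ == "__main__":
    for (frag, cls), (sup, conf) in sorted(instance_centric_rules(COMPOUNDS).items()):
        print(f"{frag} -> {cls}: support={sup}, confidence={conf:.2f}")
```

In this toy set, the rule for the minority class has a support of only 2 out of 8 compounds, so a single global support threshold tuned for the majority class could discard it. Selecting rules per instance keeps at least one covering rule for each compound that has any candidate, which is one reason an instance-centric view can suit skewed class distributions.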
This work was supported in part by the National Natural Science Foundation of China under Grant No. 60573061, the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList), and the Program for New Century Excellent Talents in University under Grant No. NCET-07-0491, State Education Ministry of China.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lv, B., Wang, J., Zhou, L. (2008). High Confidence Fragment-Based Classification Rule Mining for Imbalanced HIV Data. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds) Progress in WWW Research and Development. APWeb 2008. Lecture Notes in Computer Science, vol 4976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78849-2_8
DOI: https://doi.org/10.1007/978-3-540-78849-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78848-5
Online ISBN: 978-3-540-78849-2
eBook Packages: Computer Science, Computer Science (R0)