
High Confidence Fragment-Based Classification Rule Mining for Imbalanced HIV Data

  • Conference paper
Progress in WWW Research and Development (APWeb 2008)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4976))


Abstract

In this paper, we study the problem of mining high-confidence fragment-based classification rules from imbalanced HIV data whose class distribution is extremely skewed. We propose an efficient approach to mining frequent fragments in different classes of compounds; these fragments provide the best hints about the characteristics of each class and can be used to build associative classification rules. We adopt the pattern-growth paradigm and define an efficient fragment enumeration scheme. Moreover, we introduce an improved instance-centric rule-generation strategy to mine the high-confidence fragment-based classification rules, which are insightful and useful in differentiating one class from the others. Experiments show that our algorithm can discover more interesting rules than the previous method and can facilitate the detection of new compounds with the desired anti-HIV activity.
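
To make the notions of rule confidence and instance-centric rule selection concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes compounds are given as SMILES-like strings and uses plain substrings as a crude stand-in for molecular fragments (real fragment mining works on subgraphs). The data set and the names `COMPOUNDS`, `fragments`, `mine_rules`, and `best_rule_per_instance` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Toy, made-up data (NOT real HIV screening data): 2 "active" vs 4 "inactive".
COMPOUNDS = [
    ("CCN", "active"),
    ("CCO", "inactive"),
    ("CCOC", "inactive"),
    ("CNC", "inactive"),
    ("OCC", "inactive"),
    ("NCCN", "active"),
]


def fragments(s, max_len=3):
    """Distinct substrings of length <= max_len (assumed stand-in for subgraph fragments)."""
    return {
        s[i:j]
        for i in range(len(s))
        for j in range(i + 1, min(len(s), i + max_len) + 1)
    }


def mine_rules(compounds, min_support=1, max_len=3):
    """Return {(fragment, label): (support, confidence)} for rules 'fragment -> label'."""
    covered = defaultdict(int)           # number of compounds containing the fragment
    covered_by_class = defaultdict(int)  # same count, restricted to one class
    for s, label in compounds:
        for f in fragments(s, max_len):
            covered[f] += 1
            covered_by_class[(f, label)] += 1
    return {
        (f, label): (n, n / covered[f])
        for (f, label), n in covered_by_class.items()
        if n >= min_support
    }


def best_rule_per_instance(compounds, rules, max_len=3):
    """For each compound, keep the highest-confidence rule of its own class that covers it.

    Ties on confidence are broken by higher support, then by fragment string.
    """
    best = []
    for s, label in compounds:
        candidates = [
            (rules[(f, label)][1], rules[(f, label)][0], f)
            for f in fragments(s, max_len)
            if (f, label) in rules
        ]
        if candidates:  # may be empty when min_support filters everything out
            conf, _support, frag = max(candidates)
            best.append((s, label, frag, conf))
    return best


rules = mine_rules(COMPOUNDS)
for row in best_rule_per_instance(COMPOUNDS, rules):
    print(row)
```

Selecting the best rule per training instance, rather than applying one global threshold, means every compound of the rare class keeps at least one covering rule, which is the intuition behind instance-centric strategies on skewed data.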

This work was supported in part by the National Natural Science Foundation of China under Grant No. 60573061, the Basic Research Foundation of Tsinghua National Laboratory for Information Science and Technology (TNList), and the Program for New Century Excellent Talents in University under Grant No. NCET-07-0491, State Education Ministry of China.




Editor information

Yanchun Zhang, Ge Yu, Elisa Bertino, Guandong Xu


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lv, B., Wang, J., Zhou, L. (2008). High Confidence Fragment-Based Classification Rule Mining for Imbalanced HIV Data. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds) Progress in WWW Research and Development. APWeb 2008. Lecture Notes in Computer Science, vol 4976. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78849-2_8


  • DOI: https://doi.org/10.1007/978-3-540-78849-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78848-5

  • Online ISBN: 978-3-540-78849-2

  • eBook Packages: Computer Science (R0)
