Abstract
This paper presents an association rule mining system capable of handling set-valued attributes. Our previous research has exposed us to a variety of real-world biological datasets containing attributes whose values are sets of elements rather than individual elements. However, very few data mining tools accept datasets with set-valued attributes, and none of them allows association rules to be mined directly from this type of data. In this paper we introduce two algorithms for mining (classification) association rules directly from set-valued data and compare their performance. We have implemented a system based on one of these algorithms and applied it to a number of biological datasets. We describe our system and highlight its merits by comparing the results it achieves with failed attempts to mine association rules from the same datasets using standard tools. Our system makes it much easier to create input files containing set-valued data and makes it possible to mine association rules directly from these data.
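To make the notion of a set-valued attribute concrete, the following is a minimal sketch and not the algorithms introduced in the paper. It assumes one simple treatment: each (attribute, element) pair becomes an item, so that the usual support and confidence measures of a rule can be computed. The records, the attribute names (`motifs`, `tissue`, `class`), and the helper functions are all hypothetical.

```python
# Hypothetical toy records: every attribute holds a SET of values,
# not a single value (names and data are illustrative only).
records = [
    {"motifs": {"m1", "m2"}, "tissue": {"muscle"}, "class": {"high"}},
    {"motifs": {"m1"}, "tissue": {"muscle", "neuron"}, "class": {"high"}},
    {"motifs": {"m2", "m3"}, "tissue": {"neuron"}, "class": {"low"}},
    {"motifs": {"m1", "m2"}, "tissue": {"neuron"}, "class": {"high"}},
]


def to_items(record):
    """Flatten a set-valued record into a set of (attribute, element) items."""
    return frozenset(
        (attr, value) for attr, values in record.items() for value in values
    )


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    s_antecedent = support(antecedent, transactions)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / s_antecedent


transactions = [to_items(r) for r in records]

# Candidate classification rule: motifs contains m1  =>  class is high
rule_lhs = {("motifs", "m1")}
rule_rhs = {("class", "high")}

print(support(rule_lhs | rule_rhs, transactions))    # rule support
print(confidence(rule_lhs, rule_rhs, transactions))  # rule confidence
```

In this sketch the antecedent tests membership of a single element in a set-valued attribute; whether the paper's algorithms represent items this way is not stated in the abstract.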
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shoemaker, C.A., Ruiz, C. (2003). Association Rule Mining Algorithms for Set-Valued Data. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_90
DOI: https://doi.org/10.1007/978-3-540-45080-1_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive