ABSTRACT
Genomic and proteomic experiments are producing raw data at a rapid pace, and the need to analyze such biological data, our understanding of which still lags far behind its accumulation, has become prominent in the current post-genomic era. In this paper we analyze annotated genome data by applying association rule mining, a central data-mining technique, with the aim of discovering rules that yield deeper insight into this type of data. We propose a new technique that uses domain knowledge, expressed as queries, to efficiently mine only the subset of associations that interest the researcher, in an incremental and interactive mode.
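To make the idea of query-constrained mining concrete, here is a minimal sketch, not the technique proposed in the paper. It assumes nominal annotations are encoded as "attribute=value" items, that a query is simply a set of items every reported rule must contain, and it uses brute-force enumeration instead of an efficient or incremental algorithm. The gene annotations, function names, and thresholds are all hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def constrained_rules(transactions, must_contain, min_support, min_confidence):
    """Enumerate rules whose items include every item in must_contain.

    Brute-force illustration only: the query restricts which itemsets
    are examined, so associations outside the query are never scored.
    Returns (antecedent, consequent, support, confidence) tuples.
    """
    must_contain = frozenset(must_contain)
    # Only transactions satisfying the query can contribute candidate items.
    relevant = [t for t in transactions if must_contain <= t]
    items = sorted(set().union(*relevant) - must_contain) if relevant else []
    rules = []
    for k in range(len(items) + 1):
        for extra in combinations(items, k):
            itemset = must_contain | frozenset(extra)
            if len(itemset) < 2:
                continue  # a rule needs an antecedent and a consequent
            s = support(itemset, transactions)
            if s == 0 or s < min_support:
                continue
            # Split the itemset into every antecedent/consequent pair.
            for r in range(1, len(itemset)):
                for ante in combinations(sorted(itemset), r):
                    ante = frozenset(ante)
                    conf = s / support(ante, transactions)
                    if conf >= min_confidence:
                        rules.append((ante, itemset - ante, s, conf))
    return rules


# Hypothetical annotation records, one set of nominal features per gene.
genes = [
    frozenset({"function=metabolism", "localization=cytoplasm", "class=kinase"}),
    frozenset({"function=metabolism", "localization=cytoplasm"}),
    frozenset({"function=transport", "localization=membrane"}),
    frozenset({"function=metabolism", "localization=nucleus"}),
]

# Query: only rules that involve function=metabolism are of interest.
rules = constrained_rules(genes, {"function=metabolism"},
                          min_support=0.5, min_confidence=0.6)
for ante, cons, s, conf in rules:
    print(sorted(ante), "->", sorted(cons), f"support={s:.2f} confidence={conf:.2f}")
```

In an interactive session the researcher would change the query or the thresholds and rerun; a practical system would reuse earlier results between runs, which this sketch does not attempt.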
Index Terms
- Incremental interactive mining of constrained association rules from biological annotation data with nominal features