Abstract
In this paper we report on our experience applying frequent pattern discovery algorithms to biological data on genomic alterations in cancer. A number of frequent item set methods have already been applied successfully to various biological data obtained from large-scale analyses (see for instance [4] for SAGE data and [20,22,26] for gene expression data). All of them must cope with a peculiarity of such data with respect to standard basket analysis data: the number of observations is low relative to the number of attributes.
References
Albertson, D.G., Collins, C., McCormick, F., Gray, J.W.: Chromosome aberrations in solid tumors. Nat. Genet. 34, 369–376 (2003)
Aliferis, C.F., Hardin, D., Massion, P.P.: Machine learning models for lung cancer classification using array comparative genomic hybridization. In: Proc. AMIA Symp. 2002, pp. 7–11 (2002)
Bayardo, R.J., Agrawal, R., Gunopulos, D.: Constraint-based rule mining in large, dense databases. Data Mining and Knowledge Discovery 4, 217–240 (2000)
Becquet, C., Blachon, S., Jeudy, B., Boulicaut, J.F., Gandrillon, O.: Strong association rules mining for large gene expression data analysis: a case study on human SAGE data. Genome Biology 12 (2002)
Besson, J., Robardet, C., Boulicaut, J.F., Rome, S.: Constraint-based concept mining and its application to microarray data analysis. Intelligent Data Analysis 9 (to appear)
Billerey, C., Chopin, D., Aubriot-Lorton, M.H., Ricol, D., Gil Diez de Medina, S., Van Rhijn, B., Bralet, M.P., Lefrère-Belda, M.A., Lahaye, J.B., Abbou, C.C., Bonaventure, J., Zafrani, E.S., Van der Kwast, T., Thiery, J.P., Radvanyi, F.: Frequent FGFR3 mutations in papillary non-invasive bladder (pTa) tumors. Am. J. Pathol. 158, 1955–1959 (2001)
Bonchi, F., Giannotti, F., Mazzanti, A., Pedreschi, D.: Examiner: Optimized level-wise frequent pattern mining with monotone constraints. In: Proc. of the Third IEEE Int. Conf. on Data Mining, ICDM 2003 (2003)
Bonchi, F., Lucchese, C.: On closed constrained frequent pattern mining. In: Proc. of the Fourth International Conference on Data Mining (ICDM 2004). Morgan Kaufmann, San Francisco (2004)
Boros, E., Gurvich, V., Khachiyan, L., Makino, K.: On the complexity of generating maximal frequent and minimal infrequent sets. In: Alt, H., Ferreira, A. (eds.) STACS 2002. LNCS, vol. 2285, p. 133. Springer, Heidelberg (2002)
Bucila, C., Gehrke, J., Kifer, D., White, W.: Dualminer: A dual-pruning algorithm for itemsets with constraints. Data Mining and Knowledge Discovery 7, 241–272 (2003)
Cappellen, D., Gil Diez de Medina, S., Chopin, D., Thiery, J.P., Radvanyi, F.: Frequent loss of heterozygosity on chromosome 10q in muscle-invasive transitional cell carcinomas of the bladder. Oncogene 14, 3059–3066 (1997)
Chin, K., Ortiz de Solorzano, C., Knowles, D., Jones, A., Chou, W., Garcia Rodriguez, E., Kuo, W.L., Ljung, B.M., Chew, K., Myambo, K., Miranda, M., Krig, S., Garbe, J., Stampfer, M., Yaswen, P., Gray, J.W., Lockett, S.J.: In situ analyses of genome instability in breast cancer. Nat. Genet. 36, 984–988 (2004)
Clare, A., King, R.D.: Predicting gene function in Saccharomyces cerevisiae. Bioinformatics 19(suppl. 2), ii42–ii49 (2003)
Dong, G., Li, J.: Efficient mining of emerging patterns: Discovering trends and differences. In: Knowledge Discovery and Data Mining, pp. 43–52 (1999)
Hahn, W.C., Weinberg, R.A.: Modelling the molecular circuitry of cancer. Nat. Rev. Cancer 2, 331–341 (2002)
Hupé, P., Stransky, N., Thiery, J.P., Radvanyi, F., Barillot, E.: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 12, 3413–3422 (2004)
Ishkanian, A.S., Malloff, C.A., Watson, S.K., DeLeeuw, R.J., Chi, B., Coe, B.P., Snijders, A., Albertson, D.G., Pinkel, D., Marra, M.A., Ling, V., MacAulay, C., Lam, W.L.: A tiling resolution DNA microarray with complete coverage of the human genome. Nat. Genet. 36, 299–303 (2004)
Kramer, S., De Raedt, L.: Feature construction with version spaces for biochemical application. In: Proc. of the 18th International Conference on Machine Learning (ICML 2001). Morgan Kaufmann, San Francisco (2001)
Li, J., Dong, G., Ramamohanarao, K., Wong, L.: DeEPs: a new instance-based discovery and classification system. Machine Learning 54, 99–124 (2004)
Li, J., Wong, L.: Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns. Bioinformatics 18, 725–734 (2003)
Mercier, G., Berthault, N., Mary, J., Peyre, J., Antoniadis, A., Comet, J.P., Cornuejols, A., Froidevaux, C., Dutreix, M.: Biological detection of low radiation doses by combining results of two microarray analysis methods. Nucleic Acids Res. 32 (2004)
Pang, F., Cong, G., Tung, A.K.H., Yang, J., Zaki, M.: Carpenter: Finding closed patterns in long biological datasets. In: Proc. of SIGKDD 2003 (2003)
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1998)
Pei, J., Han, J., Mao, R.: CLOSET: an efficient algorithm for mining frequent closed itemsets. In: Proc. of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, DMKD (2000)
Quinlan, J.R.: C4.5: Programs for machine learning. Morgan Kaufmann, San Francisco (1993)
Rioult, F., Boulicaut, J.F., Crémilleux, B., Besson, J.: Using transposition for pattern discovery from microarray data. In: Proc. of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2003), pp. 73–79 (2003)
Rouveirol, C., et al.: Computation of minimal recurrent gain and loss regions from array-CGH data. Extended version of JOBIM 2004 (2004) (in preparation)
Saitta, L., Zucker, J.D.: Semantic abstraction for concept representation and learning. In: Symposium on Abstraction, Reformulation and Approximation (SARA 1998), pp. 103–120 (1998)
Scheffer, T., Wrobel, S.: Finding the most interesting patterns in a database quickly by using sequential sampling. Journal of Machine Learning Research 3, 833–862 (2002)
Snijders, A.M., Nowak, N., Segraves, R., Blackwood, S., Brown, N., Conroy, J., Hamilton, G., Hindle, A.K., Huey, B., Kimura, K., Law, S., Myambo, K., Palmer, J., Ylstra, B., Yue, Y.P., Gray, J.W., Jain, A.N., Pinkel, D., Albertson, D.G.: Assembly of microarrays for genome-wide measurement of DNA copy number. Nat. Genet. 29, 263–264 (2001)
Soulet, A., Crémilleux, B., Rioult, F.: Condensed representation of emerging patterns. In: Dai, H., Srikant, R., Zhang, C. (eds.) PAKDD 2004. LNCS (LNAI), vol. 3056, pp. 127–132. Springer, Heidelberg (2004)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rouveirol, C., Radvanyi, F. (2005). Local Pattern Discovery in Array-CGH Data. In: Morik, K., Boulicaut, JF., Siebes, A. (eds) Local Pattern Detection. Lecture Notes in Computer Science, vol 3539. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11504245_9
DOI: https://doi.org/10.1007/11504245_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26543-6
Online ISBN: 978-3-540-31894-1
eBook Packages: Computer Science, Computer Science (R0)