Abstract
Supervised learning methods have recently been used to learn gene regulatory networks from gene expression data. The basic approach consists of building a binary classifier from feature vectors composed of the expression levels of a set of known regulatory connections, available in public databases or reported in the literature. Such a classifier is then used to predict new, unknown connections.
The quality of the training set plays a crucial role in such an inference scheme. In binary classification the training set should be composed of positive and negative examples, but the biology literature typically records only that two genes interact. The complementary information is usually not reported, because biologists can rarely state with confidence that two genes do not interact.
The over-representation of topological motifs in currently known gene regulatory networks, such as feed-forward loops, bi-fan clusters, and single input modules, could drive the selection of reliable negative examples. We introduce, discuss, and evaluate a number of negative selection heuristics that exploit the known gene network topology of Escherichia coli and Saccharomyces cerevisiae.
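To make the idea of a topology-driven negative selection heuristic concrete, here is a minimal sketch in Python. It is an illustration only and is not one of the heuristics evaluated in the paper, since the abstract does not specify them. The assumed rule is hypothetical: because feed-forward loops are over-represented, an unlabeled pair (x, z) that would close a feed-forward loop (some y with known edges x -> y and y -> z) is treated as a possible hidden positive and kept out of the negative set. The function name `candidate_negatives` and the toy gene names are assumptions made for this example.

```python
from itertools import permutations


def candidate_negatives(genes, known_edges):
    """Return unlabeled ordered pairs that would NOT close a feed-forward loop.

    Hypothetical heuristic for illustration: pairs that would complete a
    feed-forward loop are left unlabeled rather than used as negatives.
    """
    known = set(known_edges)

    # Map each regulator to the set of its known targets.
    targets = {}
    for src, dst in known:
        targets.setdefault(src, set()).add(dst)

    negatives = []
    for x, z in permutations(genes, 2):
        if (x, z) in known:
            continue  # known positive example, never a negative
        # (x, z) closes a feed-forward loop if x -> y and y -> z are known.
        closes_ffl = any(z in targets.get(y, ()) for y in targets.get(x, ()))
        if not closes_ffl:
            negatives.append((x, z))
    return negatives


# Toy network (assumed data): A regulates B, B regulates C.
genes = ["A", "B", "C", "D"]
known_edges = [("A", "B"), ("B", "C")]
negatives = candidate_negatives(genes, known_edges)
```

In this toy network the pair (A, C) is withheld from the negative set because it would complete the loop A -> B -> C with A -> C, while pairs such as (A, D) remain candidate negatives. A real heuristic would also need to account for other motifs, such as bi-fan clusters and single input modules.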
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cerulo, L., Paduano, V., Zoppoli, P., Ceccarelli, M. (2011). Labeling Negative Examples in Supervised Learning of New Gene Regulatory Connections. In: Rizzo, R., Lisboa, P.J.G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2010. Lecture Notes in Computer Science, vol 6685. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21946-7_13
DOI: https://doi.org/10.1007/978-3-642-21946-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21945-0
Online ISBN: 978-3-642-21946-7
eBook Packages: Computer Science, Computer Science (R0)