DOI: 10.1145/1569901.1570162
poster

Creating regular expressions as mRNA motifs with GP to predict human exon splitting

Published: 08 July 2009

Abstract

RNAnet [3] (http://bioinformatics.essex.ac.uk/users/wlangdon/rnanet/) allows the user to calculate correlations of gene expression, both between genes and between components within genes. We investigate all of Ensembl (http://www.ensembl.org) and find all the Homo sapiens exons for which there are sufficient robust Affymetrix HG-U133 Plus 2 GeneChip probes. Calculating the correlation between mRNA probe measurements for the same exon shows many exons whose components are consistently up-regulated and down-regulated. However, we identify other Ensembl exons in which sub-regions are self-consistent but these transcript blocks are not well correlated with other blocks in the same exon. We suggest that many current Ensembl exon definitions are incomplete. Secondly, having identified exons with substructure, we use machine learning to try to identify patterns in the DNA sequence lying between blocks of high correlation which might yield biological or technological explanations. A Backus-Naur form (BNF) context-free grammar constrains strongly typed genetic programming (STGP) to evolve biological motifs in the form of regular expressions (RE) (e.g. TCTTT) which distinguish gene exons with potential alternative mRNA expression from those without. We show that biological patterns can be data mined from NCBI's GEO database (http://www.ncbi.nlm.nih.gov/geo/) by a GP written in gawk and using egrep. The automatically produced DNA motifs suggest that alternative polyadenylation is not responsible. (Full version in TR-09-02 [7].) Blocky exons can be found in http://bioinformatics.essex.ac.uk/users/wlangdon/tr-09-02.tar.gz
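
As a rough illustration of how a regular expression motif can act as a classifier, the sketch below scores a candidate motif by the fraction of sequences it labels correctly. It is not the authors' implementation: the abstract describes a GP written in gawk that calls egrep, whereas this uses Python's `re` module instead. The function name `motif_accuracy`, the accuracy-based score, and the toy sequences are all assumptions made for illustration; only the example motif TCTTT comes from the abstract.

```python
import re


def motif_accuracy(motif, blocky, plain):
    """Fraction of sequences the motif labels correctly.

    A sequence is predicted "blocky" (an exon with substructure) when the
    regular expression matches anywhere in it. This scoring rule is a
    hypothetical stand-in, not the fitness function used in the paper.
    """
    pattern = re.compile(motif)
    true_pos = sum(1 for seq in blocky if pattern.search(seq))
    true_neg = sum(1 for seq in plain if not pattern.search(seq))
    return (true_pos + true_neg) / (len(blocky) + len(plain))


# Toy sequences invented for illustration; they are not real exon data.
blocky = ["GGATCTTTAC", "TCTTTGGCA", "ACGTACGT"]
plain = ["ACGGTACCA", "TTGACCGTA", "CCTCTTTAA"]

accuracy = motif_accuracy("TCTTT", blocky, plain)
print(f"accuracy of TCTTT: {accuracy:.3f}")
```

In an evolutionary run, a score of this kind would be computed for every candidate regular expression in the population, with the grammar ensuring that each candidate is a syntactically valid pattern.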

References

[1] Langdon, W. B. Genetic Programming and Data Structures. Kluwer, 1998.
[2] Langdon, W. B. Evolving GeneChip correlation predictors on parallel graphics hardware. In 2008 IEEE World Congress on Computational Intelligence (Hong Kong, 1-6 June 2008), J. Wang, Ed., IEEE Computational Intelligence Society, IEEE Press, pp. 4152-4157.
[3] Langdon, W. B. A map of human gene expression. Tech. Rep. CES-486, Departments of Mathematical, Biological Sciences and Computing and Electronic Systems, University of Essex, Colchester, CO4 3SQ, UK, July 2008.
[4] Langdon, W. B., and Harrison, A. P. Evolving DNA motifs to predict GeneChip probe performance. Algorithms in Molecular Biology. In press.
[5] Langdon, W. B., McKay, R. I., and Spector, L. Genetic programming. In Handbook of Metaheuristics, J.-Y. Potvin and M. Gendreau, Eds., second ed. Springer, ch. 7.
[6] Langdon, W. B., and Poli, R. Foundations of Genetic Programming. Springer-Verlag, 2002.
[7] Creating regular expressions as mRNA motifs with GP to predict human exon splitting. Tech. Rep. TR-09-02, Department of Computer Science, Crest Centre, King's College, London, Strand, London, WC2R 2LS, UK, 19 Mar. 2009.
[8] Langdon, W. B., Upton, G. J. G., da Silva Camargo, R., and Harrison, A. P. A survey of spatial defects in Homo Sapiens Affymetrix GeneChips. IEEE/ACM Transactions on Computational Biology and Bioinformatics (2009). In press.
[9] Poli, R., Langdon, W. B., and McPhee, N. F. A field guide to genetic programming. Published via http://lulu.com and freely available at http://www.gp-field-guide.org.uk, 2008. (With contributions by J. R. Koza).
[10] Retelska, D., et al. Similarities and differences of polyadenylation signals in human and fly. BMC Genomics 7, 1 (2006), 176.
[11] Sanchez-Graillet, O., Rowsell, J., Langdon, W. B., Stalteri, M. A., Arteaga Salas, J. M., Upton, G. J., and Harrison, A. P. Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips. Journal of Integrative Bioinformatics 5, 2 (2008), 98.

Cited By

  • (2014) Automatic Synthesis of Regular Expressions from Examples. Computer 47:12 (72-80). DOI: 10.1109/MC.2014.344. Online publication date: 1-Dec-2014
  • (2012) Automatic generation of regular expressions from examples with genetic programming. Proceedings of the 14th annual conference companion on Genetic and evolutionary computation (1477-1478). DOI: 10.1145/2330784.2331000. Online publication date: 7-Jul-2012
  • (2012) Genetic programming needs better benchmarks. Proceedings of the 14th annual conference on Genetic and evolutionary computation (791-798). DOI: 10.1145/2330163.2330273. Online publication date: 7-Jul-2012


        Published In

        GECCO '09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation
        July 2009
        2036 pages
        ISBN:9781605583259
        DOI:10.1145/1569901

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. HDONA
        2. affymetrix genechip
        3. alternative splicing
        4. alternative splicing of homosapiens exons
        5. bioinformatics
        6. biological interpretation of computer generated motifs
        7. gene expression and regulation
        8. genetic algorithms
        9. genetic programming
        10. grammar
        11. integration of genetic programming into bioinformatics
        12. microarray analysis
        13. regular expression
        14. strongly typed genetic programming

        Qualifiers

        • Poster

        Conference

        GECCO09
        Sponsor:
        GECCO09: Genetic and Evolutionary Computation Conference
        July 8 - 12, 2009
        Montreal, Québec, Canada

        Acceptance Rates

        Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


