Elsevier

Computers in Biology and Medicine

Volume 43, Issue 8, 1 September 2013, Pages 1023-1024

ExonSuite: Algorithmically optimizing alternative gene splicing for the PUF proteins

https://doi.org/10.1016/j.compbiomed.2013.05.014

Abstract

The regulation of mRNA stability and translation is vital for proper protein production. This regulation is made specific by particular RNA motifs and the regulatory proteins that recognize them. Pumilio/fem-3 mRNA-binding factor (PUF) proteins are well-known regulators of mRNA stability and translation. Here, we optimized a program for finding PUF protein targets in order to understand the natural diversity of RNA recognition by this protein family. ExonSuite is available to compile and run at https://github.com/dilanustek/ExonSuite.

Introduction

Finding the targets of PUF proteins offers a novel way to engineer mRNA stability and translation [1], [2]. This is done by engineering PUF proteins that bind to specific sequences in order to either enhance or suppress inclusion of a target exon in the mature mRNA [3], [4]. An engineered PUF protein can be used to regulate mRNA stability as well as translation [5]. PUF proteins bind to RNA and can be engineered to recognize specific eight-nucleotide sequences. They can also be engineered to have different effects depending on their C-terminal domain: proteins with a C-terminal arginine–serine-rich domain tend to enhance inclusion of the exon, while proteins with a glycine-rich domain tend to suppress it [5], [6].

PUF proteins have been shown to affect the drug sensitivity of cancer cells [5]. This work aims to extend the usefulness of PUF proteins by describing ExonSuite, a program that finds optimal target sequences for engineered PUF proteins. For each exon of interest, ExonSuite builds a frequency table of every 8-mer (an 8-nucleotide sequence whose fifth position is either C or U) that a PUF protein could bind to. It then searches the whole genome for each 8-mer in the table and counts how many times it appears. For each target, the program returns the 8-mers with the fewest matches across the genome, and thus the fewest potential off-target binding sites, which we take as a proxy for minimizing the undesired effects of PUF protein binding.
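The candidate-enumeration step described above can be sketched in Python as follows. This is a minimal illustration, not ExonSuite's actual code: the function name `candidate_kmers` is hypothetical, and we assume sequences may be written in either the DNA (T) or RNA (U) alphabet, so U is normalized to T before the fifth-position test.

```python
K = 8                       # length of a PUF binding site
ALLOWED_FIFTH = {"C", "T"}  # C or U; U is written as T after normalization


def candidate_kmers(exon: str) -> list[str]:
    """Hypothetical helper: distinct K-mers of `exon`, in order of first
    appearance, whose fifth position (index 4) is C or U."""
    seq = exon.upper().replace("U", "T")
    windows = (seq[i:i + K] for i in range(len(seq) - K + 1))
    # dict.fromkeys drops repeats while keeping first-appearance order
    return list(dict.fromkeys(w for w in windows if w[4] in ALLOWED_FIFTH))


print(candidate_kmers("AAAACAAAA"))  # ['AAAACAAA']
```

In the usage line, only `AAAACAAA` qualifies; the other 8-mer, `AAACAAAA`, has A in its fifth position and is skipped.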


Algorithm

The computational problem of choosing optimal PUF protein binding sites can be broken down into two parts: finding all valid 8-mer binding sites, and calculating the associated score for ‘off-target’ matches, i.e., matches that are not within the exon of interest. Several strategies could be used to solve each part; after experimentation, we settled on an efficient method that minimizes computational redundancy.
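One way to avoid redundant genome scans, sketched below under our own assumptions, is to count every 8-mer in the genome once and then score each candidate by table lookup. We do not claim this is the method ExonSuite uses. The names `count_kmers` and `best_site` are hypothetical; we also assume that the exon itself is part of the genome sequence, so its own occurrences are subtracted, and that only the given strand is searched.

```python
from collections import Counter
from typing import Optional

K = 8  # PUF binding-site length


def count_kmers(seq: str) -> Counter:
    """Count every overlapping K-mer in `seq` in a single pass."""
    seq = seq.upper().replace("U", "T")
    return Counter(seq[i:i + K] for i in range(len(seq) - K + 1))


def best_site(exon: str, genome_counts: Counter) -> Optional[tuple[str, int]]:
    """Return (kmer, off_target_score) for the exon's candidate 8-mer with the
    fewest matches outside the exon, or None if there is no candidate."""
    best = None
    for kmer, n_in_exon in count_kmers(exon).items():
        if kmer[4] not in "CT":  # fifth position must be C or U (T in DNA)
            continue
        # Assumption: the exon occurs in the genome, so its own hits are
        # subtracted; max() guards against exons absent from the genome.
        score = max(0, genome_counts[kmer] - n_in_exon)
        if best is None or score < best[1]:
            best = (kmer, score)
    return best


# Usage: build the genome table once, then score as many exons as needed.
genome_counts = count_kmers("AAAACTTTTAAAACTTT")
print(best_site("AAAACTTTT", genome_counts))  # ('AAACTTTT', 0)
```

Building the table costs one pass over the genome; afterwards each candidate is scored with a single dictionary lookup instead of a fresh genome scan.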

First, the skipsGen (part of

Output

We also designed a new file format for the program's output. It uses two columns: the left column contains the header of each exon in the input file and, separated from it by a colon, the right column contains the 8-mer with the lowest off-target score together with that score (Supp. 1). The file can be opened in spreadsheet programs such as Excel or in any text editor.
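A minimal sketch of writing and reading this two-column format follows. The tab between the 8-mer and its score is an assumption (the description above does not specify that separator), and `format_record`, `parse_record`, and the example header are hypothetical. Splitting on the last colon keeps headers that themselves contain colons, such as genomic coordinates, intact.

```python
def format_record(header: str, kmer: str, score: int) -> str:
    """Render one hypothetical output line: '<header>:<8-mer><TAB><score>'."""
    return f"{header}:{kmer}\t{score}"


def parse_record(line: str) -> tuple[str, str, int]:
    """Inverse of format_record. Splits on the LAST colon so that headers
    containing colons (e.g. 'chr1:100-200') survive intact."""
    header, _, rest = line.rstrip("\n").rpartition(":")
    kmer, score = rest.split("\t")
    return header, kmer, int(score)


line = format_record("chr1:100-200 exon1", "AAACTTTT", 0)
print(parse_record(line))  # ('chr1:100-200 exon1', 'AAACTTTT', 0)
```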

Testing

The algorithm was initially tested on a variety of generic sequences with easily predicted results, such as TTTTTTTTTTTATTTTTTTTTTT and similar simple sequences. Based on the success of these initial tests, we began testing the algorithm on real data obtained from the UCSC Genome Browser (http://genome.ucsc.edu), specifically the set of experimentally derived mRNA sequences from C. elegans. Initially, we focused on small subsets of
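A self-contained check in the spirit of these simple tests is sketched below. It uses a hypothetical scorer, `off_target_scores`, which assumes the off-target score is the genome count minus the count within the exon; the T-only background genome is our own construction. With a single A in a run of T's, every candidate containing the A should have no off-target matches, the all-T 8-mer should have many, and `TTTTATTT` should be excluded because its fifth position is A.

```python
from collections import Counter

K = 8


def kmers(seq: str) -> list[str]:
    """All overlapping K-mers of `seq`, in order."""
    return [seq[i:i + K] for i in range(len(seq) - K + 1)]


def off_target_scores(exon: str, genome: str) -> dict[str, int]:
    """Hypothetical scorer: genome count minus exon count for each candidate
    8-mer (fifth position C or T). Assumes the exon occurs in the genome."""
    in_genome, in_exon = Counter(kmers(genome)), Counter(kmers(exon))
    return {km: in_genome[km] - in_exon[km] for km in in_exon if km[4] in "CT"}


exon = "T" * 11 + "A" + "T" * 11       # a single A flanked by T's
genome = "T" * 50 + exon + "T" * 50    # toy background of T's only
scores = off_target_scores(exon, genome)

assert "TTTTATTT" not in scores         # A in the fifth position: not a candidate
assert scores["TTTTTTTT"] == 100        # 108 genome hits minus 8 in the exon
assert all(s == 0 for km, s in scores.items() if "A" in km)
print(min(scores, key=scores.get))      # prints TTTTTTTA
```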

Results

To obtain experimental data, we ran the algorithm on the same MacBook as above, now using as inputs Mouse ESTs (Expressed Sequence Tags) (Supp. 2), Marmoset spliced ESTs (Supp. 3), and Turkey ESTs (Supp. 4). ESTs are short sequence reads derived from cDNA transcripts, and they are the data on which the UNC lab will focus its computational efforts. The mouse EST data file was 33.9 MB, and ExonSuite returned a 6 MB output file with no known errors. The mean best off-target score was

Discussion

ExonSuite (Supp. 5) is a program for identifying algorithmically optimal PUF protein binding sites. Although informative, its output should be interpreted mainly as a guide for further laboratory testing of hypotheses about manipulating gene expression with PUF proteins.

Further extensions of this algorithm could allow the detection of many sequences that follow a regular pattern, by modifying the way in which the frequency tables are generated.
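One hedged way to realize this extension is to replace the fixed fifth-position rule with a per-position pattern written in IUPAC nucleotide codes, so that `NNNNYNNN` (Y = C or T/U) reproduces the current rule. This is a sketch of the idea, not a feature of ExonSuite; the helper names `pattern_to_regex` and `matching_kmers` are hypothetical.

```python
import re

# IUPAC nucleotide codes (DNA alphabet) mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def pattern_to_regex(pattern: str):
    """Compile an IUPAC pattern such as 'NNNNYNNN' into a regex."""
    return re.compile("".join(IUPAC[c] for c in pattern.upper()))


def matching_kmers(seq: str, pattern: str) -> list[str]:
    """All overlapping windows of `seq` that fully match the pattern."""
    rx = pattern_to_regex(pattern)
    k = len(pattern)
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)
            if rx.fullmatch(seq[i:i + k])]


print(matching_kmers("AAAACAAAA", "NNNNYNNN"))  # ['AAAACAAA']
```

Only the windows that match the pattern would then be counted, leaving the rest of the frequency-table logic unchanged.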

Conflict of interest statement

There is no conflict of interest.

Acknowledgments

We would like to thank Professor Sam Rebelsky of Grinnell College for his invaluable debugging assistance and coding advice. We would also like to thank Prof. Zefeng Wang from UNC Pharmacology.

References (6)

  • N.A. Faustino et al. Pre-mRNA splicing and human disease. Genes Dev. (2003).
  • M. Nissim-Rafinia et al. The splicing machinery is a genetic modifier of disease severity. Trends Genet. (2006).
  • M.E. Rolish. Algorithms for simulating human pre-mRNA splicing decisions. (2005).
