Predicting genes expressed via −1 and +1 frameshifts

Nucleic Acids Research, Jan 2004

Computational identification of ribosomal frameshift sites in genomic sequences is difficult due to their diverse nature, yet it provides useful information for understanding the underlying mechanisms and discovering new genes. We have developed an algorithm that searches entire genomic or mRNA sequences for frameshifting sites, and implemented it as a web-based program called FSFinder (Frameshift Signal Finder). The current version of FSFinder is capable of finding −1 frameshift sites on heptamer sequences X XXY YYZ, and +1 frameshift sites for two genes: protein chain release factor B (prfB) and ornithine decarboxylase antizyme (oaz). We tested FSFinder on ∼190 genomic and partial DNA sequences from a number of organisms and found that it predicted frameshift sites efficiently and with greater sensitivity and specificity than existing approaches. It has improved sensitivity because it considers many known components of a frameshifting cassette and searches for these components on both + and − strands, and its specificity is increased because it focuses on overlapping regions of open reading frames and prioritizes candidate frameshift sites. FSFinder is useful for discovering unknown genes that utilize alternative decoding, as well as for analyzing frameshift sites. It is freely accessible at http://wilab.inha.ac.kr/FSFinder/.

Sanghoon Moon [2], Yanga Byun [2], Hong-Jin Kim [1, 2], Sunjoo Jeong [0, 2], Kyungsook Han [2]

0 Department of Molecular Biology, Dankook University, Seoul 140-714, Korea
1 College of Pharmacy, Chung-Ang University, Seoul 156-756, Korea
2 School of Computer Science and Engineering, Inha University, Inchon 402-751, Korea

INTRODUCTION

Programmed ribosomal frameshifting is involved in the expression of certain genes in a wide range of organisms such as viruses, bacteria and eukaryotes including humans (1–5). In this process, the ribosome switches to an alternative frame at a specific site in response to special signals in the messenger RNA (4).
Programmed frameshifting plays a significant role in morphogenesis, autogenous control and in producing alternative enzymatic activities (6). The most common frameshift is a −1 frameshift, in which the ribosome slips a single nucleotide in the upstream direction. The major elements of −1 frameshifting consist of a slippery site, where the ribosome changes reading frames, and a stimulatory RNA structure such as a pseudoknot or a stem-loop located a few nucleotides downstream (4,6–9). It is generally accepted that ribosomes pause at −1 frameshifts, but Kontos et al. (7) report that pausing is not sufficient to mediate frameshifting. Most slippery sites consist of a heptameric sequence of the form X XXY YYZ in the incoming 0-frame (10), but there are other slippery sequences that do not conform to this motif (5). The slippery heptamer is separated from the stimulatory structure by a sequence of 5–9 nt, the so-called spacer (3,8). The length of the spacer is known to influence the efficiency of frameshifting. Frameshifts typically produce fusion proteins in which the N- and C-terminal domains are encoded by overlapping open reading frames (ORFs) (9), as shown in Figure 1.

+1 frameshifts are much less common than −1 frameshifts but have been observed in diverse organisms (6). Escherichia coli prfB encoding release factor 2 (RF2) is a well-known gene that utilizes +1 frameshifting (11,12). In RF2 frameshifting, a Shine-Dalgarno (SD) sequence is often observed upstream of a slippery sequence, normally CUU UGA C and in a single known case CUU UAA C (12). Several +1 frameshift sites have also been recognized in eukaryotic mRNA. For example, the expression of mammalian antizyme 1 (AZ1) requires a +1 frameshift, and the frameshift signal consists of a slippery sequence and two stimulatory elements: a sequence of unknown function, upstream of the slippery sequence, and a pseudoknot (13).
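
To make the RF2 signal described above concrete, the following minimal Python sketch scans a sequence for the two slippery sequences named in the text (CUU UGA C and CUU UAA C) and notes whether an SD-like motif lies upstream. It is an illustration only and not the FSFinder implementation: the function name `find_prfb_like_sites`, the default motif `AGGGGG` and the 15-nt upstream window are assumptions made for this example and are not taken from the paper.

```python
import re

def find_prfb_like_sites(seq, sd_motif="AGGGGG", upstream_window=15):
    """Return (position, slippery_sequence, sd_found) for each candidate site.

    sd_motif and upstream_window are hypothetical defaults chosen for
    illustration; they are not parameters documented for FSFinder.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    # CUU UGA C or CUU UAA C; the lookahead also reports overlapping matches.
    pattern = re.compile(r"(?=(CUUU[GA]AC))")
    hits = []
    for match in pattern.finditer(rna):
        start = match.start()
        upstream = rna[max(0, start - upstream_window):start]
        hits.append((start, match.group(1), sd_motif in upstream))
    return hits

print(find_prfb_like_sites("GCAGGGGGTATCTTTGACTACG"))
```

Positions are 0-based offsets into the input. Reporting the SD check as a flag rather than a filter reflects the text's statement that the SD sequence is "often", not always, observed.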
Computational identification of frameshift sites from genomic sequences is difficult since the sequence requirements for frameshifting cassettes are diverse and highly dependent on the organism. Several computational approaches have been attempted, but only a few are publicly available. The model for eukaryotic −1 frameshifting developed by Bekaert et al. (8) only considers H-type pseudoknots as stimulatory structures and misses many frameshift sites with other stimulatory structures. Hammell et al. (9) developed a program to identify −1 frameshift sites in prokaryotic and eukaryotic DNA sequences, but the sensitivity of their approach is low; it misses many frameshift sites because it only considers downstream pseudoknots, and its definition of a pseudoknot is too restrictive. For example, their approach does not locate the frameshift sites in Rous sarcoma virus (RSV), because loops 1 and 2 of the pseudoknot are larger than permitted by their approach. FreqAnalysis, developed by Shah et al. (14), can identify simple novel slippery sequences, but it does not take into consideration the existence of stimulators. A semi-automated approach by Ivanov et al. (13) finds a gene where antizyme frameshifting is expected to occur and then identifies the frameshift. While this approach has been shown to be successful for identifying ornithine decarboxylase antizyme (oaz) frameshifting, it lacks generality. There are also computational approaches that identify frameshifting errors in sequencing when the reference protein sequences are available (15–17).

In this paper, we present an algorithm for locating −1 and +1 frameshift sites of certain types in genomic or mRNA sequences. The algorithm is intended to find −1 frameshift sites of X XXY YYZ type in viruses, bacteria and eukaryotes, and considers pseudoknots as well as simple stem-loops as downstream stimulatory structures. It also allows the user to change the stem and loop sizes from their default values.
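
As a rough illustration of what user-adjustable stem and loop sizes mean for a simple stem-loop search, the sketch below enumerates perfect hairpins by brute force. This is a naive stand-in technique, not the structure search used by FSFinder: the function name `find_stem_loops`, the default sizes, and the decision to allow G-U wobble pairs are all assumptions made for this example.

```python
# Base pairs accepted in a stem; including G-U wobble pairs is an assumption.
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}

def find_stem_loops(seq, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop_length, end) for every perfect hairpin found.

    A hairpin is a 5' arm of `stem` nt, a loop of min_loop..max_loop nt and
    a 3' arm of `stem` nt that pairs with the 5' arm without mismatches.
    All default sizes are hypothetical, not FSFinder's defaults.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for start in range(len(rna)):
        for loop in range(min_loop, max_loop + 1):
            end = start + 2 * stem + loop  # exclusive end of the 3' arm
            if end > len(rna):
                break  # longer loops would run past the sequence as well
            # Pair the i-th base of the 5' arm with its partner in the 3' arm.
            if all(rna[start + i] + rna[end - 1 - i] in PAIRS for i in range(stem)):
                hits.append((start, loop, end))
    return hits

print(find_stem_loops("GGGGAAAACCCC", stem=4, min_loop=3, max_loop=8))
```

Widening `max_loop` or shortening `stem` makes the search more permissive, which is the kind of trade-off between sensitivity and specificity that adjustable sizes expose. Pseudoknots require a second, interleaved stem and are outside the scope of this sketch.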
+1 frameshift signals are highly diverse among different organisms. Therefore, the algorithm currently finds only those frameshift sites that are conserved among many species, namely frameshift sites used in genes encoding protein chain release factor B (prfB) and ornithine decarboxylase antizyme (oaz). The algorithm has been implemented as a web-based application program called FSFinder (Frameshift Signal Finder), and is accessible at http://wilab.inha.ac.kr/FSFinder/.

COMPUTATIONAL MODEL

Components of frameshift signals

We have modified the computational model for −1 frameshift signals of Hammell et al. (9) to improve its sensitivity and selectivity. Sequences of three codons (9 nt) in a genomic sequence are first examined for possible slippery sequences of the form X XXY YYZ. In this sequence X and Z can be any nucleotide, and Y can be A or U (in Hammell's model, Z is either A, U or C). If a slippery sequence is identified, FSFinder searches for a downstream structure by sliding 4–11 nt along the spacer. Figure 2 shows a programmed −1 frameshift site with a pseudoknot as stimulatory structure. The pseudoknot (...truncated)
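
The slippery-sequence rule stated above (X XXY YYZ with X and Z any nucleotide and Y either A or U) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FSFinder code: the function name `find_slippery_heptamers` is hypothetical, and the sketch reports every matching heptamer without checking the 0-frame or searching for a downstream structure.

```python
def find_slippery_heptamers(seq, y_allowed=("A", "U")):
    """Return (position, heptamer) for every X XXY YYZ match in seq.

    X and Z may be any of A, C, G, U; Y must be in y_allowed. Reading frame
    is ignored here, so a caller would still need to keep only those hits
    whose XXY triplet is a codon of the upstream ORF.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    for i in range(len(rna) - 6):
        h = rna[i:i + 7]
        x_run = h[0] == h[1] == h[2] and h[0] in "ACGU"
        y_run = h[3] == h[4] == h[5] and h[3] in y_allowed
        if x_run and y_run and h[6] in "ACGU":
            hits.append((i, h))
    return hits

print(find_slippery_heptamers("CCAAATTTAGG"))
```

Positions are 0-based offsets of the first X. Passing a different `y_allowed` tuple is a convenient way to compare this rule with stricter or looser definitions of the heptamer.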


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/32/16/4884.full.pdf
Article home page: http://nar.oxfordjournals.org/content/32/16/4884.abstract

Sanghoon Moon, Yanga Byun, Hong-Jin Kim, Sunjoo Jeong, Kyungsook Han. Predicting genes expressed via −1 and +1 frameshifts, Nucleic Acids Research, 2004, pp. 4884-4892, 32/16, DOI: 10.1093/nar/gkh829