rSNP_Guide, a database system for analysis of transcription factor binding to target sequences: application to SNPs and site-directed mutations

Julia V. Ponomarenko, Tatyana I. Merkulova, Gennady V. Vasiliev, Zoya B. Levashova, Galina V. Orlova, Sergey V. Lavryushev, Oleg N. Fokin, Mikhail P. Ponomarenko, Anatoly S. Frolov and Akinori Sarai

Institute of Cytology and Genetics, 10 Lavrentyev Avenue, Novosibirsk, 630090, Russia
The Institute of Physical and Chemical Research, RIKEN, 3-1-1 Koyadai, Tsukuba, Japan

rSNP_Guide is a novel curated database system for analysis of transcription factor (TF) binding to target sequences in regulatory gene regions altered by mutations. It accumulates experimental data on naturally occurring site variants in regulatory gene regions and on site-directed mutations. The database system also contains web tools for SNP analysis, i.e. an active applet that applies weight matrices to predict candidate regulatory sites altered by a mutation. The current version of rSNP_Guide is supplemented by six sub-databases: (i) rSNP_DB, on alterations in DNA-protein interaction caused by mutation; (ii) SYSTEM, on experimental systems; (iii) rSNP_BIB, on citations to original publications; (iv) SAMPLES, on experimentally identified sequences of known regulatory sites; (v) MATRIX, on weight matrices of known TF sites; (vi) rSNP_Report, on characteristic examples of successful rSNP_Tools implementation. These databases are useful for the analysis of natural SNPs and site-directed mutations. The databases are available through the Web, http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/.

INTRODUCTION

Application of Single Nucleotide Polymorphism (SNP) analysis to the human genome is currently among the greatest challenges presented by the human genome sequence initiative (1). This novel research field permits exploration of the influence of specific sequence alterations on disease susceptibility, drug resistance/sensitivity and ultimately health care. The number of experimentally detected SNPs is growing tremendously.
Currently the HGMD database (2) contains more than 10 000 SNPs that alter codon translation, more than 1000 that affect splice sites, and fewer than 200 that influence gene regulatory regions. In the databases dbSNP (3), HGBASE (4), ALFRED (5) and OMIM (6), SNPs in regulatory and coding regions are represented in a similar ratio. Obviously, functional alterations of highly conserved codons and splice sites, which change protein structure and function, are detected more easily than alterations of less conserved regulatory regions such as promoters, enhancers, silencers, introns, etc. (7).

Recent experiments (8-10) have shown that regulatory SNPs may manifest in several ways, including: (i) alteration of the function of a site important for normal regulation; (ii) a difference in the affinity of protein binding at such a site; or (iii) acquired function of a site not normally participating in proper regulation. Thus, as has been shown experimentally (9,10), the influence of an SNP cannot be predicted reliably only by inspecting the local region for potential regulatory elements similar to those of known sequence.

Although SNP analysis is only now being applied to regulatory regions, it is being developed using experimental findings in the databases TRANSFAC (11), TRRD (12), COMPEL (13), ACTIVITY (14) and others, which accumulate information not only about naturally occurring site variants, but also about variants resulting from intentional (site-directed) mutagenesis. Among the latter artificial variants, site-directed mutagenesis altering several nucleotides is more informative for SNP analysis of regulatory DNA regions than deletions, insertions or hybrid constructs.
Since disease penetrance may be affected not only by the presence or absence of a transcription factor (TF) binding site in a regulatory region, but also by quantitative alterations of binding efficiency [e.g. alterations in the affinity of erythroid-specific DNA-binding protein(s) cause -thalassemia; 15], data on sequence-activity relationships are informative for SNP analysis of regulatory regions. We anticipate that further development of the present database will have prescriptive value for specific applications in disease.

From this perspective, our web resource rSNP_Guide integrates experimental data on natural SNPs with sequence variations generated artificially. The core of this resource is the database rSNP_DB. It compiles data on alterations in DNA binding by nuclear proteins observed due to natural and experimental sequence variations. This information is represented in a simple format adapted for computer analysis. rSNP_DB is supplemented by four databases: (i) SYSTEM, experimental conditions; (ii) rSNP_BIB, references to original publications; (iii) SAMPLES, multiple alignments of known TF site sequences; and (iv) MATRIX, weight matrices for TF site recognition. To apply the information stored in these databases to SNP analysis of DNA regulatory regions, we have developed the JavaScript applet rSNP_Tools. We have tested rSNP_Tools on a series of examples, which represent both naturally occurring mutations and relevant artificial constructs. These test results are documented in the rSNP_Report database and are helpful for analysis of SNPs and mutagenesis. The rSNP_Guide is available through the Web, http://wwwmgs.bionet.nsc.ru/mgs/programs/rsnp/.

DATA REPRESENTATION

A graphical representation of the rSNP_Guide components and sources of information is given in Figure 1. In this figure, the arrows link the components of the rSNP_Guide and related web resources.
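To make the weight-matrix idea behind rSNP_Tools concrete, the following is a minimal Python sketch of how a matrix built from aligned site sequences could be used to compare a reference sequence with its single-nucleotide variant. It is not the rSNP_Tools code, and the weights stored in MATRIX may be computed differently; the log-odds formula, the pseudocount, the uniform background, the score threshold, the example sequences and all function names are assumptions made for illustration only.

```python
import math

BASES = "ACGT"

# Hypothetical aligned site sequences, invented for this sketch
# (not taken from the SAMPLES database).
EXAMPLE_SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA", "TGAGTCA", "TTACTCA"]


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build log-odds weights per position (assumed formula, uniform background)."""
    total = len(sites) + pseudocount * len(BASES)
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return matrix


def score_window(matrix, window):
    """Sum the position-specific weights of one window of matrix width."""
    return sum(weights[base] for weights, base in zip(matrix, window))


def best_score_at(matrix, seq, pos):
    """Best score among all windows that overlap position pos (0-based)."""
    width = len(matrix)
    if not 0 <= pos < len(seq) or len(seq) < width:
        raise ValueError("position out of range or sequence shorter than matrix")
    first = max(0, pos - width + 1)
    last = min(pos, len(seq) - width)
    return max(score_window(matrix, seq[i:i + width]) for i in range(first, last + 1))


def compare_alleles(matrix, seq, pos, alt_base, threshold):
    """Score reference and variant sequences and label the predicted effect.

    The threshold is a hypothetical cut-off for calling a candidate site.
    """
    variant = seq[:pos] + alt_base + seq[pos + 1:]
    ref_score = best_score_at(matrix, seq, pos)
    var_score = best_score_at(matrix, variant, pos)
    ref_hit = ref_score >= threshold
    var_hit = var_score >= threshold
    if ref_hit and not var_hit:
        effect = "site lost"
    elif var_hit and not ref_hit:
        effect = "site gained"
    elif ref_hit and var_hit:
        effect = "site retained, score changed" if ref_score != var_score else "site unchanged"
    else:
        effect = "no candidate site"
    return ref_score, var_score, effect


matrix = build_weight_matrix(EXAMPLE_SITES)
ref, var, effect = compare_alleles(matrix, "GGCATGACTCAGCTT", 6, "C", threshold=8.0)
print(f"reference score {ref:.2f}, variant score {var:.2f}: {effect}")
```

Only windows overlapping the substituted position are scored, because no other window changes between the two alleles. The three outcome labels loosely mirror the manifestations of regulatory SNPs listed in the Introduction (a lost site, a change in predicted affinity, a newly acquired site), but a score difference is a prediction of a candidate site, not evidence of altered binding.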
Initial information on naturally occurring mutations is extracted from original publications and the databases HGMD (2), dbSNP (3), HGBASE (4), ALFRED (5) and OMIM (6), whereas the site-directed mutagenesis data are taken from TRANSFAC (11), TRRD (12), COMPEL (13) and ACTIVITY (14). Using the original publications (rSNP_BIB), we document the experimental conditions (SYSTEM). Taking the experimental conditions into account, the data on alterations in nuclear protein binding to DNA with point mutations are accumulated in rSNP_DB. Next, typical examples of rSNP_DB entries are chosen and investigated using the JavaScript applet rSNP_Tools, which makes use of SAMPLES and MATRIX (16). Finally, the results are stored in the database rSNP_Report.

Each entry of the core database, rSNP_DB, contains information on DNA-protein interaction alterations caused by mutation. An entry has 16 descriptive fields (Fig. 2). The field names are color-coded. If a user clicks a field name, the Help function opens in a separate window with information about the data format, examples, etc. The database can be queried by keyword using SRS (17).

The second database, SYSTEM, accumulates data on experimental systems. An entry has nine descriptive fields. As in rSNP_DB, each field is supported by the Help function. A detailed description of the SYSTEM format is given in (14). The third database (...truncated)