FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization
Hsiang-Yu Yuan 2
Jen-Jie Chiou 1
Wen-Hsien Tseng 1
Chia-Hung Liu 1
Chuan-Kun Liu 0
Yi-Jung Lin 0
Hui-Hung Wang 2
Adam Yao 0 2
Yuan-Tsong Chen 2
Chun-Nan Hsu 1
0 National Genotyping Center, Academia Sinica, Taipei, Taiwan
1 Institute of Information Science, Academia Sinica, Taipei, Taiwan
2 Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
Single nucleotide polymorphism (SNP) prioritization based on phenotypic risk is essential for association studies. Assessing this risk requires access to a variety of heterogeneous biological databases and analytical tools. FASTSNP (function analysis and selection tool for single nucleotide polymorphisms) is a web server that allows users to efficiently identify and prioritize high-risk SNPs according to their phenotypic risks and putative functional effects. A unique feature of FASTSNP is that the functional effect information used for SNP prioritization is always up-to-date, because FASTSNP extracts the information from 11 external web servers at query time using a team of web wrapper agents. Moreover, FASTSNP can be extended simply by deploying more web wrapper agents. To validate our prioritization, we analyzed 1569 SNPs from the SNP500Cancer database. The results show that SNPs with a high predicted risk exhibit low minor allele frequencies, consistent with the well-known finding that functional polymorphisms are under strong selective pressure. We have been using FASTSNP for 2 years, and it has enabled us to discover a novel promoter polymorphism. FASTSNP is available at http://fastsnp.ibms.sinica.edu.tw.
INTRODUCTION
An important approach to disease gene mapping is to investigate whether a single nucleotide polymorphism (SNP) is functionally involved in a disease. For complex diseases the problem is harder because, unlike in Mendelian diseases, the genetic causes may involve many genes and hundreds of alleles. Although millions of SNPs have been deposited in public SNP databases, only a small proportion of them are functional polymorphisms that contribute to disease phenotypes. Thus, prioritizing SNPs based on their phenotypic risks is essential for association studies (1).
Assessing this risk requires access to a variety of heterogeneous biological databases and analytical tools. FASTSNP (function analysis and selection tool for single nucleotide polymorphisms) is a web server that allows users to efficiently identify the SNPs most likely to have functional effects. It prioritizes SNPs according to their phenotypic risks and putative functional effects, classifying each SNP into one of 13 functional effect types, such as changes to the transcriptional level, pre-mRNA splicing and protein structure. A unique feature of FASTSNP is that its predictions of functional effects are always based on the most up-to-date information, which FASTSNP extracts from 11 external web servers at query time using a team of reconfigurable web wrapper agents (2,3). These agents automate web browsing and data extraction, and they can be configured and maintained with a tool that uses a machine learning algorithm, so users can configure or repair a web wrapper agent without programming. Another benefit of web wrapper agents is that FASTSNP is extendable: new functions can be added simply by deploying more agents. In this manner, we have already built several new functionalities, such as the inclusion of haplotype block information from HapMap (4).
SNP prioritization
Recent studies show that SNPs may have functional effects on the following:
(i) protein structure, by changing single amino acids (5,6);
(ii) transcriptional regulation, by affecting transcription factor binding sites in promoter or intronic enhancer regions (7,8); and
(iii) alternative splicing regulation, by disrupting exonic splicing enhancers (9) or silencers.
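To make the link between these effects and prioritization concrete, the sketch below assigns each SNP one effect label drawn from the categories listed above, plus premature stop codons (nonsense SNPs), and then sorts SNPs by a numeric rank. It is a hypothetical illustration, not the decision tree of Figure 1 or the 13 types defined in Table 1: the input fields (`region`, `ref_aa`, `alt_aa`, `alters_tfbs`, `alters_ese`), the labels, the first-match rule and the rank values are placeholders chosen only to show the mechanics on a 0 to 5 scale.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Snp:
    snp_id: str
    region: str                   # hypothetical: "coding", "promoter", "intron", ...
    ref_aa: Optional[str] = None  # amino acid of the reference allele ("*" = stop)
    alt_aa: Optional[str] = None  # amino acid of the alternative allele ("*" = stop)
    alters_tfbs: bool = False     # predicted change to a transcription factor binding site
    alters_ese: bool = False      # predicted disruption of an exonic splicing enhancer


# Placeholder ranks (0 = lowest, 5 = highest); NOT the values in the paper's Table 1.
PLACEHOLDER_RANK = {
    "nonsense": 5,
    "missense": 3,
    "splicing": 3,
    "transcription": 3,
    "no_known_effect": 0,
}


def classify(snp: Snp) -> str:
    """Return the first matching label, checking the most severe effect first."""
    if snp.region == "coding" and snp.alt_aa == "*" and snp.ref_aa != "*":
        return "nonsense"  # premature stop codon
    if (snp.region == "coding" and snp.ref_aa and snp.alt_aa
            and "*" not in (snp.ref_aa, snp.alt_aa) and snp.ref_aa != snp.alt_aa):
        return "missense"  # single amino-acid change
    if snp.region == "coding" and snp.alters_ese:
        return "splicing"  # exonic splicing enhancer disrupted
    if snp.region in ("promoter", "intron") and snp.alters_tfbs:
        return "transcription"  # binding site in promoter or intronic enhancer
    return "no_known_effect"


def prioritize(snps: List[Snp]) -> List[Tuple[str, str, int]]:
    """Sort SNPs by placeholder rank, highest first; ties keep input order."""
    labelled = [(s.snp_id, classify(s)) for s in snps]
    return sorted(((sid, eff, PLACEHOLDER_RANK[eff]) for sid, eff in labelled),
                  key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    demo = [Snp("rs_example_1", "intron"),
            Snp("rs_example_2", "coding", ref_aa="R", alt_aa="*")]
    for snp_id, effect, rank in prioritize(demo):
        print(snp_id, effect, rank)
```

The first-match ordering is a simplification: a single SNP can have several effects at once (a missense SNP can also disrupt a splicing enhancer), and a real classifier would draw its input fields from the annotations gathered at query time.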
SNPs may also cause premature termination of translation (nonsense mutations), which can disable protein function. Each of these distinct functional effects may confer disease risk. Therefore, to prioritize SNPs for the study of complex diseases, it is critical to identify, before genotyping, the variants most likely to have functional effects leading to disease phenotypes. Based on previous studies of the functional effects of polymorphisms, Tabor et al. (1) presented a prioritization strategy that associates the relative risk of a SNP with its location and the type of sequence variant. We extended their strategy with our recent findings and developed a decision tree to assess the risk of a SNP. The decision tree, shown in Figure 1, classifies a SNP into one of 13 functional effect types, each assigned a risk rank between 0 and 5; a higher rank indicates a higher risk. Table 1 gives the definitions of the function types, (...truncated)