Kin-Driver: a database of driver mutations in protein kinases
Franco L. Simonetti (2), Cristian Tornador (1), Nuria Nabau-Moreto (0), Miguel A. Molina-Vila (3), Cristina Marino-Buslje (2)
0 Computational Genomics Laboratory, Genetics Department, Institut de Biologia Universitat de Barcelona (IBUB), Facultat de Biologia, Av Diagonal 645
1 Pompeu Fabra University (UPF), Dept. de Tecnologies de la Informacio i les Comunicacions, Tanger 122-140 08018, Barcelona, Spain
2 Fundación Instituto Leloir, Av. Patricias Argentinas 435, C1405BWE, Buenos Aires, Argentina
3 Breakthrough Cancer Research Unit, Dexeus University Hospital, Sabino Arana 5-19, Barcelona, Spain
Somatic mutations in protein kinases (PKs) are frequent driver events in many human tumors, while germ-line mutations are associated with hereditary diseases. Here we present Kin-Driver, the first database that compiles driver mutations in PKs with experimental evidence demonstrating their functional role. Kin-Driver is a manually expert-curated database that pays special attention to activating mutations (AMs) and can serve as a validation set for developing new-generation tools focused on the prediction of gain-of-function driver mutations. It also offers an easy and intuitive environment for visualizing and analysing mutations in PKs. Because all mutations are mapped onto a multiple sequence alignment, analogous positions between kinases can be identified and tentative new mutations can be proposed for study by transferring annotations. Finally, our database can also be of use to clinical and translational laboratories, helping them to identify uncommon AMs that can correlate with response to new antitumor drugs. The website was developed using PHP and JavaScript, which are supported by all major browsers; the database was built using MySQL server. Kin-Driver is available at: http://kin-driver.leloir.org.ar/
© The Author(s) 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Introduction
Cancer arises due to somatic mutations that result in a
growth advantage for the tumor cells. These mutations are
known as drivers and can be divided into two groups:
(i) loss-of-function mutations, which inactivate tumor
suppressor genes (from here on inactivating mutations)
and (ii) activating or gain-of-function mutations that
transform proto-oncogenes into oncogenes. Somatic
mutations in protein kinases (PKs) are frequent driver events in
many human tumor types and functionally relevant
germline mutations are associated with hereditary disorders.
Clinical laboratories worldwide are analysing
thousands of human tumor samples, looking for activating
mutations (AMs) in certain PKs, such as EGFR, HER2 or
BRAF, that correlate with good responses to new
generations of antitumor drugs that are kinase inhibitors.
Mutations that are either new or not functionally characterized
are often found. In addition, whole-genome sequencing
of human malignancies and other diseases is identifying
thousands of changes in PKs, but most of them are likely
to be passenger mutations or even polymorphisms.
Discriminating driver mutations in PKs is a significant
challenge that is hampered by the fact that there are no
curated sets of true driver and passenger alterations. The
extent of this challenge was evidenced when three
state-of-the-art methods, namely MutationAssessor (1), transFIC
(2) and FATHMM (3), were fed with well-established,
tumor-associated AMs of PKs and failed to predict them as
high impact or disease related (4). Therefore, it is uncertain
whether the current tools, which are generally based on
conservation calculations, can be trusted to screen whole-genome
sequencing data in search of driver mutations in PKs. New
methods need to be developed, and unambiguously assessed
datasets of driver mutations are required to train and test
them.
Mutation recruitment
The recruitment procedure is described by Molina-Vila et al.
(4). Briefly, in the case of proto-oncogenic kinases,
abstracts and titles of PubMed manuscripts were mined with
the kinase name plus the words 'activating', 'gain of function'
or 'constitutive activation'. For tumor suppressor kinases,
the words 'inactivating' and 'loss of function' were used.
Furthermore, all UniProt entries for human kinases
were mined for the same keywords to identify new
variants. The references were manually checked to confirm the
status of each variant.
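The keyword search described above can be sketched as a small query builder. This is only an illustration: the keyword lists come from the text, whereas the function name, the `role` labels and the use of the PubMed `[Title/Abstract]` field tag with boolean operators are assumptions, since the exact query syntax is not given here.

```python
def build_pubmed_query(kinase: str, role: str) -> str:
    """Build a title/abstract search string for one kinase.

    role is "oncogene" or "tumor_suppressor" (hypothetical labels).
    The keywords follow the text; the field tag and boolean layout
    are assumptions made for this sketch.
    """
    keywords = {
        "oncogene": ["activating", "gain of function", "constitutive activation"],
        "tumor_suppressor": ["inactivating", "loss of function"],
    }
    # Each keyword is searched as a quoted phrase in titles and abstracts.
    terms = " OR ".join(f'"{kw}"[Title/Abstract]' for kw in keywords[role])
    return f"{kinase}[Title/Abstract] AND ({terms})"


if __name__ == "__main__":
    print(build_pubmed_query("EGFR", "oncogene"))
```

The resulting string would still have to be submitted to PubMed and the hits checked by hand, as the procedure requires.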
For each annotated mutation, all samples with that
mutation were retrieved from COSMIC using the BioMart
Perl API.
MSA construction
Human serine/threonine kinase (STK) and tyrosine kinase (TK)
domains were obtained from Pfam
families PF00069 and PF07714, respectively. To account
for classification problems in Pfam families, some
sequences incorrectly classified as TK were moved from this
alignment to the corresponding one and realigned with
T-Coffee (5). For each MSA, a sequence logo was
calculated using Seq2Logo (6).
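Because mutations are mapped onto these alignments, a residue number in one kinase can be translated into an alignment column and then into the equivalent residue of another kinase. The following is a minimal sketch of that bookkeeping, not the database's own code; the function names and the toy aligned sequences are hypothetical.

```python
from typing import Optional, Tuple


def column_of_residue(aligned_seq: str, domain_start: int, residue_number: int) -> int:
    """Return the 0-based alignment column that holds a given residue.

    aligned_seq: one row of the MSA, with '-' marking gaps.
    domain_start: protein residue number of the first non-gap character.
    """
    current = domain_start - 1
    for col, char in enumerate(aligned_seq):
        if char != "-":
            current += 1
            if current == residue_number:
                return col
    raise ValueError("residue is outside the aligned domain")


def residue_at_column(aligned_seq: str, domain_start: int, column: int) -> Optional[Tuple[int, str]]:
    """Return (residue_number, amino_acid) at a column, or None for a gap."""
    if aligned_seq[column] == "-":
        return None
    # Count the residues that precede this column in the aligned row.
    preceding = sum(1 for c in aligned_seq[:column] if c != "-")
    return domain_start + preceding, aligned_seq[column]


if __name__ == "__main__":
    kinase_a = "MK-TLE"  # toy row, domain assumed to start at residue 10
    kinase_b = "-RGSLD"  # toy row, domain assumed to start at residue 100
    col = column_of_residue(kinase_a, 10, 12)
    print(col, residue_at_column(kinase_b, 100, col))
```

With these toy rows, residue 12 of the first sequence falls in column 3, which corresponds to residue 102 (S) of the second.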
Mutation relative frequency calculation
A relative frequency was calculated for every mutation
in the 518 PKs of COSMIC database release
70 (7): the number of COSMIC samples carrying the mutation
in a gene, multiplied by 1000 and divided by the total number
of tumor samples sequenced for that gene.
All mutations with a relative frequency above 2 (0.2%)
were then checked in PubMed by searching for the name of
the mutation (e.g. P267R) and added to the dataset if they
were found to have functional effects. EGFR mutations
conferring a response rate to erlotinib higher than 50%,
according to the EGFR somatic mutations database
(http://www.somaticmutations-egfr.info/), were also added.
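The relative frequency and the cutoff of 2 can be written out as a short calculation. This is a sketch under the definition given above; the function names and the sample counts in the usage example are hypothetical and do not come from COSMIC.

```python
THRESHOLD = 2  # relative frequency cutoff from the text (equivalent to 0.2%)


def relative_frequency(mutated_samples: int, sequenced_samples: int) -> float:
    """Samples carrying the mutation, times 1000, over samples sequenced for the gene."""
    if sequenced_samples <= 0:
        raise ValueError("sequenced_samples must be positive")
    return mutated_samples * 1000 / sequenced_samples


def needs_literature_check(mutated_samples: int, sequenced_samples: int) -> bool:
    """True when the mutation exceeds the cutoff and would be looked up in PubMed."""
    return relative_frequency(mutated_samples, sequenced_samples) > THRESHOLD


if __name__ == "__main__":
    # Hypothetical counts: 30 mutated samples among 10000 sequenced for the gene.
    print(relative_frequency(30, 10000))      # 3.0
    print(needs_literature_check(30, 10000))  # True
    print(needs_literature_check(10, 10000))  # False
```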
The Kin-Driver database offers a comprehensive set of 560
primary AMs in the kinase and juxtamembrane (JM) domains
of 39 PKs and 83 inactivating mutations in 5 kinases
compiled by a two-step systematic search for each of the 518
PKs present in the complete kinase study of the COSMIC
database (7) (release 70). Only primary mutations with
experimental evidence demonstrating their activating/
inactivating role were included.
Kin-Driver is a MySQL relational database offering
structural and sequence data cross-referenced with
COSMIC and with our set of curated mutations. It also
provides the frequencies of these mutations in actual tumor
samples. The CosmicMart service is used to fetch the data,
so frequencies for new mutations can easily be added and
data are kept up to date with the periodic COSMIC
releases.
Our database can be queried by protein name,
gene name or keyword, amino acid position or specific
mutation name (e.g. T790M). A position range or a specific
mutation can also be used to look for driver mutations in
other PKs in equivalent positions (see later). Finally, the database can
be browsed by PK name, domain, tissue or type of
histology, and these last two attributes obtained from the
corresponding mutated samples are available (...truncated)