Exploring Williams–Beuren syndrome using myGrid
Vol. 20 Suppl. 1 2003, pages i303–i310
DOI: 10.1093/bioinformatics/bth944
BIOINFORMATICS
R. D. Stevens1,∗, H. J. Tipney2, C. J. Wroe1, T. M. Oinn3,
M. Senger3, P. W. Lord1, C. A. Goble1, A. Brass1 and
M. Tassabehji2
1 Department of Computer Science, University of Manchester, Oxford Road, Manchester,
M13 9PL, UK, 2 University of Manchester, Academic Unit of Medical Genetics,
St. Mary’s Hospital, Hathersage Road, M13 0JH, UK and 3 European Bioinformatics
Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Received on January 15, 2004; accepted on March 1, 2004
∗ To whom correspondence should be addressed.
Bioinformatics 20(Suppl. 1) © Oxford University Press 2003; all rights reserved.
Availability: The myGrid software is available via http://www.mygrid.org.uk
Contact:
1 INTRODUCTION
Bioinformatics already offers a huge selection of data and
analytical resources for a biologist to perform in silico experiments. In such experiments, services representing tools act
upon data, producing more data until a goal is achieved or
hypothesis revealed. With current tools it is possible to reveal
interesting biological insights computationally. A major barrier, however, in utilizing these resources is the time needed
by skilled bioinformaticians to manually and repeatedly
co-ordinate multiple tools to produce a result. Tasks that
take minutes of computational time actually take days to run
manually. This paper describes the use of the myGrid middleware services (Stevens et al., 2003) to create and manage the
information from running in silico bioinformatics experiments
in a semantically enriched, Grid-aware environment. This is
done in the context of Williams–Beuren syndrome (WBS),
a microdeletion in a complex region of human chromosome
7, which requires repeated application of a range of standard
bioinformatics techniques to characterize the region deleted
in the syndrome.
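The idea of services acting on data and producing more data, with the information from each run recorded, can be illustrated with a minimal sketch. This is not the myGrid API: the `Workflow` and `ProvenanceRecord` classes and the two toy services are hypothetical names introduced only for illustration, and real services would wrap remote bioinformatics tools rather than simple string functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ProvenanceRecord:
    """One executed step: which service ran, what went in, what came out."""
    service: str
    input_data: str
    output_data: str


@dataclass
class Workflow:
    """An ordered chain of named services with a provenance log (hypothetical)."""
    steps: List[Tuple[str, Callable[[str], str]]] = field(default_factory=list)
    log: List[ProvenanceRecord] = field(default_factory=list)

    def add(self, name: str, service: Callable[[str], str]) -> "Workflow":
        self.steps.append((name, service))
        return self

    def run(self, data: str) -> str:
        # Feed each service's output into the next one, recording every step.
        for name, service in self.steps:
            result = service(data)
            self.log.append(ProvenanceRecord(name, data, result))
            data = result
        return data


# Toy stand-ins for real analysis services (assumptions, not real tools).
def clean(seq: str) -> str:
    """Drop whitespace and upper-case the sequence."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    wf = Workflow().add("clean", clean).add("reverse_complement", reverse_complement)
    print(wf.run(" atgc\n"))
    for record in wf.log:
        print(record)
```

Keeping the log alongside the chain means that rerunning the same workflow on new input leaves a record of every intermediate result, which is the part that is tedious to maintain by hand.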
Due to the highly repetitive nature of the sequence
flanking and within the Williams–Beuren syndrome critical region
(WBSCR), sequencing of the region is incomplete, leaving
documented gaps in the released genomic sequence. In order
to produce a complete and accurate map of the WBSCR,
researchers must constantly search for newly sequenced
human DNA clones that extend into these ‘gap’ regions
(see Section 3). Once placed in this region, these DNA
sequences must be analysed with a battery of prediction tools
to locate putative genes and their regulatory elements, as well as
both characterized and otherwise uncharacterized genes and
their products implicated in WBS.
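The repeated search for newly sequenced clones amounts to comparing the hits of the latest run with those already seen. A minimal sketch of that comparison follows, assuming each run yields a collection of clone identifiers. The function names and the placeholder identifiers (`CLONE_A` and so on) are hypothetical and do not come from the experiments described here.

```python
from typing import Iterable, Optional, Set


def new_identifiers(previous: Iterable[str], current: Iterable[str]) -> Set[str]:
    """Return identifiers present in the current run but not in the previous one."""
    return set(current) - set(previous)


def build_notification(previous: Iterable[str], current: Iterable[str]) -> Optional[str]:
    """Build a short message for the biologist, or None if nothing new appeared."""
    fresh = new_identifiers(previous, current)
    if not fresh:
        return None
    return f"{len(fresh)} new clone(s) found: " + ", ".join(sorted(fresh))


if __name__ == "__main__":
    last_run = ["CLONE_A", "CLONE_B"]             # placeholder identifiers
    this_run = ["CLONE_A", "CLONE_B", "CLONE_C"]  # placeholder identifiers
    print(build_notification(last_run, this_run))
```

Only the difference between runs is reported, so a search that is repeated regularly produces a message only when something has changed.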
ABSTRACT
Motivation: In silico experiments necessitate the virtual
organization of people, data, tools and machines. The scientific
process also necessitates an awareness of the experience
base, both of personal data and of the wider context of
work. The management of all these data and the co-ordination
of resources to manage such virtual organizations and the
data surrounding them need significant computational infrastructure support.
Results: In this paper, we show that myGrid, middleware
for the Semantic Grid, enables biologists to perform and
manage in silico experiments, then explore and exploit the
results of their experiments. We demonstrate myGrid in the
context of a series of bioinformatics experiments focused
on a 1.5 Mb region on chromosome 7 which is deleted in
Williams–Beuren syndrome (WBS). Due to the highly repetitive nature of the sequence flanking and within the WBS critical region
(WBSCR), sequencing of the region is incomplete, leaving documented gaps in the released sequence. myGrid was used
in a series of experiments to find newly sequenced human
genomic DNA clones that extended into these ‘gap’ regions in
order to produce a complete and accurate map of the WBSCR.
Once placed in this region, these DNA sequences were analysed with a battery of prediction tools in order to locate
putative genes and regulatory elements possibly implicated
in the disorder. Finally, any genes discovered were submitted
to a range of standard bioinformatics tools for their characterization. We report how myGrid has been used to create
workflows for these in silico experiments, run those workflows
regularly and notify the biologist when new DNA and genes
are discovered. The myGrid services collect and co-ordinate
data inputs and outputs for the experiment, as well as much
provenance information about the performance of experiments
on WBS.
affords that scientist a personalised view of his or her data.
In this paper we describe the in silico experiments required
for exploring WBS, and the use of myGrid services to implement and manage the running of those experiments and their
results.
2 WILLIAMS–BEUREN SYNDROME
Williams–Beuren syndrome1 is a rare, sporadically occurring microdeletion disorder characterized by a unique set of
physical and behavioural features (Morris, 1988). WBS is
caused by a 1.5 Mb deletion (Osborne et al., 2001) located
in chromosome band 7q11.23 (Ewart et al., 1993). WBS is a
complex, multisystem genetic disorder with an intricate phenotype (Ewart et al., 1993; Osborne, 1999; Preus, 1984). The
region commonly deleted in WBS is flanked by highly repetitive regions, ≈ 320–500 kb in length (Peoples et al., 2000)
containing both pseudogenes and genes.
Most WBS individuals have a deletion of ≈ 1.5 Mb, encompassing 24 genes (Tassabehji, 2003) (Fig. 1). A smaller region
within the common WBS deleted region, the WBSCR, which contains the genes
whose absence is critical to the WBS phenotype, has been identified (Osborne, 1999; Tassabehji et al.,
1999) (Fig. 1).
Many maps of the region have been published (DeSilva
et al., 1999; Peoples et al., 2000; Valero et al., 2000; Osborne
et al., 2001), each with an increasing level of detail, and
the ‘complete’ chromosome 7 sequence was released in
2003 (Hillier, 2003), but a fully comprehensive map of
the WBSCR is still not available. The overriding reason for this
is the complexity and repetitive nature of the WBSCR, which
has led to inconsistencies between published maps and
hard-to-close gaps in the genomic sequence.
The gaps in the WBSCR may harbour important genes and
associated regulatory elements that are deleted, so defining their composition is crucial for genotype-to-phenotype
correlations. The production of a complete, comprehensive
and robust map of the WBS region is vital if we are to fully
understand the pathology of WBS.
3 WILLIAMS–BEUREN SYNDROME BIOINFORMATICS IN myGrid
The biological problem described above has previously been
investigated manually in two major analyses:
1. Retrieve newly submitted human genomic sequences
that extend into the gap. Similarity searches are made
against a range of GenBank databanks using the BLAST
programme BLASTN (Altschul et al., 1997). RepeatMasker2
is used to search against RepBase Update
1 OMIM: #194050.
2 http://www.repeatmasker.org.
As (...truncated)