Exploring Williams–Beuren syndrome using myGrid

Bioinformatics, Aug 2004

Motivation: In silico experiments necessitate the virtual organization of people, data, tools and machines. The scientific process also necessitates an awareness of the experience base, both of personal data as well as the wider context of work. The management of all these data and the co-ordination of resources to manage such virtual organizations and the data surrounding them needs significant computational infrastructure support.

Results: In this paper, we show that myGrid, middleware for the Semantic Grid, enables biologists to perform and manage in silico experiments, then explore and exploit the results of their experiments. We demonstrate myGrid in the context of a series of bioinformatics experiments focused on a 1.5 Mb region on chromosome 7 which is deleted in Williams–Beuren syndrome (WBS). Due to the highly repetitive nature of the sequence flanking and within the WBS critical region (WBSCR), sequencing of the region is incomplete, leaving documented gaps in the released sequence. myGrid was used in a series of experiments to find newly sequenced human genomic DNA clones that extended into these ‘gap’ regions in order to produce a complete and accurate map of the WBSCR. Once placed in this region, these DNA sequences were analysed with a battery of prediction tools in order to locate putative genes and regulatory elements possibly implicated in the disorder. Finally, any genes discovered were submitted to a range of standard bioinformatics tools for their characterization. We report how myGrid has been used to create workflows for these in silico experiments, run those workflows regularly and notify the biologist when new DNA and genes are discovered. The myGrid services collect and co-ordinate data inputs and outputs for the experiment, as well as much provenance information about the performance of experiments on WBS.

Availability: The myGrid software is available via http://www.mygrid.org.uk



Bioinformatics, Vol. 20 Suppl. 1 2003, pages i303–i310
DOI: 10.1093/bioinformatics/bth944

R. D. Stevens(1,*), H. J. Tipney(2), C. J. Wroe(1), T. M. Oinn(3), M. Senger(3), P. W. Lord(1), C. A. Goble(1), A. Brass(1) and M. Tassabehji(2)

(1) Department of Computer Science, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
(2) University of Manchester, Academic Unit of Medical Genetics, St. Mary’s Hospital, Hathersage Road, M13 0JH, UK
(3) European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
(*) To whom correspondence should be addressed.

Received on January 15, 2004; accepted on March 1, 2004

Bioinformatics 20(Suppl. 1) © Oxford University Press 2003; all rights reserved.

1 INTRODUCTION

Bioinformatics already offers a huge selection of data and analytical resources for a biologist to perform in silico experiments. In such experiments, services representing tools act upon data, producing more data until a goal is achieved or hypothesis revealed. With current tools it is possible to reveal interesting biological insights computationally. A major barrier, however, in utilizing these resources is the time needed by skilled bioinformaticians to manually and repeatedly co-ordinate multiple tools to produce a result. Tasks that take minutes of computational time actually take days to run manually. This paper describes the use of the myGrid middleware (Stevens et al., 2003) services to create and manage the information from running in silico bioinformatics experiments in a semantically enriched, Grid-aware environment. This is done in the context of Williams–Beuren syndrome (WBS), a microdeletion in a complex region of human chromosome 7, which requires repeated application of a range of standard bioinformatics techniques to characterize the region deleted in the syndrome.
Due to the highly repetitive nature of the sequence flanking and within the Williams–Beuren syndrome critical region (WBSCR), sequencing of the region is incomplete, leaving documented gaps in the released genomic sequence. In order to produce a complete and accurate map of the WBSCR, researchers must constantly search for newly sequenced human DNA clones that extend into these ‘gap’ regions (see Section 3). Once placed in this region, these DNA sequences must be analysed with a battery of prediction tools to locate putative genes, their regulatory elements, as well as both characterized and otherwise uncharacterized genes and their products implicated in WBS.

[...] affords that scientist a personalised view of his or her data. In this paper we describe the in silico experiments required for exploring WBS, and the use of myGrid services to implement and manage the running of those experiments and their results.

2 WILLIAMS–BEUREN SYNDROME

Williams–Beuren syndrome[1] is a rare, sporadically occurring microdeletion disorder characterized by a unique set of physical and behavioural features (Morris, 1988). WBS is caused by a 1.5 Mb deletion (Osborne et al., 2001) located in chromosome band 7q11.23 (Ewart et al., 1993). WBS is a complex, multisystem genetic disorder with an intricate phenotype (Ewart et al., 1993; Osborne, 1999; Preus, 1984). The region commonly deleted in WBS is flanked by highly repetitive regions, ≈320–500 kb in length (Peoples et al., 2000), containing both pseudogenes and genes. Most WBS individuals have a deletion of ≈1.5 Mb, encompassing 24 genes (Tassabehji, 2003) (Fig. 1). A smaller region within the common WBS deleted region, containing the genes whose absence is critical to the WBS phenotype (the WBSCR), has been identified (Osborne, 1999; Tassabehji et al., 1999) (Fig. 1).
Many maps of the region have been published (DeSilva et al., 1999; Peoples et al., 2000; Valero et al., 2000; Osborne et al., 2001), each with an increasing level of detail, and the ‘complete’ chromosome 7 sequence was released in 2003 (Hillier, 2003), but a fully comprehensive map of the WBSCR is still not available. The overriding reason for this is the complexity and repetitive nature of the WBSCR, which has led to inconsistencies between published maps and hard-to-close gaps in the genomic sequence. The gaps in the WBSCR may harbour important genes and associated regulatory elements which are deleted, so defining their composition is crucial for genotype-to-phenotype correlations. The production of a complete, comprehensive and robust map of the WBS region is vital if we are to fully understand the pathology of WBS.

3 WILLIAMS–BEUREN SYNDROME BIOINFORMATICS IN MYGRID

The biological problem described above has previously been investigated manually in two major analyses:

1. Retrieve newly submitted human genomic sequences that extend into the gap. Similarity searches are made against a range of GenBank databanks using the BLAST programme BLASTN (Altschul et al., 1997). RepeatMasker[2] is used to search against RepBase Update

[1] OMIM: #194050.
[2] http://www.repeatmasker.org.

As (...truncated)
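The first analysis above, together with the abstract's statement that workflows are run regularly and the biologist is notified when new DNA is discovered, can be illustrated with a minimal Python sketch. It is not the myGrid implementation and uses no myGrid or BLAST API: the `Hit` record, the function names, the `1e-10` E-value cut-off and the accession strings in the usage note are all hypothetical, chosen only to show the idea of comparing each run's similarity-search hits against those already seen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One similarity-search hit (hypothetical record, not a myGrid type)."""
    accession: str   # database accession of the matching clone
    e_value: float   # significance reported by the search tool


def new_gap_candidates(hits, seen_accessions, max_e_value=1e-10):
    """Return significant hits not reported in any earlier run, best first.

    The E-value threshold is an assumed illustrative default.
    """
    fresh = [
        h for h in hits
        if h.e_value <= max_e_value and h.accession not in seen_accessions
    ]
    return sorted(fresh, key=lambda h: h.e_value)


def run_once(hits, seen_accessions):
    """Process one scheduled run: select new hits, record them, build a notice.

    Returns (fresh_hits, message); message is None when nothing new was found.
    """
    fresh = new_gap_candidates(hits, seen_accessions)
    seen_accessions.update(h.accession for h in fresh)
    message = None
    if fresh:
        message = "New candidate clones: " + ", ".join(h.accession for h in fresh)
    return fresh, message
```

Calling `run_once` with the hits from each scheduled search and a persistent set of previously seen accessions (for example `{"AC000001"}`, a placeholder value) yields only the clones that are both significant and new, so a repeat run over unchanged databanks produces no notification.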


This is a preview of a remote PDF: https://bioinformatics.oxfordjournals.org/content/20/suppl_1/i303.full.pdf
Article home page: http://bioinformatics.oxfordjournals.org/content/20/suppl_1/i303.abstract

R. D. Stevens, H. J. Tipney, C. J. Wroe, T. M. Oinn, M. Senger, P. W. Lord, C. A. Goble, A. Brass, M. Tassabehji. Exploring Williams–Beuren syndrome using myGrid, Bioinformatics, 2004, pp. i303-i310, 20/suppl 1, DOI: 10.1093/bioinformatics/bth944