The UCSC Table Browser data retrieval tool

Nucleic Acids Research, Jan 2004

The University of California Santa Cruz (UCSC) Table Browser (http://genome.ucsc.edu/cgi-bin/hgText) provides text-based access to a large collection of genome assemblies and annotation data stored in the Genome Browser Database. A flexible alternative to the graphical-based Genome Browser, this tool offers an enhanced level of query support that includes restrictions based on field values, free-form SQL queries and combined queries on multiple tables. Output can be filtered to restrict the fields and lines returned, and may be organized into one of several formats, including a simple tab-delimited file that can be loaded into a spreadsheet or database as well as advanced formats that may be uploaded into the Genome Browser as custom annotation tracks. The Table Browser User's Guide located on the UCSC website provides instructions and detailed examples for constructing queries and configuring output.

Donna Karolchik, Angela S. Hinrichs, Terrence S. Furey, Krishna M. Roskin, Charles W. Sugnet, David Haussler, W. James Kent

Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), School of Engineering, 1156 High Street, Santa Cruz, CA 95064-1077, USA

The UCSC Table Browser data retrieval tool is built on top of the Genome Browser Database, a set of MySQL relational databases that each store sequence and annotation data for one genome assembly (1). Tables within the databases may be differentiated by whether the data are based on genomic start-stop coordinates or are independent of position. Positional tables contain data associated with specific locations in the genome, such as mRNA alignments, gene predictions, cross-species alignments and various other annotations. Each of the annotation tracks displayed in the graphical Genome Browser is based on one or more positional tables. Data associated with custom annotation tracks active within the user's Table Browser session are also available as positional tables.
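
To make the idea of a positional table concrete, the following is a minimal sketch of a coordinate-range query. It uses Python's built-in sqlite3 module as a stand-in for the MySQL databases described above. The table name `mrna_alignments`, the column names `chrom`, `chromStart`, `chromEnd` and `name`, the half-open coordinate convention and all rows are illustrative assumptions, not the actual Genome Browser Database schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding one hypothetical positional table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mrna_alignments ("
        " chrom TEXT, chromStart INTEGER, chromEnd INTEGER, name TEXT)"
    )
    # Made-up records; each row is tied to a location in the genome.
    rows = [
        ("chr1", 1000, 2000, "mrnaA"),
        ("chr1", 1500, 3000, "mrnaB"),
        ("chr1", 5000, 6000, "mrnaC"),
        ("chr2", 1200, 1800, "mrnaD"),
    ]
    conn.executemany("INSERT INTO mrna_alignments VALUES (?, ?, ?, ?)", rows)
    # Index the fields that a range query filters on.
    conn.execute("CREATE INDEX idx_pos ON mrna_alignments (chrom, chromStart)")
    return conn


def overlapping(conn, chrom, start, end):
    """Return names of records that overlap the half-open range [start, end)."""
    cur = conn.execute(
        "SELECT name FROM mrna_alignments"
        " WHERE chrom = ? AND chromStart < ? AND chromEnd > ?"
        " ORDER BY chromStart",
        (chrom, end, start),
    )
    return [row[0] for row in cur]


conn = build_demo_db()
print(overlapping(conn, "chr1", 1800, 2500))  # prints ['mrnaA', 'mrnaB']
```

Two intervals overlap when each one starts before the other ends, which is why the query compares `chromStart` with the end of the requested range and `chromEnd` with its start.
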

Non-positional tables contain data not tied to genomic location, for example a table that correlates a Genethon marker name with a Marshfield marker name. Some non-positional tables relate internal numeric mRNA IDs to extended information such as author, tissue or keyword. Other meta tables contain information about the structure of the database itself or describe external files containing sequence data.

Because of the large size of the data set stored in each database, particular attention has been paid to maintaining adequate interactive performance. The databases contain optimizations to support range-based queries from the Table Browser and Genome Browser. Smaller tables are indexed on a few critical fields and the data are presorted prior to loading into the database. With larger tables, the data are separated by chromosome into smaller tables, and a binning scheme is implemented on the larger chromosome tables. The document http://genome.ucsc.edu/goldenPath/gbdDescriptions.html contains a detailed description of the database tables and fields, which are dumped weekly into downloadable tab-delimited files.

In addition to the inclusion of the latest human and mouse assemblies, the Genome Browser Database has expanded in the past year to include rat, worm and a collection of species targeted by the NISC Comparative Sequencing Program (2), with plans to add support for several additional genomes in the coming year.

Recently, the UCSC Genome Bioinformatics group has placed considerable emphasis on comparative genome analysis. The group has been active in the analysis of evolutionary conservation and divergence among species (3,4), phylogenetic analysis of rates of substitution (5) and multiple species alignments. This research has resulted in the addition of several new types of annotation data to the Genome Browser Database. The axtChain program written by Jim Kent produces chained BLASTZ alignments between two species (6).
This alignment tool uses a gap scoring system that allows longer gaps than traditional affine gap scoring systems and can also tolerate gaps in both species simultaneously. Further processing of the chained alignments with the chainNet program outputs an alignment net that shows the best chain for every part of the genome. UCSC has also been collaborating closely with the Penn State University Bioinformatics Group to produce three-way multiple species alignments using Webb Miller's multiz program, which takes BLASTZ and axtBest alignments as input (7,8).

The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Many research scientists are familiar with the UCSC Genome Browser (9), the graphical interface to the Genome Browser Database that displays requested portions of genome assemblies together with a series of aligned annotation tracks. Despite its ease of use, situations exist in which a graphical browser may not be the optimal tool for working with genomic data. A user might wish to view the raw data or examine the relationships between the tables underlying the browser. It is often desirable to filter the display output with greater restrictions than are offered by the Genome Browser, or to output the data in a text-based format that can be imported into other programs.

The UCSC Table Browser, which may be accessed directly at http://genome.ucsc.edu/cgi-bin/hgText or through the Tables link on the UCSC Genome Bioinformatics home page (http://genome.ucsc.edu), provides a powerful and flexible alternative for querying and manipulating the annotation tables within the Genome Browser Database. Using Table Browser form-based or free-form queries, one may quickly and easily extract subsets of the database, in many cases eliminating the need to set up a local copy of the MySQL database.
By configuring the tool's output options, the user can generate a custom annotation track that may be automatically added to the graphical browser session, or create a file in one of several output formats that can be used as input into other programs. The Table Browser can also display basic statistics calculated over a selected subset of data.

BASIC DATA QUERIES

In its most basic form, the Table Browser can be used to retrieve a specific subset of records from a table in a selected genome assembly. The user specifies a position of interest within the assembly (or the keyword "genome" to access data from the entire assembly), selects a table, and then chooses the "Get all fields" option. The Table Browser displays the query results in a tab-delimited text format that can be easily downloaded and imported into text editors, spreadsheets and other databases, or may be further processed by the user's own scripts. For example, a user who is examining alternative splicing in the human genome might be interested in downloading the indices of all mRNA sequences that align to a chromosomal region containing a particular gene. One would set the Table Br (...truncated)
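
The tab-delimited output described in this section is straightforward to process with a short script. The following minimal sketch, in Python, reads such output and lists the mRNA identifiers whose alignments overlap a region of interest. The `#`-prefixed header line, the column names (`qName`, `tName`, `tStart`, `tEnd`), the half-open coordinates and the sample rows are hypothetical stand-ins; the actual columns depend on the table selected in the Table Browser.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded tab-delimited result file.
SAMPLE = (
    "#qName\ttName\ttStart\ttEnd\n"
    "AB000001\tchr7\t100\t900\n"
    "AB000002\tchr7\t2000\t2600\n"
    "AB000003\tchr3\t600\t800\n"
    "AB000004\tchr7\t1400\t1900\n"
)


def read_rows(handle):
    """Yield one dict per data line, keyed by the names in the header line."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # assumed comment marker on the header
    for fields in reader:
        if fields:  # skip blank lines
            yield dict(zip(header, fields))


def names_in_region(rows, chrom, start, end):
    """Return query names whose alignment overlaps [start, end) on chrom."""
    return [
        row["qName"]
        for row in rows
        if row["tName"] == chrom
        and int(row["tStart"]) < end
        and int(row["tEnd"]) > start
    ]


rows = list(read_rows(io.StringIO(SAMPLE)))
print(names_in_region(rows, "chr7", 500, 1500))  # prints ['AB000001', 'AB000004']
```

To use the sketch on a real download, `io.StringIO(SAMPLE)` would be replaced by an open file handle, and the column names adjusted to match the header of the chosen table.
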


Full-text PDF: https://nar.oxfordjournals.org/content/32/suppl_1/D493.full.pdf
Article home page: http://nar.oxfordjournals.org/content/32/suppl_1/D493.abstract

Donna Karolchik, Angela S. Hinrichs, Terrence S. Furey, Krishna M. Roskin, Charles W. Sugnet, David Haussler, W. James Kent. The UCSC Table Browser data retrieval tool, Nucleic Acids Research, 2004, pp. D493-D496, 32/suppl 1, DOI: 10.1093/nar/gkh103