An advanced web query interface for biological databases

Database: The Journal of Biological Databases and Curation, Jan 2010

Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objects—that is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs.

Article PDF cannot be displayed. You can download it here:

https://database.oxfordjournals.org/content/2010/baq006.full.pdf

An advanced web query interface for biological databases

Mario Latendresse 0 Peter D. Karp 0 0 Bioinformatics Research Group, SRI International , 333 Ravenswood Avenue, Menlo Park, CA 94025, USA Although most web-based biological databases (DBs) offer some type of web-based form to allow users to author DB queries, these query forms are quite restricted in the complexity of DB queries that they can formulate. They can typically query only one DB, and can query only a single type of object at a time (e.g. genes) with no possible interaction between the objectsthat is, in SQL parlance, no joins are allowed between DB objects. Writing precise queries against biological DBs is usually left to a programmer skillful enough in complex DB query languages like SQL. We present a web interface for building precise queries for biological DBs that can construct much more precise queries than most web-based query forms, yet that is user friendly enough to be used by biologists. It supports queries containing multiple conditions, and connecting multiple object types without using the join concept, which is unintuitive to biologists. This interactive web interface is called the Structured Advanced Query Page (SAQP). Users interactively build up a wide range of query constructs. Interactive documentation within the SAQP describes the schema of the queried DBs. The SAQP is based on BioVelo, a query language based on list comprehension. The SAQP is part of the Pathway Tools software and is available as part of several bioinformatics web sites powered by Pathway Tools, including the BioCyc.org site that contains more than 500 Pathway/Genome DBs. Introduction Biological databases (DBs) now number in the hundreds, and are widely viewed as an essential part of post-genomic molecular biology. However, significant barriers limit biologists access to biological DBs. Existing easy-to-use webbased query interfaces to biological DBs severely limit the complexity of queries that the user can formulate. Users who want to formulate complicated queries must learn both a DB query language such as SQL, and a computer programming language such as C or Java in which to embed those queries and process the results. Learning such languages is time consuming at best, and often presents an insurmountable hurdle for the biologist. One reason is that the semantics of SQL is based on concepts not commonly taught to scientists in the course of their University education (e.g. join). We present a flexible web interface through which biologists and bioinformaticists can author precise queries to biological DBs. The queries that can be written with this interface, called the Structured Advanced Query Page (SAQP), can be as precise as what can be expected from a computer programmer using expressive DB query languages like SQL. But the web interface can be used without programming expertise, and use of the SAQP avoids the types of errors that typically occur when writing computer programsgreatly reducing the barriers to writing precise queries. A precise query is formulated in such a way that it returns what the user wants without superfluous results. To enable precision in a query, an appropriate set of relational operators, as well as direct access to the underlying data, must be provided. A precise query is unlike an imprecise query that returns many results that must be disregarded by the user. A precise query can be simple or complex since if a user expects all the proteins of a DB, this can be done using a simple query whereas all proteins that are products of genes located in specific parts of the genome is a complex query requiring at least two classes of objects and several constraints. A complex query might involve more than one DB class, might include several DBs and specify many constraints, for example, find all metabolic pathways containing more than four reactions for which all enzymes in the pathway are monomeric and find all biochemical reactions that convert a carbohydrate to a phosphorylated carbohydrate, and where the molecular weight of the carbohydrate is less than 100. A simple query typically involve searching one DB and one class, using one or two constraints. Examples include a query that searches for genes by name, or that searches for biochemical reactions by EC number, or for chemical compounds by molecular weight. The flexibility, ease of use and precision of the SAQP are due to (i) readable, interactive, expandable form constructs for specifying search constraints and object attributes; (ii) the inclusion of variables in searches, which allow search components to refer to one another; and (iii) the inclusion of multiple search components within one query. The notion of readability of precise database queries is also presented in the doctoral dissertation of M. Bada (1). The SAQP web interface is based on a DB query language newly developed as part of this project called BioVelo, that is more expressive than SQL, but that has a succinct syntax and a simpler semantics. In fact, the layout of the graphical interface of the SAQP is based on the syntax of BioVelo. We consider BioVelos syntax terse enough to provide a user interface for direct entry of BioVelo queries. This interface, called the Free Form Advanced Query Page (FFAQP), is accessible from the SAQP in one click. Figure 1 illustrates the general architecture of our DB query system. Two Web page interfaces are provided: the FFAQP and the SAQP. In this article, we focus on the SAQP. The development of the SAQP was motivated by the need to provide biologists with the ability to query the large collection of Pathway/Genome DBs (PGDBs) being developed by users of the Pathway Tools software, including the more than 500 PGDBs within SRIs BioCyc collection (2,3) and the more than 200 PGDBs developed by groups outside SRI, such as YeastCyc and MouseCyc (4) (see BioCyc.org for a partial list). Pathway Tools PGDBs are managed using a Frame Knowledge Representation System called Ocelot (5). Ocelot is essentially a Common Lisp-based objectoriented database management system (DBMS) that uses a relational DBMS as a persistent back end. However, the relational aspect of Ocelot is invisible to the Ocelot user. Free Form Advanced Query Page Structured Advanced Query Page Web Browser Pathway Tools Web Server Figure 1. Two web page interfaces for constructing BioVelo queries interact with the Pathway Tools web server that communicates with a BioVelo query processor. Ocelot is an object-oriented DB system that can use a relational DB back end. The FFAQP and the SAQP are accessible at the web site BioCyc.org/query.shtml. In summary, BioVelo serves as a database query language, currently built on Ocelot, and the SAQP is a user-friendly interface built on BioVelo. The BioVelo and SAQP implementations are applicable to any DB built using Ocelot, and thus generalize beyond Pathway Tools DBs such as the BioCyc DBs. Actually, the overall approach used for the SAQP is applicable to other relatio (...truncated)


This is a preview of a remote PDF: https://database.oxfordjournals.org/content/2010/baq006.full.pdf
Article home page: http://database.oxfordjournals.org/content/2010/baq006.abstract

Mario Latendresse, Peter D. Karp. An advanced web query interface for biological databases, Database: The Journal of Biological Databases and Curation, 2010, 2010, DOI: 10.1093/database/baq006