MADGE: scalable distributed data management software for cDNA microarrays

Article PDF: https://bioinformatics.oxfordjournals.org/content/19/1/87.full.pdf

Richard A. McIndoe, Aaron Lanzen and Kimberly Hurtz

Department of Pathology, Immunology and Laboratory Medicine, University of Florida, Gainesville, FL 32610, USA

Motivation: The human genome project and the development of new high-throughput technologies have created unparalleled opportunities to study the mechanisms of disease, monitor disease progression and evaluate therapies. Gene expression profiling is a critical tool to accomplish these goals. The use of nucleic acid microarrays to assess the expression of thousands of genes simultaneously has seen phenomenal growth over the past five years. Although commercial sources of microarrays exist, investigators wanting more flexibility in the genes represented on the array will turn to in-house production. The creation and use of cDNA microarrays is a complicated process that generates an enormous amount of information. Effective management of this information is essential to efficiently access, analyze, troubleshoot and evaluate the microarray experiments.

Results: We have developed a distributable software package designed to track and store the various pieces of data generated by a cDNA microarray facility. This includes the clone collection storage data, annotation data, workflow queues, microarray data, data repositories, sample submission information, and project/investigator information. The application was designed using a 3-tier client-server model. The data access layer (1st tier) contains the relational database system, tuned to support a large number of transactions. The data services layer (2nd tier) is a distributed COM server with full database transaction support. The application layer (3rd tier) is an internet-based user interface that contains both client-side and server-side code for dynamic interactions with the user.

Availability: This software is freely available to academic institutions and non-profit organizations at http://www.genomics.mcg.edu/niddkbtc.
Contact: -

INTRODUCTION

A result of the human genome project is the exponential increase in the amount of DNA sequence information available to researchers to use in their experimental efforts. This increase has fueled a genomic revolution for investigators. The paradigm of analyzing a single gene effect in a biological system has shifted to a global systems analysis. Global gene expression analysis at the RNA level offers the first glimpse into the future of organizing and using genomic information. Using this technology, investigators can simultaneously monitor the RNA levels of a large number of genes, or even the entire genome, in the context of their biological system. In this article, we describe a software application created to manage the data generated in the creation of DNA microarrays that are spotted onto glass slides and use two-color hybridization for data acquisition.

The basic strategy for a two-color cDNA microarray experiment is to isolate RNA from two sources, a reference and an experimental sample (DeRisi et al., 1996, 1997; Eisen and Brown, 1999; Schena et al., 1995; Shalon et al., 1996). The RNA samples are converted to cDNA and labeled with a fluorophore; typically the reference is labeled with Cy3 and the experimental sample with Cy5. These two probes are combined and hybridized to the microarray. Following the hybridization and washes, the array is scanned at two wavelengths to detect the labeled cDNA that has hybridized to the array. The two computer images produced by the scanner are combined and the data for each spot (gene) are collected, along with background and error measurements. The data are expressed as a ratio of experimental expression to reference expression. The hybridizations are repeated multiple times to ensure reproducibility and confidence in the measurement. Once the data from several hybridizations are generated, a variety of clustering and statistical methods can be used to help the investigators.
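The ratio described above can be illustrated with a minimal sketch. It is not taken from MADGE: the function names, the intensity floor, the background subtraction step and the log2 averaging across replicate hybridizations are all assumptions made for illustration, chosen because they are common practice for two-color arrays.

```python
import math

# Assumed floor that keeps background-corrected intensities positive,
# so the ratio and its logarithm are always defined.
MIN_INTENSITY = 1.0


def corrected(signal, background):
    """Subtract the local background from a spot intensity (hypothetical helper)."""
    return max(signal - background, MIN_INTENSITY)


def expression_ratio(cy5_signal, cy5_background, cy3_signal, cy3_background):
    """Ratio of experimental (Cy5) to reference (Cy3) expression for one spot."""
    experimental = corrected(cy5_signal, cy5_background)
    reference = corrected(cy3_signal, cy3_background)
    return experimental / reference


def mean_log2_ratio(ratios):
    """Average replicate hybridizations on the log2 scale (an assumed convention)."""
    return sum(math.log2(r) for r in ratios) / len(ratios)


# Two replicate hybridizations of the same spot, with made-up intensities.
replicates = [
    expression_ratio(2100.0, 100.0, 600.0, 100.0),  # 2000 / 500 = 4.0
    expression_ratio(1700.0, 100.0, 900.0, 100.0),  # 1600 / 800 = 2.0
]
summary = mean_log2_ratio(replicates)
```

Averaging on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically, which is one reason replicate ratios are often summarized this way.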
The Microarray Database of Gene Expression (MADGE) system is a 3-tier application that models the microarray workflow required to create and use cDNA microarrays, recording both the inputs and outputs generated by the processes. The MADGE system divides the microarray workflow into eight processes performed in two concurrent paths. One path focuses on the processes necessary to transform a tissue sample into a labeled cDNA probe, while the second path focuses on the creation of the microarrays themselves, including library construction/importation, clone amplification/purification, glass slide preparation and microarray printing. The two paths merge at the hybridization step and continue through the workflow, terminating at the submission of the extracted feature data for the microarray.

SYSTEMS AND METHODS

The data access layer

We use SQL Server v7 as the relational database management system (RDBMS) for MADGE. The application uses two databases, one for the array workflow and the other for the employees. The ArrayWorkflow database contains 67 tables with 275 attached stored procedures. The database schema models the flow of data generated during the microarray workflow, including reagent lot numbers, control data (e.g. user IDs and system dates), and data from end deliverables (e.g. feature data and robot files). The SQL scripts needed to generate the two databases will be available in the final MADGE application package.

The data services layer

We could have written an internet application in which all the database logic was interspersed with the business logic. However, this would have failed to meet our requirement that the system be scalable, manageable and portable. Our current design provides an object-oriented API for the programmer and separates the application logic from the database logic, making the system scalable, easy to use and more secure.
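The separation of application logic from database logic can be sketched as follows. This is only an illustration of the idea, not the MADGE API: MADGE implements this layer as a COM server calling SQL Server stored procedures, whereas the sketch below swaps in Python and the standard-library `sqlite3` module. The class name, method names, tables and columns are all hypothetical.

```python
import sqlite3


class ArrayWorkflowService:
    """Hypothetical data services class: callers never write SQL themselves."""

    def __init__(self, connection):
        self._conn = connection
        # Illustrative schema only; the real ArrayWorkflow database has 67 tables.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS hybridization ("
            "id INTEGER PRIMARY KEY, slide_barcode TEXT NOT NULL, "
            "probe_id TEXT NOT NULL, user_id TEXT NOT NULL)"
        )
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS queue ("
            "slide_barcode TEXT PRIMARY KEY, step TEXT NOT NULL)"
        )
        self._conn.commit()

    def record_hybridization(self, slide_barcode, probe_id, user_id):
        """Store a hybridization and queue the slide for scanning, atomically."""
        # Using the connection as a context manager commits when the block
        # succeeds and rolls back both statements if either one raises.
        with self._conn:
            cursor = self._conn.execute(
                "INSERT INTO hybridization (slide_barcode, probe_id, user_id) "
                "VALUES (?, ?, ?)",
                (slide_barcode, probe_id, user_id),
            )
            self._conn.execute(
                "INSERT INTO queue (slide_barcode, step) VALUES (?, 'scan')",
                (slide_barcode,),
            )
        return cursor.lastrowid

    def pending(self, step):
        """Return the barcodes of slides waiting at a given workflow step."""
        rows = self._conn.execute(
            "SELECT slide_barcode FROM queue WHERE step = ? ORDER BY slide_barcode",
            (step,),
        )
        return [row[0] for row in rows]


service = ArrayWorkflowService(sqlite3.connect(":memory:"))
first_id = service.record_hybridization("S001", "P1", "user1")
```

Because the user interface only calls methods such as `record_hybridization`, the schema or the database engine could change without touching application code, and a failure part-way through leaves no half-recorded hybridization behind.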
The API is a distributable COM server (DLL) written in Visual Basic v6, compiled using apartment model threading and optimized for Pentium processors. This server contains one transactional and three non-transactional classes. Each class provides methods to retrieve, insert and update information in the system. For example, the Queues class contains 12 methods with 89 options, the transactional ArrayWorkflow class contains 14 methods with 122 options, the ArrayData class contains four methods and the Help class contains the getters and setters for context-specific help.

The application layer

The application layer is the user interface for MADGE, serving as the portal the end user will use to interact with the application. We wanted to build an interface that not only gave the user an organized environment for uploading and retrieving microarray data, but also provided guidance for the day-to-day experimental procedures (Figure 1). In this respect it would be similar to a laboratory information management system (LIMS) for microarrays. The MADGE system uses an app (...truncated)