Search: authors:"William Stafford Noble"

43 papers found.
Choosing panels of genomics assays using submodular optimization

Due to the high cost of sequencing-based genomics assays such as ChIP-seq and DNase-seq, the epigenomic characterization of a cell type is typically carried out using a small panel of assay types. Deciding a priori which assays to perform is, thus, a critical step in many studies. We present the submodular selection of assays (SSA), a method for choosing a diverse panel of...

A statistical approach for inferring the 3D structure of the genome

Motivation: Recent technological advances allow the measurement, in a single Hi-C experiment, of the frequencies of physical contacts among pairs of genomic loci at a genome-wide scale. The next challenge is to infer, from the resulting DNA–DNA contact maps, accurate 3D models of how chromosomes fold and fit into the nucleus. Many existing inference methods rely on...

On the assessment of statistical significance of three-dimensional colocalization of sets of genomic elements

A growing body of experimental evidence supports the hypothesis that the 3D structure of chromatin in the nucleus is closely linked to important functional processes, including DNA replication and gene regulation. In support of this hypothesis, several research groups have examined sets of functionally associated genomic loci, with the aim of determining whether those loci are...

Computational and Statistical Analysis of Protein Mass Spectrometry Data

William Stafford Noble, Michael J. MacCoss. This is an "Editors' Outlook" article for PLoS Computational Biology. Philip E. Bourne, University of California San ... the information that, e.g., either protein A or protein B was present in the sample, but probably not both. Authors Biographies William Stafford Noble (formerly William Noble Grundy) received the PhD in

A Quick Guide to Organizing Computational Biology Projects

William Stafford Noble. Fran Lewitter, Whitehead Institute, United States of America. Department of Genome Sciences, School of Medicine, University of Washington, Seattle, Washington

A Unified Multitask Architecture for Predicting Local Protein Properties

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for...

Inferring Clonal Composition from Multiple Sections of a Breast Cancer

Cancers arise from successive rounds of mutation and selection, generating clonal populations that vary in size, mutational content and drug responsiveness. Ascertaining the clonal composition of a tumor is therefore important both for prognosis and therapy. Mutation counts and frequencies resulting from next-generation sequencing (NGS) potentially reflect a tumor's clonal...

Estimating relative abundances of proteins from shotgun proteomics data

Background: Spectral counting methods provide an easy means of identifying proteins with differing abundances between complex mixtures using shotgun proteomics data. The crux spectral-counts command, part of the Crux software toolkit, implements four previously reported spectral counting methods: the spectral index (SI_N), the exponentially modified protein...

Predicting Co-Complexed Protein Pairs from Heterogeneous Data

Proteins do not carry out their functions alone. Instead, they often act by participating in macromolecular complexes and play different functional roles depending on the other members of the complex. It is therefore interesting to identify co-complex relationships. Although protein complexes can be identified in a high-throughput manner by experimental technologies such as...

FIMO: scanning for occurrences of a given motif

Summary: A motif is a short DNA or protein sequence that contributes to the biological function of the sequence in which it resides. Over the past several decades, many computational methods have been described for identifying, characterizing and searching with sequence motifs. Critical to nearly any motif-based sequence analysis pipeline is the ability to scan a sequence...

Detecting Remote Evolutionary Relationships among Proteins by Large-Scale Semantic Embedding

Virtually every molecular biologist has searched a protein or DNA sequence database to find sequences that are evolutionarily related to a given query. Pairwise sequence comparison methods—i.e., measures of similarity between query and target sequences—provide the engine for sequence database search and have been the subject of 30 years of computational research. For the...

Learning a Weighted Sequence Model of the Nucleosome Core and Linker Yields More Accurate Predictions in Saccharomyces cerevisiae and Homo sapiens

Jeff A. Bilmes, William Stafford Noble. Uwe Ohler, Duke University, United States of America. Department of Electrical Engineering, University of Washington, Seattle, Washington, United States of

The Genomedata format for storing large-scale functional genomics data

Summary: We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive...

Improved network-based identification of protein orthologs

Motivation: Identifying protein orthologs is an important task that is receiving growing attention in the bioinformatics literature. Orthology detection provides a fundamental tool towards understanding protein evolution, predicting protein functions and interactions, aligning protein–protein interaction (PPI) networks of different species and detecting conserved modules within...

Improved similarity scores for comparing motifs

Motivation: A question that often comes up after applying a motif finder to a set of co-regulated DNA sequences is whether the reported putative motif is similar to any known motif. While several tools have been designed for this task, Habib et al. pointed out that the scores that are commonly used for measuring similarity between motifs do not distinguish between a good...

Assessing phylogenetic motif models for predicting transcription factor binding sites

Motivation: A variety of algorithms have been developed to predict transcription factor binding sites (TFBSs) within the genome by exploiting the evolutionary information implicit in multiple alignments of the genomes of related species. One such approach uses an extension of the standard position-specific motif model that incorporates phylogenetic information via a phylogenetic...

qvality: non-parametric estimation of q-values and posterior error probabilities

Summary: Qvality is a C++ program for estimating two types of standard statistical confidence measures: the q-value, which is an analog of the p-value that incorporates multiple testing correction, and the posterior error probability (PEP, also known as the local false discovery rate), which corresponds to the probability that a given observation is drawn from the null...

Epigenetic priors for identifying active transcription factor binding sites

Motivation: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data, such as histone modification and DNase I accessibility data, has been shown to improve motif-based in silico methods for predicting such binding, but this...

Rankprop: a web server for protein remote homology detection

Summary: We present a large-scale implementation of the Rankprop protein homology ranking algorithm in the form of an openly accessible web server. We use the NRDB40 PSI-BLAST all-versus-all protein similarity network of 1.1 million proteins to construct the graph for the Rankprop algorithm, whereas previously, results were only reported for a database of 108 000 proteins. We...

Learning kernels from biological networks by maximizing entropy

Motivation: The diffusion kernel is a general method for computing pairwise distances among all nodes in a graph, based on the sum of weighted paths between each pair of nodes. This technique has been used successfully, in conjunction with kernel-based learning methods, to draw inferences from several types of biological networks. Results: We show that computing the diffusion...