Search: authors:"Mark Gerstein"

123 papers found.

Localized structural frustration for evaluating the impact of sequence variants

Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on...

LARVA: an integrative framework for large-scale analysis of recurrent variants in noncoding annotations

In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes, recurrently mutated more than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding...

MUSIC: identification of enriched regions in ChIP-Seq experiments using a mappability-corrected multiscale signal processing framework

We present MUSIC, a signal processing approach for identification of enriched regions in ChIP-Seq data, available at music.gersteinlab.org. MUSIC first filters the ChIP-Seq read-depth signal for systematic noise from non-uniform mappability, which fragments enriched regions. Then it performs a multiscale decomposition, using median filtering, identifying enriched regions at...

iTAR: a web server for identifying target genes of transcription factors using ChIP-seq or ChIP-chip data

Background: Chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-seq) or microarray hybridization (ChIP-chip) has been widely used to determine the genomic occupation of transcription factors (TFs). We have previously developed a probabilistic method, called TIP (Target Identification from Profiles), to identify TF target genes using ChIP-seq/ChIP-chip...

A uniform survey of allele-specific binding and expression over 1000-Genomes-Project individuals

Contributions: J.C., J.R. and M.G. conceived and designed the resource. J.C. and J.B. built the AlleleDB ... authors declare no competing financial interests. Corresponding author: Mark Gerstein. Supplementary information: Supplementary Figures 1-5.

An approach for determining and measuring network hierarchy applied to comparing the phosphorylome and the regulome

Many biological networks naturally form a hierarchy with a preponderance of downward information flow. In this study, we define a score to quantify the degree of hierarchy in a network and develop a simulated-annealing algorithm to maximize the hierarchical score globally over a network. We apply our algorithm to determine the hierarchical structure of the phosphorylome in detail...

Discordant Expression of Circulating microRNA from Cellular and Extracellular Sources

MicroRNA (miRNA) expression has rapidly grown into one of the largest fields for disease characterization and development of clinical biomarkers. Consensus is lacking regarding the optimal sample source or whether different circulating sources are concordant. Here, using miRNA measurements from contemporaneously obtained whole blood- and plasma-derived RNA from 2391 individuals...

The real cost of sequencing: scaling computation to keep pace with data generation

As the cost of sequencing continues to decrease and the amount of sequence data generated grows, new paradigms for data storage and analysis are increasingly important. The relative scaling behavior of these evolving technologies will impact genomics research moving forward.

Machine learning and genome annotation: a match meant to be?

By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE.

Identification of yeast cell cycle regulated genes based on genomic features

Background: Time-course microarray experiments have been widely used to identify cell cycle regulated genes. However, the method is not effective for lowly expressed genes and is sensitive to experimental conditions. To complement microarray experiments, we propose a computational method to predict cell cycle regulated genes based on their genomic features – transcription factor...

OrthoClust: an orthology-based network framework for clustering data across multiple species

Increasingly, high-dimensional genomics data are becoming available for many organisms. Here, we develop OrthoClust for simultaneously clustering data across multiple species. OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that...

Modeling the relative relationship of transcription factor binding and histone modifications to gene expression levels in mouse embryonic stem cells

Transcription factor (TF) binding and histone modification (HM) are important for the precise control of gene expression. Hence, we constructed statistical models to relate these to gene expression levels in mouse embryonic stem cells. While both TF binding and HMs are highly ‘predictive’ of gene expression levels (in a statistical, but perhaps not strictly mechanistic, sense...

Interpretation of Genomic Variants Using a Unified Biological Network Approach

The decreasing cost of sequencing is leading to a growing repertoire of personal genomes. However, we are lagging behind in understanding the functional consequences of the millions of variants obtained from sequencing. Global system-wide effects of variants in coding genes are particularly poorly understood. It is known that while variants in some genes can lead to diseases...

FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer

Mark Gerstein (affiliations 0, 1, 4). 0: Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA; 1: Molecular Biophysics and Biochemistry Department, Yale University, New Haven, CT

AGE: defining breakpoints of genomic structural variants at single-nucleotide resolution, through optimal alignments with gap excision

Motivation: Defining the precise location of structural variations (SVs) at single-nucleotide breakpoint resolution is an important problem, as it is a prerequisite for classifying SVs, evaluating their functional impact and reconstructing personal genome sequences. Given approximate breakpoint locations and a bridging assembly or split read, the problem essentially reduces to...

TIP: A probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles

Motivation: ChIP-seq and ChIP-chip experiments have been widely used to identify transcription factor (TF) binding sites and target genes. Conventionally, a fairly ‘simple’ approach is employed for target gene identification, e.g. finding genes with binding sites within 2 kb of a transcription start site (TSS). However, this does not take into account the number of sites upstream...
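
The conventional baseline mentioned in this abstract (calling a gene a target when a binding site lies within 2 kb of its TSS) can be sketched as below. This is only an illustration of that simple rule, not of TIP itself; the function name `genes_near_peaks`, the single-chromosome simplification, and all coordinates are assumptions made up for the example.

```python
from bisect import bisect_left


def genes_near_peaks(tss_by_gene, peak_positions, window=2000):
    """Return genes whose TSS lies within `window` bp of at least one peak.

    tss_by_gene: dict mapping gene name -> TSS coordinate (one chromosome).
    peak_positions: iterable of binding-site coordinates on that chromosome.
    """
    peaks = sorted(peak_positions)
    targets = []
    for gene, tss in tss_by_gene.items():
        # Index of the first peak at or after the left edge of the window.
        i = bisect_left(peaks, tss - window)
        # That peak counts only if it is also inside the right edge.
        if i < len(peaks) and peaks[i] <= tss + window:
            targets.append(gene)
    return targets


# Toy, invented coordinates (hypothetical genes on a single chromosome).
tss = {"geneA": 10_000, "geneB": 50_000, "geneC": 90_000}
peaks = [8_500, 47_000, 91_999]
print(genes_near_peaks(tss, peaks))  # ['geneA', 'geneC']
```

Note that this rule treats every gene with a nearby site identically, regardless of how many sites there are, which is the limitation the abstract goes on to raise.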

The Spread of Scientific Information: Insights from the Web Usage Statistics in PLoS Article-Level Metrics

The presence of web-based communities is a distinctive signature of Web 2.0. The web-based feature means that information propagation within each community is highly facilitated, promoting complex collective dynamics in view of information exchange. In this work, we focus on a community of scientists and study, in particular, how the awareness of a scientific paper is spread. Our...

An ensemble approach to accurately detect somatic mutations using SomaticSeq

SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for...