Search: authors:"Haixu Tang"

45 papers found.

A community assessment of privacy preserving techniques for human genomes

To answer the need for the rigorous protection of biomedical data, we organized the Critical Assessment of Data Privacy and Protection initiative as a community effort to evaluate privacy-preserving dissemination techniques for biomedical data. We focused on the challenge of sharing aggregate human genomic data (e.g., allele frequencies) in a way that preserves the privacy of the...

A community effort to protect genomic data sharing, collaboration and outsourcing

The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In 2016, we organized the third Critical Assessment of Data Privacy and Protection competition as a community effort to bring together biomedical informaticists, computer privacy and security researchers, and...

A computational approach toward label-free protein quantification using predicted peptide detectability

Summary: We propose here a new concept of peptide detectability which could be an important factor in explaining the relationship between a protein's quantity and the peptides identified from it in a high-throughput proteomics experiment. We define peptide detectability as the probability of observing a peptide in a standard sample analyzed by a standard proteomics routine and...

A machine-learning approach to combined evidence validation of genome assemblies

Motivation: While it is common to refer to ‘the genome sequence’ as if it were a single, complete and contiguous DNA string, it is in fact an assembly of millions of small, partially overlapping DNA fragments. Sophisticated computer algorithms (assemblers and scaffolders) merge these DNA fragments into contigs, and place these contigs into sequence scaffolds using the paired-end...

Algorithmic approaches to clonal reconstruction in heterogeneous cell populations

Compliance with Ethics Guidelines: The authors Wazim Mohammed Ismail, Etienne Nzabarushimana and Haixu Tang declare that they have no conflict of interests. This article is a review article and...

Automated interpretation of MS/MS spectra of oligosaccharides

Motivation: The emerging glycomics and glycoproteomics projects aim to characterize all forms of glycoproteins in different tissues and organisms. Tandem mass spectrometry (MS/MS) is the key experimental methodology for high-throughput glycan identification and characterization. Fragmentation of glycans from high energy collision-induced dissociation generates ions from...

CRISPR-Cas systems target a diverse collection of invasive mobile genetic elements in human microbiomes

Background Bacteria and archaea develop immunity against invading genomes by incorporating pieces of the invaders' sequences, called spacers, into a clustered regularly interspaced short palindromic repeats (CRISPR) locus between repeats, forming arrays of repeat-spacer units. When spacers are expressed, they direct CRISPR-associated (Cas) proteins to silence complementary...

DNA sequence templates adjacent nucleosome and ORC sites at gene amplification origins in Drosophila

Eukaryotic origins of DNA replication are bound by the origin recognition complex (ORC), which scaffolds assembly of a pre-replicative complex (pre-RC) that is then activated to initiate replication. Both pre-RC assembly and activation are strongly influenced by developmental changes to the epigenome, but molecular mechanisms remain incompletely defined. We have been examining...

De novo identification of LTR retrotransposons in eukaryotic genomes

Background LTR retrotransposons are a class of mobile genetic elements containing two similar long terminal repeats (LTRs). Currently, LTR retrotransposons are annotated in eukaryotic genomes mainly through the conventional homology searching approach. Hence, it is limited to annotating known elements. Results In this paper, we report a de novo computational method that can...

De novo transcriptome sequencing in a songbird, the dark-eyed junco (Junco hyemalis): genomic tools for an ecological model system

Background Though genomic-level data are becoming widely available, many of the metazoan species sequenced are laboratory systems whose natural history is not well documented. In contrast, the wide array of species with very well-characterized natural history have, until recently, lacked genomics tools. It is now possible to address significant evolutionary genomics questions by...

Diverse CRISPRs Evolving in Human Microbiomes

CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci, together with cas (CRISPR–associated) genes, form the CRISPR/Cas adaptive immune system, a primary defense strategy that eubacteria and archaea mobilize against foreign nucleic acids, including phages and conjugative plasmids. Short spacer sequences separated by the repeats are derived from foreign DNA and...

Enhanced peptide quantification using spectral count clustering and cluster abundance

Background Quantification of protein expression by means of mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, spectral counting and spectra feature analysis, have been extensively investigated in a wide variety of proteomic studies. The cornerstone of both methods is peptide identification based...

Fast and accurate identification of semi-tryptic peptides in shotgun proteomics

Motivation: One of the major problems in shotgun proteomics is the low peptide coverage when analyzing complex protein samples. Identifying more peptides, e.g. non-tryptic peptides, may increase the peptide coverage and improve protein identification and/or quantification that are based on the peptide identification results. Searching for all potential non-tryptic peptides is...

FragGeneScan: predicting genes in short and error-prone reads

The advances of next-generation sequencing technology have facilitated metagenomics research that attempts to determine directly the whole collection of genetic material within an environmental sample (i.e. the metagenome). Identification of genes directly from short reads has become an important yet challenging problem in annotating metagenomes, since the assembly of metagenomes...

Fragment assembly with double-barreled data

Pavel A. Pevzner (1), Haixu Tang (0). (0) Department of Mathematics, University of Southern California, Los Angeles, CA 90089. (1) Department of Computer Science and Engineering, University of California at...

Fragment assembly with short reads

Motivation: Current DNA sequencing technology produces reads of about 500–750 bp, with typical coverage under 10×. New sequencing technologies are emerging that produce shorter reads (length 80–200 bp) but allow one to generate significantly higher coverage (30× and higher) at low cost. Modern assembly programs and error correction routines have been tuned to work well with...

Gene finding in metatranscriptomic sequences

Wazim Mohammed Ismail, Yuzhen Ye, Haixu Tang. School of Informatics and Computing, Indiana University, 150 S. Woodlawn Avenue, Bloomington, IN 47401, USA. Background: Metatranscriptomic...