PyHLA: tests for the association between HLA alleles and diseases
Fan and Song BMC Bioinformatics
Yanhui Fan 0 1 2
You-Qiang Song 0 1
0 Centre for Genomic Sciences, The University of Hong Kong, 5 Sassoon Road, Pokfulam, Hong Kong, Hong Kong
1 School of Biomedical Sciences, The University of Hong Kong, 21 Sassoon Road, Pokfulam, Hong Kong, Hong Kong
2 Department of Cancer Genomics, LemonData Biotech (Shenzhen) Ltd., Shenzhen, China
Background: Recently, several tools have been designed for human leukocyte antigen (HLA) typing using single nucleotide polymorphism (SNP) array and next-generation sequencing (NGS) data. These tools provide high-throughput and cost-effective approaches for identifying HLA types, so tools for the downstream association analysis of the resulting HLA types are highly desirable. Although several tools have been designed for multi-allelic marker association analysis, they were either designed only for microsatellite markers and do not scale well with increasing data volumes, or designed for large-scale data but provide only a limited number of tests. Results: To fill this gap, we have developed PyHLA, a Python package that implements several methods for HLA association analysis. PyHLA is a tailor-made, easy-to-use and flexible tool designed specifically for the association analysis of HLA types inferred from genome-wide genotyping and NGS data. PyHLA provides functions for association analysis, zygosity tests, and interaction tests between HLA alleles and diseases. Monte Carlo permutation and several methods for multiple testing correction have also been implemented. Conclusions: PyHLA provides a convenient and powerful tool for HLA analysis. It integrates existing methods and adds methods that were previously lacking. Furthermore, PyHLA is applicable to both small and large sample sizes and can complete the analysis in a timely manner on a personal computer running any of several operating systems. PyHLA is implemented in Python and is free, open-source software distributed under the GPLv2 license. The source code, tutorial, and examples are available at https://github.com/felixfan/PyHLA.
HLA; Association; Interaction; Multi-allelic
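The abstract mentions Monte Carlo permutation and multiple testing correction. The following is a minimal sketch of both ideas under stated assumptions; it is not PyHLA's code. It assumes a hypothetical input of one list of two alleles per individual plus a 0/1 case-control label, uses the absolute case-control allele frequency difference as the test statistic, and uses the Benjamini-Hochberg procedure as one example of a correction method (the text does not say which corrections PyHLA offers). The helper names are hypothetical.

```python
import random

# Hypothetical input format: `genotypes` holds one list of two HLA alleles per
# individual, and `labels` holds the matching status (1 = case, 0 = control).

def allele_freq_diff(genotypes, labels, allele):
    """Absolute difference in the frequency of `allele` between cases and controls."""
    hits = {0: 0, 1: 0}
    copies = {0: 0, 1: 0}
    for alleles, status in zip(genotypes, labels):
        hits[status] += alleles.count(allele)
        copies[status] += len(alleles)
    return abs(hits[1] / float(copies[1]) - hits[0] / float(copies[0]))

def permutation_pvalue(genotypes, labels, allele, n_perm=1000, seed=0):
    """Monte Carlo permutation p-value: shuffle case/control labels across individuals."""
    rng = random.Random(seed)
    observed = allele_freq_diff(genotypes, labels, allele)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps each individual's two alleles together
        # Small tolerance so ties are not lost to floating-point rounding.
        if allele_freq_diff(genotypes, shuffled, allele) >= observed - 1e-12:
            exceed += 1
    # Adding one to numerator and denominator keeps the estimate above zero.
    return (exceed + 1) / float(n_perm + 1)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg (false discovery rate) adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / float(rank))
        adjusted[i] = running_min
    return adjusted

# Usage: 10 cases homozygous for A*01:01 and 10 controls homozygous for A*02:01.
genotypes = [["A*01:01", "A*01:01"]] * 10 + [["A*02:01", "A*02:01"]] * 10
labels = [1] * 10 + [0] * 10
p_a0101 = permutation_pvalue(genotypes, labels, "A*01:01", n_perm=999)
print(benjamini_hochberg([p_a0101, 0.04, 0.03, 0.2]))
```

Shuffling labels across individuals, rather than across single allele copies, preserves each person's genotype, which is the usual choice when the null hypothesis is "disease status is independent of genotype".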
Background
The human leukocyte antigen (HLA) loci on chromosome 6 (6p21.3) form the most polymorphic and gene-dense region of the human genome. HLA proteins play an important role in transplant rejection. Associations between variants in the HLA region and infectious diseases, autoimmune diseases and cancers have been established. Typing HLA directly by laboratory experiments is still laborious, expensive, and time-consuming [1]. Several algorithms and pipelines, such as HLA*IMP:02 [2] and MGAPrediction [3], have been developed for HLA imputation using data from genome-wide association studies (GWAS), whereas OptiType [4], HLA-VBSeq [5] and HLAreporter [6] have been developed for HLA typing using data from next-generation sequencing (NGS) studies. All of these tools use HLA allele sequences from the IMGT/HLA database [7] as a reference. They provide a cost-efficient, high-throughput approach to HLA typing that reuses already available GWAS and NGS data.
Given the continuously increasing amount of HLA typing data being generated, an integrated workflow for their downstream association analysis is highly desirable. Several existing tools, such as CLUMP [8], PyPop [9] and SKDM [10], can be used to analyze HLA types. However, these tools are not ideal for the association analysis of HLA types inferred from GWAS and NGS data, because they were either designed for analyzing microsatellite markers or provide only limited functions. In this study, we present PyHLA, a Python-based standalone tool for the association analysis between diseases and HLA types inferred from GWAS and NGS data.
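At its simplest, association analysis of this kind compares one allele against all others between cases and controls. The following is a minimal sketch of that comparison as a 2x2 table of allele counts; it is not PyHLA's implementation. The allele-count (rather than carrier-count) table, the Pearson chi-square without continuity correction, and the helper name `allelic_chisq` are assumptions chosen for illustration.

```python
import math

def allelic_chisq(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 allele-count table.

    a = copies of the tested allele in cases,    b = copies of other alleles in cases
    c = copies of the tested allele in controls, d = copies of other alleles in controls
    Returns (chi2, p_value, odds_ratio); odds_ratio is None when b * c == 0.
    """
    n = a + b + c + d
    denom = float((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        return 0.0, 1.0, None  # an empty row or column carries no information
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / float(b * c) if b * c else None
    return chi2, p_value, odds_ratio

# Usage: an allele seen 30 times among 100 case alleles and 15 times among 100 control alleles.
chi2, p, or_ = allelic_chisq(30, 70, 15, 85)
print("chi2=%.3f p=%.4f OR=%.3f" % (chi2, p, or_))
```

Using the complementary error function avoids a dependency on a statistics library, since the 1-df chi-square survival function has this closed form; for sparse tables an exact test would usually be preferred.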
© The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Implementation
PyHLA is implemented in Python 2.7. A graphical user interface is also provided. The source code, tutorial and examples are freely available at https://github.com/felixfan/PyHLA. A demonstration is also available at https://github.com/felixfan/PyHLA/tree/master/demo. Figure 1 shows an overview of the methods applied to HLA types for finding disease-associated HLA alleles.
Data summary (module 1)
Frequency summaries at the gene, allele and population levels can be produced for the case and control populations.
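An allele-level frequency summary of this kind can be sketched as follows. The input layout (a 0/1 status plus two allele names per individual), the helper name `allele_frequencies` and the `fields` option, which truncates names such as "A*01:01" to "A*01" to summarize at a lower typing resolution, are hypothetical and are not taken from PyHLA.

```python
from collections import Counter

def allele_frequencies(samples, fields=2):
    """Allele frequencies in cases and controls, counted over allele copies.

    `samples` is a list of (status, [allele1, allele2]) with status 1 = case and
    0 = control (hypothetical layout). `fields` keeps only the first N
    colon-separated fields of each allele name, e.g. 1 turns "A*01:01" into "A*01".
    Returns {allele: (case_frequency, control_frequency)}.
    """
    counts = {1: Counter(), 0: Counter()}
    totals = {1: 0, 0: 0}
    for status, alleles in samples:
        trimmed = [":".join(name.split(":")[:fields]) for name in alleles]
        counts[status].update(trimmed)
        totals[status] += len(trimmed)
    summary = {}
    for allele in sorted(set(counts[1]) | set(counts[0])):
        case_freq = counts[1][allele] / float(totals[1]) if totals[1] else 0.0
        ctrl_freq = counts[0][allele] / float(totals[0]) if totals[0] else 0.0
        summary[allele] = (case_freq, ctrl_freq)
    return summary

samples = [
    (1, ["A*01:01", "A*02:01"]),
    (1, ["A*01:01", "A*01:01"]),
    (0, ["A*02:01", "A*03:01"]),
    (0, ["A*02:01", "A*01:01"]),
]
# Prints A*01:01 case=0.75 control=0.25, then A*02:01 and A*03:01 on their own lines.
for allele, (case_f, ctrl_f) in sorted(allele_frequencies(samples).items()):
    print("%s case=%.2f control=%.2f" % (allele, case_f, ctrl_f))
```

Counting allele copies (two per individual) rather than individuals gives allele frequencies; a carrier-frequency summary would instead count each individual at most once per allele.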
Association analysis (module 2)
It is a simple and easy way to implement methods for
localization of susceptibility gen (...truncated)