PyHLA: tests for the association between HLA alleles and diseases

BMC Bioinformatics, Feb 2017

Background: Recently, several tools have been designed for human leukocyte antigen (HLA) typing using single nucleotide polymorphism (SNP) array and next-generation sequencing (NGS) data. These tools provide high-throughput and cost-effective approaches for identifying HLA types, so tools for downstream association analysis are highly desirable. Several tools have been designed for multi-allelic marker association analysis, but they were either designed only for microsatellite markers and do not scale well with increasing data volumes, or designed for large-scale data but provide only a limited number of tests. Results: To fill this gap, we have developed PyHLA, a Python package that implements several methods for HLA association analysis. PyHLA is a tailor-made, easy-to-use, and flexible tool designed specifically for the association analysis of HLA types imputed from genome-wide genotyping and NGS data. PyHLA provides functions for association analysis, zygosity tests, and interaction tests between HLA alleles and diseases. Monte Carlo permutation and several methods for multiple testing correction are also implemented. Conclusions: PyHLA provides a convenient and powerful tool for HLA analysis, integrating existing methods and adding desired new ones. PyHLA is applicable to both small and large sample sizes and can finish the analysis in a timely manner on a personal computer running any of several platforms. PyHLA is implemented in Python and is free, open-source software distributed under the GPLv2 license. The source code, tutorial, and examples are available at https://github.com/felixfan/PyHLA.
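The abstract lists Monte Carlo permutation and multiple testing correction without details. The sketch below is a hedged illustration of those two ideas, not PyHLA's code or API: it assumes each individual's genotype at one HLA gene is a pair of allele names, scores one allele with a 1-df Pearson chi-square on allele copies (cases vs. controls), estimates an empirical p-value by shuffling case/control labels across individuals, and adjusts a list of p-values with Bonferroni and Benjamini-Hochberg. All function names (`allele_chi2`, `permutation_p`, `bonferroni`, `benjamini_hochberg`) are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical data layout (assumption): one (allele, allele) pair per individual for one HLA gene.
Genotype = Tuple[str, str]


def allele_chi2(genotypes: Sequence[Genotype], is_case: Sequence[bool], allele: str) -> float:
    """1-df Pearson chi-square (no continuity correction) on a 2x2 table of
    allele copies: target allele vs. all other alleles, cases vs. controls."""
    a = b = c = d = 0  # a/b: case copies target/other; c/d: control copies target/other
    for (x, y), case in zip(genotypes, is_case):
        hits = (x == allele) + (y == allele)  # 0, 1 or 2 copies of the target allele
        if case:
            a, b = a + hits, b + 2 - hits
        else:
            c, d = c + hits, d + 2 - hits
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin makes the test undefined; treat it as no signal
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def permutation_p(genotypes: Sequence[Genotype], is_case: Sequence[bool], allele: str,
                  n_perm: int = 1000, seed: int = 1) -> float:
    """Monte Carlo p-value: shuffle case/control labels across individuals and
    count how often the permuted statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = allele_chi2(genotypes, is_case, allele)
    labels = list(is_case)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if allele_chi2(genotypes, labels, allele) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # +1 keeps the estimate strictly above zero


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Family-wise correction: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Step-up FDR adjustment: p_(k) * m / k, made monotone from the largest rank down."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy data: 20 cases homozygous for A*01:01, 20 controls homozygous for A*02:01.
    genos = [("A*01:01", "A*01:01")] * 20 + [("A*02:01", "A*02:01")] * 20
    status = [True] * 20 + [False] * 20
    raw = [permutation_p(genos, status, al) for al in ("A*01:01", "A*02:01")]
    print(raw, bonferroni(raw), benjamini_hochberg(raw))
```

Shuffling labels among individuals, rather than among allele copies, keeps each person's two alleles together, so the null distribution respects the diploid structure of the data.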

Yanhui Fan (1, 2, 3) and You-Qiang Song (1, 2)
1 Centre for Genomic Sciences, The University of Hong Kong, 5 Sassoon Road, Pokfulam, Hong Kong
2 School of Biomedical Sciences, The University of Hong Kong, 21 Sassoon Road, Pokfulam, Hong Kong
3 Department of Cancer Genomics, LemonData Biotech (Shenzhen) Ltd., Shenzhen, China
Keywords: HLA; Association; Interaction; Multi-allelic

Background
The human leukocyte antigen (HLA) loci on chromosome 6 (6p21.3) form the most polymorphic and gene-dense region of the human genome. HLA proteins play an important role in transplant rejection, and associations between variants in the HLA region and infectious diseases, autoimmune diseases, and cancers have been established. Direct experimental HLA typing is still laborious, expensive, and time-consuming [1]. Several algorithms and pipelines, such as HLA*IMP:02 [2] and MGAPrediction [3], have been developed for HLA imputation from genome-wide association study (GWAS) data, whereas OptiType [4], HLA-VBSeq [5], and HLAreporter [6] have been developed for HLA typing from next-generation sequencing (NGS) data. All of these tools use HLA allele sequences from the IMGT/HLA database [7] as a reference. By reusing already available GWAS and NGS data, they offer a cost-efficient, high-throughput approach to HLA typing. Given the continuously increasing number of HLA types being generated, an integrated workflow for their downstream association analysis is highly desirable. Several existing tools, such as CLUMP [8], PyPop [9], and SKDM [10], can be used to analyze HLA types. However, they are not ideal for association analysis of HLA types inferred from GWAS and NGS data, because they were designed for analyzing microsatellite markers or provide limited functions. In this study, we present PyHLA, a Python-based standalone tool for association analysis between diseases and HLA types inferred from GWAS and NGS data.

© The Author(s) 2017. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Implementation
PyHLA is implemented in Python 2.7. A graphical user interface is also provided. The source code, tutorial, and examples are freely available at https://github.com/felixfan/PyHLA. A demonstration is available at https://github.com/felixfan/PyHLA/tree/master/demo. Figure 1 shows an overview of the methods applied to HLA types for finding disease-associated HLA alleles.

Data summary (module 1)
Gene-, allele-, and population-level frequency summaries can be produced for the case and control populations.

Association analysis (module 2)
It is a simple and easy way to implement methods for localization of susceptibility gen (...truncated)
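The data summary and allele-level association steps can be illustrated with a minimal sketch. It is a hypothetical example, not PyHLA's interface (and it targets Python 3.8+ for `math.comb`, whereas PyHLA itself is written for Python 2.7): it assumes case and control genotypes are lists of allele pairs for a single HLA gene, counts allele copies in each group, and reports, for each allele versus all other alleles, the case and control frequencies, the allelic odds ratio, and a two-sided Fisher's exact p-value. Fisher's exact test is used here only as one standard choice for small 2x2 tables; this excerpt does not say which tests module 2 applies. The names `summarize` and `fisher_exact_two_sided` are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical data layout (assumption): one (allele, allele) pair per individual for one HLA gene.
Genotype = Tuple[str, str]
Row = Tuple[str, float, float, float, float]  # allele, case freq, control freq, OR, p


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]:
    sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    total = math.comb(n, col1)
    probs = {
        x: math.comb(row1, x) * math.comb(n - row1, col1 - x) / total
        for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    }
    p_obs = probs[a]
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def summarize(cases: Sequence[Genotype], controls: Sequence[Genotype]) -> List[Row]:
    """Allele-level summary: frequency in cases and controls, allelic odds
    ratio, and Fisher p-value for each allele versus all other alleles."""
    case_counts = Counter(allele for pair in cases for allele in pair)
    ctrl_counts = Counter(allele for pair in controls for allele in pair)
    n_case, n_ctrl = 2 * len(cases), 2 * len(controls)  # two allele copies per person
    rows: List[Row] = []
    for allele in sorted(set(case_counts) | set(ctrl_counts)):
        a, c = case_counts[allele], ctrl_counts[allele]  # copies of this allele
        b, d = n_case - a, n_ctrl - c                     # copies of all other alleles
        if b * c:
            odds_ratio = (a * d) / (b * c)
        else:
            # A zero cell leaves the odds ratio unbounded (inf) or undefined (nan).
            odds_ratio = math.inf if a * d else math.nan
        rows.append((allele, a / n_case, c / n_ctrl, odds_ratio,
                     fisher_exact_two_sided(a, b, c, d)))
    return rows


if __name__ == "__main__":
    cases = [("A*01:01", "A*01:01"), ("A*01:01", "A*02:01")]
    controls = [("A*02:01", "A*02:01"), ("A*02:01", "A*03:01")]
    for row in summarize(cases, controls):
        print(row)
```

Frequencies are computed over allele copies (two per individual), which is the usual denominator for allele-level summaries; a carrier-level (individual) summary would instead count people with at least one copy.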


Yanhui Fan, You-Qiang Song. PyHLA: tests for the association between HLA alleles and diseases. BMC Bioinformatics. 2017;18:90. DOI: 10.1186/s12859-017-1496-0