pyNBS: a Python implementation for network-based stratification of tumor mutations

Bioinformatics, 34(16), 2018, 2859–2861
doi: 10.1093/bioinformatics/bty186
Advance Access Publication Date: 28 March 2018
Applications Note

Systems biology

1 Bioinformatics and Systems Biology Program, Bioengineering Department, UC San Diego, La Jolla, CA 92093, USA and 2 Department of Medicine, UC San Diego, La Jolla, CA 92093, USA

*To whom correspondence should be addressed.
Associate Editor: Oliver Stegle
Received on October 11, 2017; revised on February 5, 2018; editorial decision on March 22, 2018; accepted on March 27, 2018

Abstract

Summary: We present pyNBS: a modularized Python 2.7 implementation of the network-based stratification (NBS) algorithm for stratifying tumor somatic mutation profiles into molecularly and clinically relevant subtypes. In addition to release of the software, we benchmark its key parameters and provide a compact cancer reference network that increases the significance of tumor stratification using the NBS algorithm. The structure of the code exposes key steps of the algorithm to foster further collaborative development.

Availability and implementation: The package, along with examples and data, can be downloaded and installed from the URL https://github.com/idekerlab/pyNBS.

Contact:

1 Introduction

The biomedical community increasingly relies on genomic information to diagnose and treat many different complex diseases, including cancer (Frampton, 2013; Johnson, 2014). In parallel, developments in molecular interaction mapping technologies and network analysis algorithms have enabled the systematic elucidation of pathways involved in cancer and other complex diseases (Schaefer et al., 2009). These two technologies, genomics and network analysis, have been recently combined to contextualize somatic mutations in tumors against the knowledge contained in molecular interaction networks and disease pathway maps.
For example, numerous algorithms now use molecular network information to discover significantly mutated pathways in particular cohorts of patients (Ciriello, 2012; Drake, 2016; Leiserson, 2013, 2014; Paull, 2013; Vandin, 2011a,b; Vaske, 2010).

Recently, we introduced an algorithm that uses molecular network information to guide the stratification of tumor somatic mutation profiles into clinically relevant subtypes (Hofree, 2013). Such mutation profiles have been notoriously difficult to stratify (i.e. cluster) due to their extreme heterogeneity from patient to patient. Our algorithm, called Network-Based Stratification (NBS), relies upon aggregating these mutations in molecular network neighborhoods to gain power in separating patients. The underlying assumption is that cancer arises due to disruptions in specific molecular pathways, not only disruptions in isolated genes (Vanunu et al., 2010). Similar cancer types are commonly observed to arise from mutations in different genes that participate in common pathways. However, traditional gene-wise clustering methods fail to capture similarities that are observed only on the pathway level: mutations that do not fall on the same genes contribute nothing to a measure of similarity between patients, even when they affect the same pathway. In NBS, the information of each somatic mutation is instead smoothed across its network neighborhood, spreading the signal to other functionally related genes in network space. It is then possible to obtain robust clusters of patients based on the similarity of these network-smoothed mutation profiles.

In the original publication of NBS, the code used to develop the project was provided in MATLAB, a proprietary programming language, making open access to this software difficult. Additionally, the code lacked modularization, making individual steps of the algorithm difficult to control, analyze and test.
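The smoothing idea described above can be illustrated with a short sketch. It is not the pyNBS API: the function name smooth_mutations, the restart-style update rule, and the default alpha = 0.7 are assumptions chosen for illustration, using a common random-walk-with-restart form of network propagation on a degree-normalized adjacency matrix.

```python
import numpy as np


def smooth_mutations(mutations, adjacency, alpha=0.7, tol=1e-6, max_iter=250):
    """Spread binary mutation profiles over a gene network (illustrative only).

    mutations: (patients x genes) array of 0/1 values.
    adjacency: (genes x genes) symmetric 0/1 array for the reference network.
    alpha:     assumed share of signal passed to neighbors at each step.
    """
    f0 = np.asarray(mutations, dtype=float)
    adj = np.asarray(adjacency, dtype=float)

    # Row-normalize so each gene splits its signal evenly among its neighbors.
    degree = adj.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated genes: avoid division by zero
    transition = adj / degree[:, None]

    f = f0.copy()
    for _ in range(max_iter):
        # Pass signal to neighbors, then mix the original profile back in.
        f_next = alpha * f.dot(transition) + (1.0 - alpha) * f0
        if np.abs(f_next - f).max() < tol:
            return f_next
        f = f_next
    return f


# Toy network: a path of four genes, g0 - g1 - g2 - g3 (hypothetical data).
toy_adjacency = np.array([[0, 1, 0, 0],
                          [1, 0, 1, 0],
                          [0, 1, 0, 1],
                          [0, 0, 1, 0]])
# Two patients with mutations in different genes of the same neighborhood.
toy_mutations = np.array([[1, 0, 0, 0],
                          [0, 0, 1, 0]])

toy_smoothed = smooth_mutations(toy_mutations, toy_adjacency)
print("overlap before smoothing:", toy_mutations[0].dot(toy_mutations[1]))
print("overlap after smoothing: ", toy_smoothed[0].dot(toy_smoothed[1]))
```

In the toy data the two patients share no mutated gene, so their raw profiles have no overlap. After smoothing, both profiles carry signal on the intermediate genes, which is what lets a clustering step group them together.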
In what follows, we implement and organize the NBS algorithm as an installable Python package, which we call pyNBS. This package modularizes and exposes the major steps in the algorithm to better control, analyze and improve the approach in future studies.

Justin K. Huang1,*, Tongqiu Jia2, Daniel E. Carlin2 and Trey Ideker1,2

© The Author(s) 2018. Published by Oxford University Press. All rights reserved.

[Figure panels recovered without a caption. Panel A: -log10(P-value) by cancer type (BLCA, k=4; COAD, k=3; HNSC, k=4; UCEC, k=4), with series HN90, CRN, CRN genes (no propagation) and CRN (shuffled). Panel B: ARI and Time against Number of Iterations, for HN90 and CRN.]

Fig. 1. Overview and stepwise factorization of the NBS algorithm. Labels recovered from the figure: Repeat N Times; Step 2: Smooth somatic mutations over molecular network; Step 3a: Construct influence distance matrix; Step 3b: Threshold network by retaining nearest neighbors of each gene; Step 4: Cluster network-smoothed mutation profiles with graph-regularized non-negative matrix factorization (GNMF); Step 5: Consensus clustering on aggregate GNMF results.

2 Materials and methods

The NBS algorithm requires two inputs: a matrix of binary values describing all somatic tumor mutations found within a cohort of cancer patients (patients × genes) and a second file describing the gene-gene interactions defining a reference molecular network. Given these inputs, the NBS algorithm clusters the tumor mutation profiles into molecular subtypes as seen in Figure 1. Additional details of the algorithm are described in the original NBS manuscript (Hofree, 2013).

3 Results

3.1 pyNBS usage and validation

The NBS algorithm can be executed using the pyNBS package in two modes: using a wrapper script via the command line, or by running the provided Jupyter Notebooks. Documentation for both code execution modes is provided within a GitHub repository, which can be found at: https://github.com/idekerlab/pyNBS.
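To make the two inputs named in Materials and methods concrete, the sketch below builds a binary patients × genes matrix and a gene-gene adjacency matrix from small in-memory lists. It does not reproduce the pyNBS file formats or loaders: the helper names build_mutation_matrix and build_adjacency and the toy patient and gene labels are hypothetical, and the gene set is restricted to mutated genes only to keep the example short.

```python
import numpy as np


def build_mutation_matrix(mutation_pairs):
    """Return (matrix, patients, genes) from (patient, gene) pairs.

    matrix[i, j] is 1 if patient i has at least one mutation in gene j.
    """
    patients = sorted({p for p, _ in mutation_pairs})
    genes = sorted({g for _, g in mutation_pairs})
    p_index = {p: i for i, p in enumerate(patients)}
    g_index = {g: j for j, g in enumerate(genes)}

    matrix = np.zeros((len(patients), len(genes)), dtype=int)
    for patient, gene in mutation_pairs:
        # Binary profile: repeated mutations in one gene still give 1.
        matrix[p_index[patient], g_index[gene]] = 1
    return matrix, patients, genes


def build_adjacency(edges, genes):
    """Return a symmetric 0/1 adjacency matrix over `genes`.

    Edges touching a gene outside `genes`, and self-loops, are skipped
    (a simplification made for this sketch).
    """
    index = {g: i for i, g in enumerate(genes)}
    adjacency = np.zeros((len(genes), len(genes)), dtype=int)
    for a, b in edges:
        if a in index and b in index and a != b:
            adjacency[index[a], index[b]] = 1
            adjacency[index[b], index[a]] = 1
    return adjacency


# Hypothetical toy cohort and network, for illustration only.
toy_pairs = [("P1", "TP53"), ("P1", "KRAS"),
             ("P2", "PIK3CA"), ("P2", "TP53"), ("P2", "TP53")]
toy_edges = [("KRAS", "PIK3CA"), ("TP53", "MDM2")]

toy_matrix, toy_patients, toy_genes = build_mutation_matrix(toy_pairs)
toy_network = build_adjacency(toy_edges, toy_genes)
print(toy_patients, toy_genes)
print(toy_matrix)
print(toy_network)
```

Keeping the row and column labels alongside the arrays matters in practice, because the mutation matrix and the network must index genes in the same order before any smoothing or clustering step is applied.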
It should be noted that each full run of pyNBS does not necessarily produce the exact same cluster assignments on the same cohort. This variation is due to the stochastic nature of the sub-sampling step as well as the non-unique nature of matrix factorization (Cai et al., 2011). However, this variance is largely controlled by the final consensus clustering step.

We tested the pyNBS package by generating patient subtypes in ovarian and uterine cancer using the data and corresponding networks released with the original Hofree et al. manuscript. PyNBS nearly perfectly recovered the original Hofree patient cluster assignments for ovarian and uterine cancer (χ² P-value: 2.3 × 10^-107 and 5.3 × 10^-88, respectively). These two test examples are provided, along with the required datasets (re-formatted for usage with pyNBS), as Jupyter Notebooks in the GitHub repository.

3.2 A cancer-specific network for pyNBS

In addition to reconstructing the original NBS algorithm, we also explored alternative reference networks for their ability to separate tumor cohorts into clinically relevant subty (...truncated)