pyNBS: a Python implementation for network-based stratification of tumor mutations
Bioinformatics, 34(16), 2018, 2859–2861
doi: 10.1093/bioinformatics/bty186
Advance Access Publication Date: 28 March 2018
Applications Note
Systems biology
1Bioinformatics and Systems Biology Program, Bioengineering Department, UC San Diego, La Jolla, CA 92093,
USA and 2Department of Medicine, UC San Diego, La Jolla, CA 92093, USA
*To whom correspondence should be addressed.
Associate Editor: Oliver Stegle
Received on October 11, 2017; revised on February 5, 2018; editorial decision on March 22, 2018; accepted on March 27, 2018
Abstract
Summary: We present pyNBS: a modularized Python 2.7 implementation of the network-based
stratification (NBS) algorithm for stratifying tumor somatic mutation profiles into molecularly and
clinically relevant subtypes. In addition to release of the software, we benchmark its key parameters and provide a compact cancer reference network that increases the significance of tumor
stratification using the NBS algorithm. The structure of the code exposes key steps of the algorithm
to foster further collaborative development.
Availability and implementation: The package, along with examples and data, can be downloaded
and installed from the URL https://github.com/idekerlab/pyNBS.
Contact:
1 Introduction
The biomedical community increasingly relies on genomic information to diagnose and treat many different complex diseases, including cancer (Frampton, 2013; Johnson, 2014). In parallel,
developments in molecular interaction mapping technologies and
network analysis algorithms have enabled the systematic elucidation
of pathways involved in cancer and other complex diseases
(Schaefer et al., 2009). These two technologies—genomics and network analysis—have been recently combined to contextualize somatic mutations in tumors against the knowledge contained in
molecular interaction networks and disease pathway maps. For example, numerous algorithms now use molecular network information to discover significantly mutated pathways in particular cohorts
of patients (Ciriello, 2012; Drake, 2016; Leiserson, 2013, 2014;
Paull, 2013; Vandin, 2011a,b; Vaske, 2010).
Recently, we introduced an algorithm that uses molecular network
information to guide the stratification of tumor somatic mutation profiles into clinically relevant subtypes (Hofree, 2013). Such mutation
profiles have been notoriously difficult to stratify (i.e. cluster) due to
their extreme heterogeneity from patient to patient. Our algorithm,
called Network-Based Stratification (NBS), relies upon aggregating
these mutations in molecular network neighborhoods to gain power
in separating patients. The underlying assumption is that cancer arises
due to disruptions in specific molecular pathways, not only disruptions in isolated genes (Vanunu et al., 2010). It is commonly observed
that similar cancer types arise from mutations that affect different
genes that are participants in common pathways. However, traditional gene-wise clustering methods fail to capture similarities that
are observed only on the pathway level, since mutations do not necessarily fall on the same genes and therefore do not contribute to any
measure of similarity between patients despite affecting the same
pathway. NBS addresses this by smoothing the information from each somatic mutation
across its network neighborhood, spreading the signal to other functionally related genes in network space. It is then possible to obtain
robust clusters of patients based on the similarity of these network-smoothed mutation profiles.
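The smoothing idea can be illustrated with a minimal, dependency-free sketch. It iterates the random-walk-with-restart update F <- alpha * F * A' + (1 - alpha) * F0 described in the original NBS paper (Hofree, 2013), where F0 is the binary patient-by-gene matrix and A' is a degree-normalized adjacency matrix. The toy four-gene path network, the value alpha = 0.7, the row normalization and all function names are illustrative assumptions, not the pyNBS API.

```python
def row_normalize(adj):
    """Divide each row by its sum (node degree) so that every row sums to 1."""
    normalized = []
    for row in adj:
        total = float(sum(row))
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def propagate(f0, adj_norm, alpha=0.7, tol=1e-8, max_iter=1000):
    """Iterate F <- alpha * F * A' + (1 - alpha) * F0 until changes fall below tol."""
    n_genes = len(adj_norm)
    f = [row[:] for row in f0]
    for _ in range(max_iter):
        f_next = [
            [
                alpha * sum(row[k] * adj_norm[k][j] for k in range(n_genes))
                + (1 - alpha) * f0[p][j]
                for j in range(n_genes)
            ]
            for p, row in enumerate(f)
        ]
        delta = max(abs(a - b) for r1, r2 in zip(f_next, f) for a, b in zip(r1, r2))
        f = f_next
        if delta < tol:
            break
    return f


# Toy network (assumed): a path of four genes, A - B - C - D.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Two patients with no mutated gene in common: patient 1 has A, patient 2 has C.
mutations = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
]
smoothed = propagate(mutations, row_normalize(adjacency))
```

After smoothing, both patients carry a non-zero score on gene B, so a similarity measure between their profiles is no longer zero even though their original mutation sets do not overlap.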
In the original publication of NBS, the code used to develop the
project was provided in MATLAB, a proprietary programming language, making open access to this software difficult. Additionally,
the code lacked modularization, making individual steps of the algorithm difficult to control, analyze and test. In what follows, we implement and organize the NBS algorithm as an installable Python
package, which we call pyNBS. This package modularizes and exposes the major steps in the algorithm to better control, analyze and
improve the approach in future studies.
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail:
Justin K. Huang1,*, Tongqiu Jia2, Daniel E. Carlin2 and Trey Ideker1,2
Fig. 1. Overview and stepwise factorization of the NBS algorithm. Workflow panel (with a "Repeat N times" loop): Step 2: smooth somatic mutations over molecular network; Step 3a: construct influence distance matrix; Step 3b: threshold network by retaining nearest neighbors of each gene; Step 4: cluster network-smoothed mutation profiles with graph-regularized non-negative matrix factorization (GNMF); Step 5: consensus clustering on aggregate GNMF results. Panel A: -log10(P-value) by cancer type (BLCA, k=4; COAD, k=3; HNSC, k=4; UCEC, k=4) for HN90, CRN, CRN genes with no propagation, and shuffled CRN. Panel B: ARI and time versus number of iterations for HN90 and CRN.
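Step 3b of Figure 1 can be sketched as follows: given a gene-by-gene influence matrix (Step 3a), keep for each gene only the edges to its k most influential neighbors. The matrix values, k = 1, and the choice to keep an edge whenever either endpoint selects it are assumptions made for illustration, not pyNBS defaults.

```python
def knn_threshold(influence, k):
    """Return a symmetric 0/1 matrix linking each gene to its k top-scoring neighbors."""
    n = len(influence)
    keep = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank every other gene by its influence score with gene i, highest first.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: influence[i][j],
            reverse=True,
        )
        for j in ranked[:k]:
            keep[i][j] = 1
            keep[j][i] = 1  # keep the edge if either endpoint selects it
    return keep


# Hypothetical influence scores between four genes.
influence = [
    [0.0, 0.9, 0.5, 0.1],
    [0.9, 0.0, 0.4, 0.2],
    [0.5, 0.4, 0.0, 0.8],
    [0.1, 0.2, 0.8, 0.0],
]
knn = knn_threshold(influence, k=1)
```

With k = 1, only the two strongest pairings survive (genes 0 and 1, genes 2 and 3), which gives a sparse graph for the regularization term in Step 4.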
2 Materials and methods
The NBS algorithm requires two inputs: a matrix of binary values
describing all somatic tumor mutations found within a cohort of cancer patients (patients × genes) and a second file describing the gene-gene interactions defining a reference molecular network. Given these
inputs, the NBS algorithm clusters the tumor mutation profiles into
molecular subtypes as seen in Figure 1. Additional details of the algorithm are described in the original NBS manuscript (Hofree, 2013).
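As a sketch of the two inputs, the snippet below builds a binary patients × genes matrix from (patient, gene) mutation records and a symmetric adjacency matrix from a gene-gene edge list. The records, patient identifiers, edges and helper names are made up for illustration; the file formats that pyNBS expects are documented in its repository.

```python
def build_inputs(mutation_records, edges):
    """Return (patients, genes, binary mutation matrix, adjacency matrix)."""
    patients = sorted({p for p, _ in mutation_records})
    # Assumption: index genes by the union of mutated genes and network nodes.
    genes = sorted({g for _, g in mutation_records} | {g for e in edges for g in e})
    p_idx = {p: i for i, p in enumerate(patients)}
    g_idx = {g: i for i, g in enumerate(genes)}

    matrix = [[0] * len(genes) for _ in patients]
    for patient, gene in mutation_records:
        matrix[p_idx[patient]][g_idx[gene]] = 1  # 1 = gene mutated in this patient

    adjacency = [[0] * len(genes) for _ in genes]
    for a, b in edges:
        adjacency[g_idx[a]][g_idx[b]] = 1
        adjacency[g_idx[b]][g_idx[a]] = 1  # interactions are undirected
    return patients, genes, matrix, adjacency


# Toy data (illustrative only).
mutation_records = [
    ("patient_1", "TP53"),
    ("patient_1", "KRAS"),
    ("patient_2", "PIK3CA"),
    ("patient_2", "TP53"),
]
edges = [("TP53", "MDM2"), ("KRAS", "PIK3CA"), ("PIK3CA", "PTEN")]

patients, genes, mutation_matrix, adjacency = build_inputs(mutation_records, edges)
```

Indexing both matrices by the same gene order keeps the mutation profiles aligned with the network, which the later network-based steps rely on.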
3 Results
3.1 pyNBS usage and validation
The NBS algorithm can be executed using the pyNBS package in
two modes: using a wrapper script via the command line, or by running the provided Jupyter Notebooks. Documentation for both code
execution modes is provided within a GitHub repository, which
can be found at: https://github.com/idekerlab/pyNBS.
It should be noted that each full run of pyNBS does not necessarily produce the exact same cluster assignments on the same cohort.
This variation is due to the stochastic nature of the sub-sampling
step as well as the non-unique nature of matrix factorization (Cai
et al., 2011). However, this variance is largely controlled by the final
consensus clustering step.
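To see why a consensus step dampens run-to-run variation, consider a minimal sketch that builds a co-clustering matrix: the fraction of runs in which each pair of patients lands in the same cluster. Only co-membership is counted, so cluster labels need not match between runs. The three label vectors are invented, and the sketch assumes every patient appears in every run; with sub-sampling, the fraction would instead be taken over the runs that include both patients. The consensus step in pyNBS may differ in detail.

```python
def co_clustering_matrix(runs):
    """Fraction of runs in which patients i and j share a cluster."""
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / float(len(runs)) for c in row] for row in counts]


# Cluster labels for four patients from three hypothetical runs.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels permuted
    [0, 0, 0, 1],  # third patient assigned differently
]
consensus = co_clustering_matrix(runs)
```

Pairs that co-cluster in most runs receive values near 1, so clustering this matrix (for example, hierarchically) yields assignments that are less sensitive to any single run.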
We tested the pyNBS package by generating patient subtypes in
ovarian and uterine cancer using the data and corresponding networks released with the original Hofree et al. manuscript. PyNBS
nearly perfectly recovered the original Hofree patient cluster assignments for ovarian and uterine cancer (χ² P-value: 2.3 × 10⁻¹⁰⁷ and
5.3 × 10⁻⁸⁸, respectively). These two test examples are provided,
along with the required datasets (re-formatted for usage with
pyNBS), as Jupyter Notebooks in the GitHub repository.
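Agreement between two sets of cluster assignments can be assessed with a chi-squared test of independence on their contingency table. The sketch below computes the statistic and degrees of freedom in plain Python for invented labels; converting the statistic to a P-value requires a chi-squared distribution function, for example scipy.stats.chi2.sf. It illustrates the kind of test reported above and is not taken from the pyNBS code.

```python
from collections import Counter


def chi_squared(labels_a, labels_b):
    """Chi-squared statistic and degrees of freedom for two label vectors."""
    n = float(len(labels_a))
    observed = Counter(zip(labels_a, labels_b))  # contingency table counts
    row_totals = Counter(labels_a)
    col_totals = Counter(labels_b)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical assignments: identical partitions with permuted cluster labels.
original = [1, 1, 1, 2, 2, 2, 3, 3, 3]
reimplemented = [2, 2, 2, 3, 3, 3, 1, 1, 1]
stat, dof = chi_squared(original, reimplemented)
```

Because the test depends only on the contingency table, it is unaffected by how the clusters are numbered, which matters when comparing two independent implementations.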
3.2 A cancer-specific network for pyNBS
In addition to reconstructing the original NBS algorithm, we also
explored alternative reference networks for their ability to separate
tumor cohorts into clinically relevant subty (...truncated)