Pre-pro is a fast pre-processor for single-particle cryo-EM by enhancing 2D classification
ARTICLE
https://doi.org/10.1038/s42003-020-01229-0
OPEN
Szu-Chi Chung1, Hsin-Hung Lin2, Po-Yao Niu1, Shih-Hsin Huang2, I-Ping Tu1✉ & Wei-Hau Chang2✉
2D classification plays a pivotal role in analyzing single-particle cryo-electron microscopy
images. Here, we introduce a simple and loss-less pre-processor that incorporates a fast
dimension-reduction (2SDR) de-noiser to enhance 2D classification. By applying this
2SDR pre-processor ahead of representative classification algorithms such as RELION and ISAC,
we compare their performance with and without the pre-processor. Tests on multiple cryo-EM
experimental datasets show that the pre-processor can make classification faster, improve
the yield of good particles and increase the number of class-average images available for
generating better initial models. Testing on the highly heterogeneous nanodisc-embedded TRPV1
dataset, using a 3D reconstruction workflow with an initial model built from class-average
images, shows that the pre-processor improves the final resolution to 2.82 Å, close to 0.9 Nyquist.
These findings and analyses suggest that the 2SDR pre-processor, of minimal cost, is widely
applicable for boosting 2D classification, while its generalization to accommodate neural
network de-noisers is envisioned.
1 Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan.
2 Institute of Chemistry, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan. ✉email: ;
COMMUNICATIONS BIOLOGY | (2020)3:508 | https://doi.org/10.1038/s42003-020-01229-0 | www.nature.com/commsbio
Cryo-EM (cryo-electron microscopy) uses an electron beam
transmitted through a biological sample to generate projection images. The projection images of a sample can be
used to reconstruct the 3D structure when many views are
available1. For a sample of protein solution frozen in vitreous ice2,
each particle can assume an arbitrary orientation, so that the projection
images from different particles may represent different views of a
3D structure. Since cryo-EM uses only a small number of electrons for imaging to alleviate radiation damage to biological
specimens, the recorded images are heavily contaminated by shot
noise. To process those noisy particle images, a step-wise computation pipeline that aims to obtain a reliable 3D map of the
target macro-molecule has been constructed (Fig. 1a and Fig. 1 in
ref. 3). 2D classification plays a pivotal role in the entire workflow: it curates a dataset by grouping together particles of
similar view to enhance the signal-to-noise ratio (SNR) while
discarding invalid particles or contaminants. The class
averages can be used for assessing the degree of heterogeneity in
the data, whereas the good ones are chosen for calculating an initial
model. As particle images of similar orientation are related to
each other by image translation and rotation, clustering similar
particles requires that the images first be properly aligned. Since
aligning low-SNR images is error-prone, 2D classification is a
fundamentally demanding task and its results are often non-ideal. A typical 2D classification algorithm therefore couples
clustering with image alignment and uses iterations to strive for
the best alignment parameters and classification indices. In the
era of cryo-EM “resolution revolution”4, the computation burden
of 2D classification is further aggravated by the rapid increase in
the number and the size of images. A standard computation
framework for 2D classification has been established since the
early development of single-particle cryo-EM5—this framework
combines K-means clustering with a multi-reference alignment
(MRA) approach where a number of images are chosen from the
data to serve as initialization seeds and alignment references. To
mitigate the issue caused by initialization, RELION classification6, a
now widely used classification method, employs a maximum-likelihood (ML) approach7 to perform MRA, allowing each image to be
compared with all images in all possible rotations and translations. An image is then allocated to all classes, yet with different
probabilities derived by maximizing the likelihood of observing
the experimental dataset using the expectation-maximization
algorithm8. This originally slow process has recently been accelerated thanks to GPU parallelism9. As a result, RELION has
become a popular approach. Nonetheless, as RELION reports all
classes, clear and blurred ones alike, human inspection is required to
select good classes. Some of the good classes can still be heterogeneous, as they have the potential to attract less frequent views or
low-SNR images3,10. Moreover, the optimal outcome of RELION
may depend on user-specified regularization parameters or a
good guess of the number of classes. Currently, the best classification results can be obtained from ISAC (iterative stable
alignment and clustering)11. ISAC uses repeated stability tests to
validate the members of each class to ensure its homogeneity. In
addition, ISAC restricts the size of each class with the same bound
by using a modified K-means to suppress the above-mentioned
attractor effect3,10. These features make ISAC an attractive
approach when one works on a very heterogeneous dataset. Since
ISAC automatically discards the classes that are not stable or
reproducible, it may not need human intervention when it comes
to selecting good classes. However, ISAC is recommended only
for tough problems because it is extremely time-consuming.
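The K-means/MRA framework described above can be illustrated with a toy sketch. The example below is a minimal illustration under stated assumptions, not the implementation used by RELION, ISAC or any cryo-EM package: images are assumed to differ only by circular translations (in-plane rotations are omitted for brevity), and the function names `best_shift` and `mra_kmeans` are hypothetical. In each iteration, every image is aligned to every reference by FFT cross-correlation, assigned to the nearest reference, and each reference is then recomputed as the average of its aligned members.

```python
import numpy as np


def best_shift(image, reference):
    """Find the circular shift of `image` that best matches `reference`.

    Returns ((dy, dx), squared_distance_after_shift).
    Toy assumption: translations only, with wrap-around at the borders.
    """
    # Circular cross-correlation: cc[s] = <reference, np.roll(image, s)>
    cc = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(image))).real
    dy, dx = np.unravel_index(np.argmax(cc), cc.shape)
    # ||reference - shifted image||^2 expanded in terms of the correlation peak
    dist2 = np.sum(reference ** 2) + np.sum(image ** 2) - 2.0 * cc[dy, dx]
    return (int(dy), int(dx)), float(dist2)


def mra_kmeans(images, n_classes, n_iter=10, seed=0):
    """Toy K-means clustering coupled with multi-reference alignment."""
    images = np.asarray(images, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialization seeds: images drawn from the data serve as references
    refs = images[rng.choice(len(images), n_classes, replace=False)].copy()
    labels = np.zeros(len(images), dtype=int)
    for _ in range(n_iter):
        aligned = np.empty_like(images)
        for i, img in enumerate(images):
            # Align the image to every reference, keep the closest one
            results = [best_shift(img, ref) for ref in refs]
            k = int(np.argmin([dist2 for _, dist2 in results]))
            labels[i] = k
            aligned[i] = np.roll(img, results[k][0], axis=(0, 1))
        for k in range(n_classes):
            members = aligned[labels == k]
            if len(members):  # an empty class keeps its previous reference
                refs[k] = members.mean(axis=0)
    return labels, refs
```

Because the references are seeded from the data, the outcome of such a scheme depends on which images happen to be drawn, which is the initialization issue noted in the text.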
Here, instead of inventing a new 2D classification algorithm,
we propose a pre-processing strategy to enhance the performance
of existing algorithms. The rationale comes from a finding that
salient features of cryo-EM particles can emerge from the
[Fig. 1, panels (a) and (b). Flow-chart labels: Movie Alignment; Micrographs; CTF Estimation; Particle Picking; Coordinates; Particles; 2D Classification; 2D averages; Clean particles; Initial Model; Initial 3D volume; 3D Classification; Separate 3D volumes; 3D Refinement; 2SDR Preprocessing; Perform 2SDR to obtain denoised images; 2D Reference-free alignment; Extract alignment parameters (0001 x,y, ... 5000 x,y,); Apply on original images; Re-position particles.]
Fig. 1 The flow charts of the processing and the pre-processing. (a) A single-particle cryo-EM image-processing workflow. (b) The workflow of the proposed pre-processing. The upper panel in the left column represents the original particle images; the lower panel in the left column represents the denoised version.
The bottom panel in the central column shows the x-and-y shifts and in-plane rotation angle reported by a reference-free alignment procedure applied on
the denoised particles. The lower panel in the right column represents the re-positioned particle images obtained by applying the alignment par (...truncated)