Pre-pro is a fast pre-processor for single-particle cryo-EM by enhancing 2D classification

Communications Biology, Oct 2021

2D classification plays a pivotal role in analyzing single-particle cryo-electron microscopy images. Here, we introduce a simple and loss-less pre-processor that incorporates a fast dimension-reduction (2SDR) de-noiser to enhance 2D classification. By applying this 2SDR pre-processor prior to a representative classification algorithm such as RELION or ISAC, we compare the performance with and without the pre-processor. Tests on multiple cryo-EM experimental datasets show that the pre-processor can make classification faster, improve the yield of good particles and increase the number of class-average images available for generating better initial models. Testing on the nanodisc-embedded TRPV1 dataset with high heterogeneity, using a 3D reconstruction workflow with an initial model from class-average images, shows that the pre-processor improves the final resolution to 2.82 Å, close to 0.9 Nyquist. These findings and analyses suggest that the 2SDR pre-processor, of minimal cost, is widely applicable for boosting 2D classification, while its generalization to accommodate neural network de-noisers is envisioned.


Szu-Chi Chung 1, Hsin-Hung Lin 2, Po-Yao Niu 1, Shih-Hsin Huang 2, I-Ping Tu 1 ✉ & Wei-Hau Chang 2 ✉

1 Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan. 2 Institute of Chemistry, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan.

Communications Biology (2020) 3:508, https://doi.org/10.1038/s42003-020-01229-0

Cryo-EM (cryo-electron microscopy) uses an electron beam transmitted through a biological sample to generate projection images.
The projection images of a sample can be used to reconstruct the 3D structure when many views are available1. For a sample of protein solution frozen in vitreous ice2, each particle can assume an arbitrary orientation, so the projection images from different particles may represent different views of a 3D structure. Because cryo-EM uses only a small number of electrons for imaging to alleviate radiation damage to biological specimens, the recorded images are heavily contaminated by shot noise. To process those noisy particle images, a step-wise computation pipeline that aims to obtain a reliable 3D map of the target macromolecule has been constructed (Fig. 1a and Fig. 1 in ref. 3). 2D classification plays a pivotal role in the entire workflow: it curates a dataset by grouping together particles of similar view to enhance the signal-to-noise ratio (SNR) while discarding invalid particles or contaminants. The class averages can be used to assess the degree of heterogeneity in the data, and the good ones are chosen for calculating an initial model. As particle images of similar orientation are related to each other by image translation and rotation, clustering similar particles requires the images to be properly aligned first. Since aligning low-SNR images is error-prone, 2D classification is a fundamentally demanding task and its results are often non-ideal. A typical 2D classification algorithm therefore couples clustering with image alignment and iterates to find the best alignment parameters and class assignments. In the era of the cryo-EM “resolution revolution”4, the computational burden of 2D classification is further aggravated by the rapid increase in the number and size of images.
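The SNR gain from grouping particles of similar view can be illustrated with a toy calculation. The sketch below is only an illustration under simple assumptions (a synthetic Gaussian blob as the "particle", perfectly aligned copies, additive white Gaussian noise); the function names and parameters are hypothetical and are not taken from the paper. Under these assumptions, averaging N aligned images reduces the noise variance by roughly a factor of N.

```python
import numpy as np

def snr(signal, noisy):
    """Ratio of signal variance to the variance of the residual noise."""
    noise = noisy - signal
    return signal.var() / noise.var()

def class_average_snr(n_particles, noise_sigma=5.0, size=32, seed=0):
    """Return (SNR of one noisy image, SNR of the average of n_particles images)."""
    rng = np.random.default_rng(seed)
    # Hypothetical particle: a centred Gaussian blob (assumption for illustration).
    y, x = np.mgrid[:size, :size] - size / 2
    signal = np.exp(-(x**2 + y**2) / (2 * 4.0**2))
    # Perfectly aligned copies corrupted by independent white noise.
    noisy = signal + rng.normal(0, noise_sigma, (n_particles, size, size))
    return snr(signal, noisy[0]), snr(signal, noisy.mean(axis=0))

single, averaged = class_average_snr(400)
gain = averaged / single  # expected to be on the order of n_particles
```

In real data the gain is smaller, because alignment errors and mixed views blur the average, which is why the quality of the alignment matters so much for classification.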
A standard computation framework for 2D classification has been established since the early development of single-particle cryo-EM5. This framework combines K-means clustering with a multi-reference alignment (MRA) approach, in which a number of images are chosen from the data to serve as initialization seeds and alignment references. To mitigate the issue caused by initialization, RELION classification6, a now widely used classification method, employs a maximum-likelihood (ML) approach7 to perform MRA, allowing each image to be compared with all images in all possible rotations and translations. An image is then allocated to all classes, but with different probabilities derived by maximizing the likelihood of observing the experimental dataset using the expectation-maximization algorithm8. This originally slow process has recently been accelerated by GPU parallelism9. As a result, RELION has become a popular approach. Nonetheless, as RELION reports all classes, both clear and blurred, human inspection is required to select good classes. Some of the good classes can still be heterogeneous, as they have the potential to attract less frequent views or low-SNR images3,10. Moreover, the optimal outcome of RELION may depend on user-specified regularization parameters or a good guess of the number of classes. Currently, the best classification results can be obtained from ISAC (iterative stable alignment and clustering)11. ISAC uses repeated stability tests to validate the members of each class to ensure its homogeneity. In addition, ISAC restricts the size of each class to the same bound by using a modified K-means to suppress the above-mentioned attractor effect3,10. These features make ISAC an attractive approach when working on a very heterogeneous dataset. Since ISAC automatically discards the classes that are not stable or reproducible, it may not need human intervention when it comes to selecting good classes.
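The idea of allocating an image to all classes with different probabilities can be sketched as the expectation step of a Gaussian mixture over classes and orientations. This is a deliberately simplified illustration and not RELION's implementation: it assumes white Gaussian noise with a known `sigma`, a flat prior, no translations, and only the four in-plane rotations by multiples of 90 degrees. The function name `class_posteriors` and the toy references are hypothetical.

```python
import numpy as np

def class_posteriors(image, references, sigma):
    """Posterior probability of each class for one image, marginalised
    over in-plane rotations by 0, 90, 180 and 270 degrees."""
    log_lik = np.array([
        [-np.sum((image - np.rot90(ref, k)) ** 2) / (2 * sigma**2)
         for k in range(4)]
        for ref in references
    ])                               # shape: (n_classes, 4 rotations)
    log_lik -= log_lik.max()         # stabilise the exponentials
    weights = np.exp(log_lik)
    weights /= weights.sum()         # joint posterior over (class, rotation)
    return weights.sum(axis=1)       # marginalise over rotations

# Toy references (assumptions): an off-centre bar and a centred square.
ref_a = np.zeros((8, 8)); ref_a[1:3, 1:7] = 1.0
ref_b = np.zeros((8, 8)); ref_b[2:6, 2:6] = 1.0

# A noisy, rotated copy of the bar.
rng = np.random.default_rng(0)
image = np.rot90(ref_a, 1) + rng.normal(0, 0.5, (8, 8))
probs = class_posteriors(image, [ref_a, ref_b], sigma=0.5)
```

In a full expectation-maximization loop, these weights would then be used to update each reference as a weighted average of the back-rotated images, in contrast to K-means, where each image contributes to exactly one class.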
However, ISAC is recommended only for tough problems because it is extremely time consuming. Here, instead of inventing a new 2D classification algorithm, we propose a pre-processing strategy to enhance the performance of existing algorithms. The rationale comes from a finding that salient features of cryo-EM particles can emerge from the ...

Fig. 1 The flow charts of the processing and the pre-processing. (a) A single cryo-EM image processing workflow. (b) The workflow of the proposed pre-processing. The upper panel in the left column represents the original particle images; the lower panel in the left column represents the denoised version. The bottom panel in the central column shows the x-and-y shifts and in-plane rotation angle reported by a reference-free alignment procedure applied on the denoised particles. The lower panel in the right column represents the re-positioned particle images obtained by applying the alignment par (...truncated)

Labels in the figure: Micrographs, Movie Alignment, CTF Estimation, Particle Picking, Coordinates, Particles, 2SDR Preprocessing, Re-position particles, 2D Classification, 2D averages, Clean particles, Initial Model, Initial 3D volume, 3D Classification, Separate 3D volumes, 3D Refinement; Perform 2SDR to obtain denoised images, 2D Reference-free alignment, Extract alignment parameters, Apply on original images.
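The pre-processing workflow described in the caption of Fig. 1b (denoise, align the denoised images, then apply the alignment parameters to the original images) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: a plain PCA projection stands in for the 2SDR de-noiser, alignment is restricted to integer x/y shifts (the in-plane rotation is omitted for brevity), and the reference is simply the running average of the denoised stack. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def shift_image(img, shift):
    """Circularly shift an image by integer (dy, dx)."""
    return np.roll(img, (int(shift[0]), int(shift[1])), axis=(0, 1))

def pca_denoise(stack, rank):
    """Stand-in de-noiser (NOT 2SDR): keep the top `rank` principal components."""
    n, h, w = stack.shape
    flat = stack.reshape(n, h * w)
    mean = flat.mean(axis=0)
    u, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return (low_rank + mean).reshape(n, h, w)

def estimate_shift(image, reference):
    """Signed integer (dy, dx) that best moves `image` onto `reference`,
    found at the peak of the circular cross-correlation."""
    corr = np.fft.ifft2(np.fft.fft2(reference) * np.conj(np.fft.fft2(image))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    return (dy + h // 2) % h - h // 2, (dx + w // 2) % w - w // 2

def preprocess(stack, rank=10, n_iter=3):
    """Denoise, align the denoised copies, then re-position the ORIGINAL images."""
    denoised = pca_denoise(stack, rank)
    shifts = np.zeros((len(stack), 2), dtype=int)
    for _ in range(n_iter):
        # Reference-free: the reference is the average of the currently aligned stack.
        aligned = np.stack([shift_image(d, s) for d, s in zip(denoised, shifts)])
        reference = aligned.mean(axis=0)
        shifts = np.array([estimate_shift(d, reference) for d in denoised])
    # Only the positions change; the original pixel values are untouched.
    repositioned = np.stack([shift_image(img, s) for img, s in zip(stack, shifts)])
    return repositioned, shifts

def synthetic_stack(n=200, size=32, max_shift=3, noise_sigma=1.0, seed=0):
    """Toy data: one Gaussian blob, randomly shifted, plus white noise."""
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[:size, :size] - size / 2
    blob = np.exp(-(x**2 + y**2) / (2 * 3.0**2))
    true_shifts = rng.integers(-max_shift, max_shift + 1, size=(n, 2))
    clean = np.stack([shift_image(blob, s) for s in true_shifts])
    return clean + rng.normal(0, noise_sigma, clean.shape), true_shifts
```

A call such as `repositioned, shifts = preprocess(synthetic_stack()[0])` returns the original noisy images moved to a common position. Because the shifts are estimated on the denoised copies but applied to the raw images, no pixel values are altered, which is one way to read the "loss-less" property claimed for the pre-processor.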


Article PDF: https://www.nature.com/articles/s42003-020-01229-0.pdf
Article home page: https://www.nature.com/articles/s42003-020-01229-0

Chung, Szu-Chi, Lin, Hsin-Hung, Niu, Po-Yao, Huang, Shih-Hsin, Tu, I-Ping, Chang, Wei-Hau. Pre-pro is a fast pre-processor for single-particle cryo-EM by enhancing 2D classification, Communications Biology, DOI: 10.1038/s42003-020-01229-0