A genome-wide survey of copy number variations reveals an asymmetric evolution of duplicated genes in rice

BMC Biology, Jun 2020

Copy number variations (CNVs) are an important type of structural variation in the genome and usually affect gene expression levels through the gene dosage effect. Understanding CNVs as part of genome evolution may provide insights into the genetic basis of important agricultural traits and contribute to future crop breeding. While available methods for detecting CNVs with next-generation sequencing technology have helped shed light on the prevalence and effects of CNVs, the complexity of crop genomes poses a major challenge and requires the development of additional tools. Here, we generated genomic and transcriptomic data for 93 rice (Oryza sativa L.) accessions and developed a comprehensive pipeline to call CNVs in this large-scale dataset. We analyzed the correlation between CNVs and gene expression levels and found that approximately 13% of the identified genes showed a significant correlation between their expression levels and copy numbers. Further analysis showed that about 36% of duplicate pairs were involved in pseudogenization events, while only 5% of them showed functional differentiation. Moreover, the offspring copy mainly contributed to the expression levels and seemed more likely to become a pseudogene, whereas the parent copy tended to maintain the function of the ancestral gene. We provide a high-accuracy CNV dataset that will contribute to functional genomics studies and molecular breeding in rice. We also show that the gene dosage effect of CNVs in rice is not exponential or linear. Our work demonstrates that the evolution of duplicated genes is asymmetric in both expression levels and gene fates, offering new insight into the evolution of duplicated genes.
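
The copy number versus expression analysis summarized above can be illustrated with a minimal sketch. The abstract does not state which correlation statistic or significance threshold was used, so everything here is an assumption made for illustration: Pearson's r, a permutation test for significance, the 0.05 cutoff, the function names (`pearson_r`, `permutation_test`, `fraction_correlated`), and the toy values. It is not the authors' pipeline.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(copy_numbers, expression, n_perm=2000, seed=0):
    """Return (r, two-sided permutation p-value) for one gene."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = pearson_r(copy_numbers, expression)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(copy_numbers, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


def fraction_correlated(genes, alpha=0.05):
    """Fraction of testable genes whose expression tracks copy number."""
    tested = significant = 0
    for copy_numbers, expression in genes.values():
        # Genes without variation in either measure cannot be tested.
        if len(set(copy_numbers)) < 2 or len(set(expression)) < 2:
            continue
        tested += 1
        _, p_value = permutation_test(copy_numbers, expression)
        if p_value < alpha:
            significant += 1
    return significant / tested if tested else 0.0


# Hypothetical toy data: one (copy numbers, expression) pair per gene,
# with one value per accession. These numbers are invented.
toy_genes = {
    "gene_dosage": ([1, 1, 2, 2, 3, 3, 4, 4],
                    [5.1, 4.8, 10.2, 9.7, 15.3, 14.6, 20.5, 19.8]),
    "gene_flat": ([1, 2, 1, 2, 1, 2, 1, 2],
                  [5, 5, 6, 6, 5, 5, 6, 6]),
    "gene_no_cnv": ([2] * 8, [7, 8, 7, 9, 8, 7, 9, 8]),
}
share = fraction_correlated(toy_genes)
```

With real data, testing thousands of genes would also call for a multiple-testing correction, which this sketch leaves out.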

Research article, Open Access. Published: 26 June 2020. BMC Biology volume 18, Article number: 73 (2020).

Fengli Zhao, Yuexing Wang, Jianshu Zheng, Yanling Wen, Minghao Qu, Shujing Kang, Shigang Wu, Xiaojuan Deng, Kai Hong, Sanfeng Li, Xing Qin, Zhichao Wu, Xiaobo Wang, Cheng Ai, Alun Li, Longjun Zeng, Jiang Hu, Dali Zeng, Lianguang Shang, Quan Wang, Qian Qian, Jue Ruan & Guosheng Xiong (ORCID: orcid.org/0000-0002-4312-1676)

Background

Natural variations are the basis of genetic diversity and genome evolution. Detecting natural variations and evaluating their genetic effects are key to understanding and interpreting the formation of biological phenotypes. Natural variations generally include single nucleotide polymorphisms (SNPs), small InDels (no more than 50 bp), and structural variations (SVs). Copy number variations (CNVs), including deletions and duplications, typically range from 1 kb to several Mb [1] and are an important source of structural variation [2,3,4]. Many methods have been developed to detect CNVs, such as fluorescence in situ hybridization (FISH), quantitative polymerase chain reaction (qPCR), and microarrays. However, these methods are not suitable for detecting CNVs in natural populations because of their low throughput or their low resolution and sensitivity. With the advance of next-generation sequencing (NGS) technologies, new approaches and algorithms have been developed to detect novel CNVs in recent years [5, 6]. These methods are mainly based on one or a combination of the following strategies: read-pair (RP), split-read (SR), read-depth (RD), and de novo assembly (AS) [7,8,9]. The complexity of crop genomes, together with the structure and distribution of CNVs, makes it challenging to detect CNVs comprehensively and accurately among different crop germplasms.
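
Of the strategies listed above, read depth (RD) is the simplest to illustrate. The sketch below only shows the general idea: regions whose depth is well above or below the genome-wide baseline are flagged as candidate duplications or deletions. It is not the pipeline developed in this work. The function name `call_cnv_segments`, the 1 kb bin size, and the 1.5 and 0.5 ratio thresholds are hypothetical choices made for illustration. Practical callers typically also correct for GC content and mappability, which is omitted here.

```python
from statistics import median


def call_cnv_segments(depths, bin_size=1000, dup_ratio=1.5, del_ratio=0.5):
    """Return (start, end, label) segments called from per-base read depth.

    bin_size, dup_ratio and del_ratio are illustrative assumptions,
    not values taken from the paper.
    """
    if not depths:
        raise ValueError("no depth values supplied")

    # Average the per-base depth inside fixed-size, non-overlapping bins.
    bin_means = []
    for start in range(0, len(depths), bin_size):
        window = depths[start:start + bin_size]
        bin_means.append(sum(window) / len(window))

    # The median bin depth stands in for the single-copy baseline.
    baseline = median(bin_means)
    if baseline <= 0:
        raise ValueError("median depth is zero; cannot normalize")

    segments = []
    for index, mean_depth in enumerate(bin_means):
        ratio = mean_depth / baseline
        if ratio >= dup_ratio:
            label = "dup"
        elif ratio <= del_ratio:
            label = "del"
        else:
            continue  # bin looks like normal copy number
        start = index * bin_size
        end = min(start + bin_size, len(depths))
        # Merge with the previous segment when adjacent and same type.
        if segments and segments[-1][2] == label and segments[-1][1] == start:
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments


# Invented per-base depths: 50x background, a 2 kb doubled region,
# and a 1 kb region with no coverage.
toy_depths = [50] * 3000 + [100] * 2000 + [50] * 3000 + [0] * 1000 + [50] * 3000
toy_segments = call_cnv_segments(toy_depths)
```

On the toy input, the doubled region is reported as one merged duplication spanning positions 3000 to 5000 and the uncovered region as a deletion spanning 8000 to 9000 (0-based, end-exclusive).
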
CNVs that occur in regulatory sequence regions alter gene expression in their flanking regions, whereas CNVs that occur in gene regions usually show a dosage effect on gene expression, thus affecting the biological phenotype. The dosage effect of CNVs has been observed more clearly in humans [10,11,12,13] and mice [14], as genome-wide analyses suggested that 85–95% of detected CNVs were associated with changes in gene expression [10, 14]. However, very few genome-wide analyses of CNVs [15,16,17,18,19,20] and only a few examples of CNVs contributing to phenotypic variation [21,22,23,24,25,26,27] have been reported in crops, and these works mainly focused on the biological function of a single CNV. Therefore, a large-scale CNV dataset with high accuracy will help to understand the dynamics of genome evolution, provide insight into the genetic basis of important agricultural traits, and contribute to future crop breeding.

Here, we report a large-scale analysis of the correlation between CNVs and gene expression levels and reveal the contribution of CNVs to the genetic diversity of rice germplasms. We generated genomic and transcriptomic data for 93 rice accessions and developed a new pipeline that comprehensively detects genome-wide CNVs with high accuracy. The correlation analysis between gene copy number and expression level found that approximately 13.1% of genes showed significant correlations. Moreover, the analysis of the expression levels and evolutionary fates of different copies revealed an asymmetric evolution of duplicated genes.

Results

Detection of copy number variations in 93 rice accessions

A total of 93 rice accessions, including representative landraces and modern cultivars (Additional file 1: Table S1, Fig. 1a, b), were selected for whole-genome resequencing at an average depth of about 50×, generating a total of 2.06 Tb of raw reads.
Using the Nipponbare RefSeq [28] (version 7.0) as the reference, the coverage of the resequencing data for these accessions ranged from 82.81% to 96.06%. Root samples from rice plants grown in hydroponic culture for 35 days were collected for RNA-seq. Each sample yielded more than 5 Gb of data (range 5.03 to 9.86 Gb), and 576 Gb of raw RNA-seq data were generated from the 93 accessions in total. The rate of uniquely mapped reads ranged from 79.64% to 90.95% (Additional file 1: Table S1).

Fig. 1. The result and verification of CNV calling for the 93 rice accessions. a The phylogenetic tree of the 93 O. sativa accessions based on SNP markers, with two O. glumaepatula accessions (W1183 and W1187, purple branches) used as the outgroup. The O. sativa Xian group and Geng group are marked in yellow-green and blue, respectively. The red branches represent two tropical O. sativa accessions from Southeast Asia. b Number of deletions (red) and duplications (blue) in each accession compared with the Nipponbare RefSeq. c, d The depth distribution around GL7 (c) and the promoter of IPA1 (d). The red and blue bars show the duplicated and normal regions, respectively. Each bin represents a length of 5 bp. XF13 and XF75 were selected as negative controls. e, f The PCR verificati (...truncated)


This is a preview of a remote PDF: https://bmcbiol.biomedcentral.com/track/pdf/10.1186/s12915-020-00798-0
Article home page: https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-020-00798-0

Fengli Zhao, Yuexing Wang, Jianshu Zheng, Yanling Wen, Minghao Qu, Shujing Kang, Shigang Wu, Xiaojuan Deng, Kai Hong, Sanfeng Li, Xing Qin, Zhichao Wu, Xiaobo Wang, Cheng Ai, Alun Li, Longjun Zeng, Jiang Hu, Dali Zeng, Lianguang Shang, Quan Wang, Qian Qian, Jue Ruan, Guosheng Xiong. A genome-wide survey of copy number variations reveals an asymmetric evolution of duplicated genes in rice, BMC Biology, 2020, pp. 1-16, Volume 18, Issue 1, DOI: 10.1186/s12915-020-00798-0