The complete swine olfactory subgenome: expansion of the olfactory gene repertoire in the pig genome
Nguyen et al. BMC Genomics 2012, 13:584
http://www.biomedcentral.com/1471-2164/13/584
RESEARCH ARTICLE
Open Access
Dinh Truong Nguyen1†, Kyooyeol Lee1†, Hojun Choi1, Min-kyeung Choi1, Minh Thong Le1, Ning Song1,
Jin-Hoi Kim1, Han Geuk Seo1, Jae-Wook Oh2, Kyungtae Lee3, Tae-Hun Kim3 and Chankyu Park1*
Abstract
Background: Insects and other animals can recognize their surroundings by detecting thousands of chemical
odorants. Olfaction is a complicated process that begins in the olfactory epithelium with the specific binding of
volatile odorant molecules to dedicated olfactory receptors (ORs). OR proteins are encoded by the largest gene
superfamily in the mammalian genome.
Results: We report here a whole-genome analysis of the olfactory receptor genes of S. scrofa using conserved OR
gene-specific motifs and known OR protein sequences from diverse species. We identified 1,301 OR-related
sequences from the S. scrofa genome assembly, Sscrofa10.2, including 1,113 functional OR genes and 188
pseudogenes. OR genes were located in 46 different regions on 16 pig chromosomes. We classified the ORs into 17
families (three Class I and 14 Class II) and further grouped them into 349 subfamilies. We also identified
inter- and intra-chromosomal duplications of OR genes residing on 11 chromosomes. A substantial number of pig
OR genes (n = 212) showed less than 60% amino acid sequence similarity to known OR genes of other species.
Conclusion: As the genome assembly Sscrofa10.2 covers 99.9% of the pig genome, our analysis represents an
almost complete OR gene repertoire from an individual pig genome. We show that S. scrofa has one of the largest
OR repertoires, suggesting an expansion of OR genes in the swine genome. The substantial number of unique OR
genes in the pig genome may suggest the presence of swine-specific olfactory stimulation.
Keywords: Olfactory receptor, Pigs, Olfaction, OR genes
Background
Insects and other animals can recognize the world around
them by detecting thousands of chemical odorants. In
mammals, odorant molecules are detected by olfactory
receptors (ORs), which belong to the G protein-coupled receptor superfamily of proteins with seven
transmembrane domains. The OR gene family was first discovered in rodents about two decades ago [1]. Olfaction
is a complicated process; it begins in the olfactory epithelium with the specific binding of volatile odorant
molecules to dedicated ORs expressed by olfactory sensory neurons (OSNs) [2-5].
* Correspondence:
† Equal contributors
1 Department of Animal Biotechnology, Konkuk University, 263 Achasan-ro,
Gwangjin-gu, Seoul 143-701, South Korea
Full list of author information is available at the end of the article
© 2012 Nguyen et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
OR proteins are encoded by the largest gene superfamily in the mammalian genome. Using the available genome sequences, several studies have been conducted to
elucidate OR subgenomes in species such as mice [6-9],
humans [10-13], dogs and rats [14-16], and other vertebrates [14,17-19]. OR gene families can be grouped into
the following two classes: the fish-like Class I ORs consisting of four families and the tetrapod-specific Class II
ORs consisting of 14 families [18]. The number of functional OR genes ranges from fewer than 100 in some
fishes, including fugu (n = 44) and tetraodon (n = 42) [20],
to ~1,200 in rats. Many OR loci are pseudogenes,
and the pseudogene fraction
ranges from less than 20% in the opossum to more than
50% in humans and the platypus [14,17]. Interestingly, in spite
of the large number of genes that make up the OR
subgenome, most olfactory sensory neurons express a single OR gene and,
in fact, only a single allele of that gene [1,21].
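The pseudogene fraction quoted above is a simple ratio. As a worked illustration using only the pig counts reported in the Abstract (1,113 functional OR genes and 188 pseudogenes), and assuming the fraction is taken over functional genes plus pseudogenes with partial ORs excluded, the pig value comes out at roughly 14.5%:

```python
# Pig OR counts as reported in the Abstract (Sscrofa10.2).
functional = 1113
pseudogenes = 188

# Assumption: the denominator is functional genes plus pseudogenes (partial ORs excluded).
total = functional + pseudogenes
fraction = pseudogenes / total

print(total, f"{fraction:.1%}")  # prints: 1301 14.5%
```

The computed total matches the 1,301 OR-related sequences reported in the Abstract, which is a quick consistency check on the counts.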
Pigs are an attractive animal model to study olfaction
and its influence on animal behavior because of their
agricultural importance and their strong reliance on
their sense of smell in various behavioral contexts. The
characterization of the swine OR gene repertoire is necessary to better understand the underlying biology of
olfaction in pigs. In addition, comparing OR
gene repertoires and olfactory abilities across evolutionarily important animals is of broad interest. In
this study, we analyzed the pig genome assembly
Sscrofa10.2, constructed by the Swine Genome Sequencing Consortium (SGSC), to characterize OR genes in
pigs. We report here the nearly complete porcine olfactory subgenome. In addition, we classified the pig OR
genes into families and compared OR gene repertoires
of humans, dogs, mice, and pigs.
Methods
Detection of OR genes from the pig genome
The swine draft genome sequences (Sscrofa10.2) were
retrieved from the National Center for Biotechnology Information (NCBI). A translated basic local alignment
search tool (TBLASTN) search was performed to identify
regions containing OR related sequences that had at least
two of the following conserved motifs: MAYDRYVAIC
(TMIII), KAFSTCASH (TMVI), and PMLNPFIY (TMVII),
or variants with less than 50% sequence difference
from the conserved motifs. From the identified regions,
we extracted the sequence extending one kilobase (kb)
upstream and downstream of each BLAST match. This
yielded 1,644 OR candidate sequences, each 2 kb in length, which were
translated into amino acid sequences in all six reading frames. Then, we retrieved 24,809 OR
protein sequences from 222 species from NCBI and performed a protein BLAST (BLASTP) analysis against the
translated OR candidate sequences to determine the positions of the start and stop codons of the open reading
frames (ORFs) on the basis of sequence similarity to
known OR proteins. For sequences that deviated from the
sequences of reported OR proteins, the methionine and
stop codon most similar in sequence context to those of
the coding sequences of known OR proteins were selected
as the start and end of the coding regions. We then performed a second TBLASTN analysis against the 1,644 sequences to
evaluate the presence of all four conserved motifs [GN,
MAYDRYVAIC (TMIII), KAFSTCASH (TMVI), and
PMLNPFIY (TMVII)]. The candidate sequences were classified as “functional ORs” if they were at least 300 amino
acids long without any interrupting stop codons or frameshifts within the ORFs, “OR pseudogenes” if they were
at least 300 amino acids long but contained stop codons or
frameshifts within the ORFs, and “partial ORs” if they
were shorter than 300 amino acids but
matched the sequences of known OR genes.
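The initial motif screen can be illustrated with a simplified sketch. It is not the authors' pipeline: they ran TBLASTN against genomic DNA, whereas this code scans an already translated protein sequence with ungapped windows. Reading "less than 50% sequence difference" as a per-position mismatch fraction below 0.5 (at most 4 mismatches for MAYDRYVAIC and KAFSTCASH, at most 3 for PMLNPFIY) is an assumption, and all function names are hypothetical.

```python
"""Hypothetical sketch of the conserved-motif screen (ungapped, protein level)."""

# Conserved OR motifs named in the Methods, keyed by transmembrane domain.
MOTIFS = {
    "TMIII": "MAYDRYVAIC",
    "TMVI": "KAFSTCASH",
    "TMVII": "PMLNPFIY",
}


def best_mismatch_fraction(protein: str, motif: str) -> float:
    """Smallest mismatch fraction over all ungapped windows (1.0 if none fit)."""
    best = 1.0
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        best = min(best, mismatches / len(motif))
    return best


def motifs_present(protein: str, max_diff: float = 0.5) -> list[str]:
    """Names of motifs matched with a difference strictly below max_diff."""
    return [name for name, motif in MOTIFS.items()
            if best_mismatch_fraction(protein, motif) < max_diff]


def is_or_candidate(protein: str, min_motifs: int = 2) -> bool:
    """Keep a region if at least `min_motifs` of the three motifs are present."""
    return len(motifs_present(protein)) >= min_motifs


# Toy sequence: an exact TMIII motif and a TMVI variant with one substitution.
toy = "XXXX" + "MAYDRYVAIC" + "XXXX" + "KAFSTCAAH" + "XXXX"
print(motifs_present(toy), is_or_candidate(toy))  # prints: ['TMIII', 'TMVI'] True
```

A real screen would also need to tolerate insertions and deletions inside a motif, which TBLASTN's gapped alignment handles and this Hamming-style window does not.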
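The flank extraction, six-frame translation, and 300-amino-acid classification rule can likewise be sketched in plain Python. The helper names are hypothetical, and `longest_orf` is a swapped-in simplification: the authors located start and stop codons by BLASTP similarity to 24,809 known OR proteins, not by taking the longest open reading frame. Whether an ORF contains an interrupting stop codon or frameshift is passed in as a flag rather than detected.

```python
"""Hypothetical sketch: flank extraction, six-frame translation, OR classification."""

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def flank_region(genome: str, hit_start: int, hit_end: int, flank: int = 1000) -> str:
    """Slice `flank` bases either side of a 0-based, end-exclusive hit, clamped to the sequence."""
    return genome[max(0, hit_start - flank):min(len(genome), hit_end + flank)]


def translate(dna: str) -> str:
    """Translate one frame; '*' marks stops, 'X' marks ambiguous codons, trailing bases are dropped."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna: str) -> list[str]:
    """Translate frames +1, +2, +3 on the given strand, then -1, -2, -3 on its reverse complement."""
    reverse = dna.upper().translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse)
            for offset in range(3)]


def longest_orf(protein: str) -> int:
    """Length of the longest stop-free stretch that begins with methionine."""
    best = 0
    for segment in protein.split("*"):
        start = segment.find("M")
        if start != -1:
            best = max(best, len(segment) - start)
    return best


def classify(orf_length_aa: int, has_stop_or_frameshift: bool) -> str:
    """Apply the 300-amino-acid rule described in the Methods."""
    if orf_length_aa < 300:
        return "partial OR"
    return "OR pseudogene" if has_stop_or_frameshift else "functional OR"


# Synthetic example: ATG + 310 alanine codons + TAA encodes a 311-residue ORF.
dna = "ATG" + "GCT" * 310 + "TAA"
length = max(longest_orf(p) for p in six_frame_translations(dna))
print(length, classify(length, has_stop_or_frameshift=False))  # prints: 311 functional OR
```

Keeping classification separate from ORF detection mirrors the Methods, where the ORF boundaries are fixed first and the length and disruption criteria are applied afterwards.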
Sequen (...truncated)