LPS-annotate: complete annotation of compositionally biased regions in the protein knowledgebase

Database, Jan 2011

Compositional bias (i.e. a skew in the composition of a biological sequence towards a subset of residue types) occurs at a wide variety of scales, from the compositional biases of whole genomes down to short regions of individual protein and gene–DNA sequences that are compositionally biased (CB regions). Such CB regions are made up of a subset of residue types that are scattered irregularly along the length of the region. Here, we have developed the database server LPS-annotate for the analysis of such CB regions, and of protein disorder, in protein sequences. The algorithm defines compositional bias through a thorough search for lowest-probability subsequences (LPSs), i.e. the sequence regions that are least likely in terms of their composition. Users can (i) annotate CB regions in input protein or nucleotide sequences of interest, and then (ii) query a database of more than 1 500 000 pre-calculated protein CB regions to investigate further functional hypotheses and inferences about the CB regions discovered and their propensities for protein disorder. Using a worked example, we demonstrate how a user can search for CB regions of similar compositional bias and protein disorder. We show that our annotations substantially augment the CB-region annotations that already exist in the UniProt database, with more comprehensive annotation of more complex CB regions. Our analysis identifies tens of thousands of CB regions that do not comprise globular or transmembrane domains and that have no propensity for protein disorder, indicating a large cohort of protein CB regions of biophysically uncharacterized types. This server and database are a conceptually novel addition to the workbench of tools now available to molecular biologists for generating hypotheses and inferences about the proteins they are investigating. They can be accessed at http://libaio.biol.mcgill.ca/lps-annotate.html.
Database URL: http://libaio.biol.mcgill.ca/lps-annotate.html
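To make the idea of a lowest-probability subsequence concrete, here is a minimal sketch built on simplifying assumptions that are ours, not the paper's. Each residue type is modelled independently with a fixed background frequency. A window of length n containing k copies of one residue type is scored by the binomial tail probability of seeing at least k copies by chance. Every window between a minimum and a maximum length is then scanned exhaustively. The function names `binomial_tail` and `lowest_probability_subsequence`, the uniform 1/20 background, and the length bounds are hypothetical illustration choices. The actual LPS-annotate scoring (handling of multi-residue biases, background estimation, significance thresholds) may differ.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def lowest_probability_subsequence(seq, background, min_len=5, max_len=50):
    """Hypothetical exhaustive scan for the single-residue window with the
    lowest binomial tail probability (a simplification, not LPS-annotate itself).

    Returns (probability, start, end, residue); `end` is exclusive.
    Residues missing from `background` default to a frequency of 0.05.
    """
    best = (1.0, 0, 0, None)
    length = len(seq)
    for start in range(length):
        counts = {}  # residue counts for seq[start:end], grown one residue at a time
        for end in range(start + 1, min(length, start + max_len) + 1):
            residue = seq[end - 1]
            counts[residue] = counts.get(residue, 0) + 1
            n = end - start
            if n < min_len:
                continue
            # Score the window once for each residue type it contains.
            for res, k in counts.items():
                prob = binomial_tail(k, n, background.get(res, 0.05))
                if prob < best[0]:
                    best = (prob, start, end, res)
    return best


if __name__ == "__main__":
    # Uniform background over the 20 standard amino acids (an assumption).
    uniform = {aa: 1 / 20 for aa in "ACDEFGHIKLMNPQRSTVWY"}
    protein = "MKTAYIAKER" + "Q" * 12 + "LSDEFGHWV"
    prob, start, end, res = lowest_probability_subsequence(protein, uniform)
    print(f"{res}-biased region {protein[start:end]} at [{start}, {end}), P = {prob:.2e}")
```

On this toy sequence the scan reports the run of twelve glutamines at [10, 22) with P = 0.05^12, about 2.44e-16. Widening the window to include a flanking non-Q residue only raises the tail probability, which is why the pure run wins. An exhaustive scan like this grows quickly with sequence length and window size. A tool covering more than a million CB regions would need a much faster search, so the sketch favours transparency over speed.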

Djamel Harbi, Manish Kumar, Paul M. Harrison. LPS-annotate: complete annotation of compositionally biased regions in the protein knowledgebase, Database, 2011, DOI: 10.1093/database/baq031