Search: authors:"Geoffrey I. Webb"

21 papers found.

Structural and dynamic properties that govern the stability of an engineered fibronectin type III domain

Consensus protein design is a rapid and reliable technique for the improvement of protein stability, which relies on the use of homologous protein sequences. To enhance the stability of a fibronectin type III (FN3) domain, consensus design was employed using an alignment of 2123 sequences. The resulting FN3 domain, FN3con, has unprecedented stability, with a melting temperature ...

Discovering significant patterns

Geoffrey I. Webb, Faculty of Information Technology, Monash University, PO Box 75, Clayton, Vic. 3800, Australia. The following error appeared in the paper by G. I. Webb ...

Layered critical values: a powerful direct-adjustment approach to discovering significant patterns

Geoffrey I. Webb, Faculty of Information Technology, Monash University, Clayton Campus, Wellington Road, Clayton, Vic, Australia. Standard pattern discovery techniques, such as ...

Subsumption resolution: an efficient and effective technique for semi-naive Bayesian learning

Fei Zheng, Geoffrey I. Webb, Pramuditha Suraweera, Liguang Zhu. Editors: Mark Craven and Johannes Fürnkranz. Semi-naive Bayesian techniques seek to improve the accuracy of naive Bayes (NB) by relaxing ...

Discretization for naive-Bayes learning: managing discretization bias and variance

Quantitative attributes are usually discretized in Naive-Bayes learning. We establish simple conditions under which discretization is equivalent to use of the true probability density function during naive-Bayes learning. The use of different discretization techniques can be expected to affect the classification bias and variance of generated naive-Bayes classifiers, effects we ...


Data Min Knowl Disc. Geoffrey I. Webb. With this issue, Data Mining and Knowledge Discovery celebrates 10 years of publication. Data Mining is arguably the most successful area of ...

Discovering Significant Patterns

Geoffrey I. Webb, Faculty of Information Technology, Monash University, PO Box 75, Clayton, Vic. 3800, Australia. Pattern discovery techniques, such as ...

Anytime classification for a pool of instances

In many real-world applications of classification learning, such as credit card transaction vetting or classification embedded in sensor nodes, multiple instances simultaneously require classification under computational resource constraints such as limited time or limited battery capacity. In such a situation, available computational resources should be allocated across the ...

Efficient large-scale protein sequence comparison and gene matching to identify orthologs and co-orthologs

Broadly, computational ortholog assignment is a three-step process: (i) identify all putative homologs between the genomes, (ii) identify gene anchors and (iii) link anchors to identify the best gene matches given their order and context. In this article, we engineer two methods to improve two important aspects of this pipeline [specifically steps (ii) and (iii)]. ...

PROSPER: An Integrated Feature-Based Tool for Predicting Protease Substrate Cleavage Sites

The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically ...

TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

Protein backbone torsion angles (Phi) and (Psi) are the two rotation angles around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angles from sequence information can assist the ...
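
As an illustration of the two angles named in this abstract, the following is a minimal sketch of how a torsion (dihedral) angle can be computed from four atom positions. It is not the TANGLE method, which predicts the angles from sequence; it only shows the geometric quantity being predicted. The function name dihedral_degrees and the tuple-based coordinates are assumptions made for this example. Under the usual convention, Phi of a residue uses the atoms C (previous residue), N, Cα, C, and Psi uses N, Cα, C, N (next residue).

```python
import math

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle, in degrees, about the p1-p2 bond.

    Hypothetical helper for illustration: the angle is measured
    between the plane (p0, p1, p2) and the plane (p1, p2, p3).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)  # the central bond the rotation is about
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the first plane
    n2 = _cross(b2, b3)  # normal of the second plane
    b2_len = math.sqrt(_dot(b2, b2))
    if b2_len == 0.0:
        raise ValueError("central bond has zero length")
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / b2_len
    return math.degrees(math.atan2(y, x))
```

With backbone coordinates at hand, Phi would be dihedral_degrees(c_prev, n, ca, c) and Psi would be dihedral_degrees(n, ca, c, n_next), where the argument names stand for the atom positions listed above.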

Learning by extrapolation from marginal to full-multivariate probability distributions: decreasingly naive Bayesian classification

Averaged n-Dependence Estimators (AnDE) is an approach to probabilistic classification learning that learns by extrapolation from marginal to full-multivariate probability distributions. It utilizes a single parameter that transforms the approach between a low-variance high-bias learner (Naive Bayes) and a high-variance low-bias learner with Bayes optimal asymptotic error. It ...

Feature-subspace aggregating: ensembles for stable and unstable learners

This paper introduces a new ensemble approach, Feature-Subspace Aggregating (Feating), which builds local models instead of global models. Feating is a generic ensemble approach that can enhance the predictive performance of both stable and unstable learners. In contrast, most existing ensemble approaches can improve the predictive performance of unstable learners only. Our ...

EGM: encapsulated gene-by-gene matching to identify gene orthologs and homologous segments in genomes

Motivation: Identification of functionally equivalent genes in different species is essential to understand the evolution of biological pathways and processes. At the same time, identification of strings of conserved orthologous genes helps identify complex genomic rearrangements across different organisms. Such an insight is particularly useful, for example, in the transfer of ...

On the Application of ROC Analysis to Predict Classification Performance Under Varying Class Distributions

We counsel caution in the application of ROC analysis for prediction of classifier performance under varying class distributions. We argue that it is not reasonable to expect ROC analysis to provide accurate prediction of model performance under varying distributions if the classes contain causally relevant subclasses whose frequencies may vary at different rates or if there are ...
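
The kind of prediction this abstract cautions about can be sketched in a few lines. The example below assumes the usual ROC-style reasoning, in which the true positive rate and false positive rate are treated as fixed while the class prior changes; the function names are hypothetical and chosen for this illustration only. The second function shows why that assumption can fail: a class-level rate is a weighted average over subclasses, so it shifts whenever subclass frequencies shift.

```python
def predicted_accuracy(tpr, fpr, positive_rate):
    """Accuracy expected at a new class prior, ASSUMING tpr and fpr
    stay unchanged when the class distribution changes."""
    if not 0.0 <= positive_rate <= 1.0:
        raise ValueError("positive_rate must lie in [0, 1]")
    return positive_rate * tpr + (1.0 - positive_rate) * (1.0 - fpr)

def class_rate(subclass_rates, subclass_weights):
    """Class-level rate (e.g. TPR) as the weighted mean of the rates
    observed on each subclass of that class."""
    total = sum(subclass_weights)
    if total <= 0:
        raise ValueError("subclass weights must sum to a positive value")
    weighted = sum(r * w for r, w in zip(subclass_rates, subclass_weights))
    return weighted / total
```

If a positive class has two subclasses on which a classifier achieves different true positive rates, class_rate returns a different overall TPR for each subclass mix, and any accuracy obtained from predicted_accuracy with the old TPR no longer applies.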

Cascleave: towards more accurate prediction of caspase substrate cleavage sites

Motivation: The caspase family of cysteine proteases plays essential roles in key biological processes such as programmed cell death, differentiation, proliferation, necrosis and inflammation. The complete repertoire of caspase substrates remains to be fully characterized. Accordingly, systematic computational screening studies of caspase substrate cleavage sites may provide insight ...

Not So Naive Bayes: Aggregating One-Dependence Estimators

Geoffrey I. Webb, Janice R. Boughton, Zhihai Wang; Tom Fawcett; School of Computer Science and Software Engineering, Monash University, Vic. 3800, Australia. Of numerous proposals to improve the ...

Prodepth: Predict Residue Depth by Support Vector Regression Approach from Protein Sequences Only

Residue depth (RD) is a solvent exposure measure that complements the information provided by conventional accessible surface area (ASA) and describes to what extent a residue is buried in the protein structure space. Previous studies have established that RD is correlated with several protein properties, such as protein stability, residue conservation and amino acid types. ...

An Experimental Evaluation of Integrating Machine Learning with Knowledge Acquisition

Geoffrey I. Webb, Jason Wells, Zijian Zheng; Pat Langley; School of Computing and Mathematics, Deakin University, Geelong, Victoria 3217, Australia. Machine learning and knowledge acquisition ...