PDAUG: a Galaxy based toolset for peptide library analysis, visualization, and machine learning modeling

Joshi and Blankenberg BMC Bioinformatics (2022) 23:197
https://doi.org/10.1186/s12859-022-04727-6

SOFTWARE, Open Access

Jayadev Joshi1 and Daniel Blankenberg1,2*

*Correspondence: 1 Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
Full list of author information is available at the end of the article

Abstract

Background: Computational methods based on initial screening and prediction of peptides for desired functions have proven to be effective alternatives to lengthy and expensive biochemical experimental methods traditionally utilized in peptide research, thus saving time and effort. However, for many researchers, the lack of expertise in utilizing programming libraries, access to computational resources, and flexible pipelines are big hurdles to adopting these advanced methods.

Results: To address the above-mentioned barriers, we have implemented the peptide design and analysis under Galaxy (PDAUG) package, a Galaxy-based, Python-powered collection of tools, workflows, and datasets for rapid in-silico peptide library analysis. In contrast to existing methods like standard programming libraries or rigid single-function web-based tools, PDAUG offers an integrated GUI-based toolset, providing flexibility to build and distribute reproducible pipelines and workflows without programming expertise. Finally, we demonstrate the usability of PDAUG in predicting anticancer properties of peptides using four different feature sets and assess the suitability of various ML algorithms.

Conclusion: PDAUG offers tools for peptide library generation, data visualization, built-in and public database peptide sequence retrieval, peptide feature calculation, and machine learning (ML) modeling.
Additionally, this toolset enables researchers to combine PDAUG with hundreds of compatible existing Galaxy tools for limitless analytic strategies.

© The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Introduction

Interest in peptide-related research has grown steadily over the last several decades [35]. A large number of naturally occurring peptides (over 7000) with potentially important roles in human physiology have been identified. Currently, more than 140 peptide therapeutics are in different stages of clinical trials [17]. Because of their integral importance in a number of signal transduction pathways, peptides are ideal candidates for functioning as drugs, especially as anticancer or antimicrobial agents [1]. Usually, peptides are naturally occurring molecules that are synthesized by cellular processes and adopt alternative conformations according to their biological functions [35]. Peptides can either act as natural ligands in the form of cofactors, coenzymes, and hormones, or directly interact with macromolecules including proteins, RNA, or DNA [15].

The research underlying the design of therapeutic peptides, such as peptide-based drugs and vaccines, demands substantial effort and resources to establish their pharmacokinetic and pharmacodynamic properties, such as serum stability, bioavailability, toxicity, etc. [7, 44]. Peptide-based vaccines have emerged as a powerful approach to counter infectious diseases and cancer [37]. Characterization of peptides that bind to specific major histocompatibility complex (MHC) molecules is therefore of great importance for peptide-based vaccines. As an alternative to expensive and lengthy biochemical experiments, bioinformatics methods for predicting MHC-binding peptides have become very popular in recent years [24, 28, 45]. Various computational approaches have been shown to offer the best cost–benefit ratio across translational research areas [50, 59, 60]. Leveraging in-silico approaches to uncover peptides with desired pharmacological action can be expected to significantly lower the cost and time required to establish a drug or a vaccine candidate [34]. In fact, computational predictions of peptides with desired functions have been providing effective alternatives to traditional methods in peptide research, thus saving time and effort [5, 22, 25, 33, 39, 52].

The concept of describing the properties of a protein as a function of features derived from its sequence is not new [29]. Over the past decade, approaches based on physicochemical properties, compositional properties, k-mer counting, etc. have been proposed [10, 51, 62]. With the rise of computational power, feature-based methods have evolved substantially, expanding into the analysis of biomolecules at the level of 3D structure [23].
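As an illustration of the compositional and k-mer features mentioned above, the following minimal Python sketch computes the amino acid composition and overlapping k-mer counts of a single peptide. It is not PDAUG's implementation: the function names `aa_composition` and `kmer_counts` and the example sequence are hypothetical, chosen only for this illustration, and only the Python standard library is used.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Return the fraction of each standard residue in the peptide.

    Hypothetical helper for illustration; not part of PDAUG.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("peptide sequence must not be empty")
    counts = Counter(seq)
    return {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}


def kmer_counts(seq, k=2):
    """Count overlapping k-mers (k=2 gives dipeptide counts)."""
    seq = seq.upper()
    if k < 1:
        raise ValueError("k must be a positive integer")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical example peptide, used only to exercise the functions.
peptide = "GLFDIVKKVV"
comp = aa_composition(peptide)
dipeptides = kmer_counts(peptide, k=2)

# Print only the residues that actually occur in the peptide.
for aa, frac in comp.items():
    if frac > 0:
        print(f"{aa}: {frac:.2f}")
print(dipeptides.most_common(3))
```

Each peptide in a library can be turned into a fixed-length numeric vector this way (20 composition values, or up to 400 dipeptide counts), which is the kind of input that feature-based ML models typically expect.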
However, the programming and mathematics expertise these methods require, together with limitations in hardware resources, are among the core challenges associated with utilizing programming-based resources [30, 49]. Web-based data analysis platforms, such as Galaxy [2, 19, 26], provide a user-friendly way for researchers to include advanced data analysis methods in their work. Galaxy is an open-source, web-based platform for accessible, reproducible, and transparent computational research. It provides a wealth of computational tools, workflows, and training materials for advanced data visualization and analysis.

In this paper, we present PDAUG, a Galaxy tool suite that includes 24 different tools for the analysis of peptide libraries. The main objective of this paper is to provide a set of user-friendly tools in categories including peptide library generation, feature analysis, data visualization and plotting, machine learning (ML) modeling, and dataset retrieval. These modular command-line tools leverage the Galaxy platform to provide an interactive graphical interface for each tool as well as a (...truncated)