TypeTaxonScript: sugarifying and enhancing data structures in biological systematics and biodiversity research

Biology Methods and Protocols, Jan 2024

Object-oriented programming (OOP) embodies a software development paradigm grounded in representing real-world entities as objects, facilitating a more efficient and structured modelling approach. In this article, we explore the synergy between OOP principles and the TypeScript (TS) programming language to create a JSON-formatted database designed for storing arrays of biological features. This fusion of technologies fosters a controlled and modular code script, streamlining the integration, manipulation, expansion, and analysis of biological data, all while enhancing syntax for improved human readability, such as through the use of dot notation. We advocate for biologists to embrace Git technology, akin to the practices of programmers and coders, for initiating versioned and collaborative projects. Leveraging the widely accessible and acclaimed IDE, Visual Studio Code, provides an additional advantage. Not only does it support running a Node.js environment, which is essential for running TS, but it also efficiently manages GitHub versioning. We provide a use case involving taxonomic data structure, focusing on angiosperm legume plants. This method is characterized by its simplicity, as the tools employed are both fully accessible and free of charge, and it is widely adopted by communities of professional programmers. Moreover, we are dedicated to facilitating practical implementation and comprehension through a comprehensive tutorial, a readily available pre-built database at GitHub, and a new package at npm.

Article PDF: https://academic.oup.com/biomethods/article-pdf/9/1/bpae017/57136701/bpae017.pdf

Biology Methods and Protocols, 2024, bpae017
https://doi.org/10.1093/biomethods/bpae017
Advance Access Publication Date: 14 March 2024
Methods Article

Lucas Sá Barreto Jordão 1,*, Marli Pires Morim 2, José Fernando A. Baumgratz 2, Marcelo Fragomeni Simon 3, André L. C. Eppinghaus 1 and Vicente A. Calfo 1

1 Centro Nacional de Conservação da Flora (CNCFlora), Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil
2 Diretoria de Pesquisa Científica (DIPEQ), Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil
3 Embrapa Recursos Genéticos e Biotecnologia, Parque Estação Biológica (PqEB), Brasília, 70770-901, Brazil
* Correspondence address. Centro Nacional de Conservação da Flora (CNCFlora), Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, 22460-030, Brazil.

Received: 28 December 2023. Revised: 19 February 2024. Editorial decision: 26 February 2024. Accepted: 12 March 2024

© The Author(s) 2024. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.

Keywords: JavaScript; TypeScript; JSON; Mimosa; Node.js; taxonomy; morphology; Leguminosae; Fabaceae; Visual Studio Code; plant

Introduction

The endeavour to describe and catalogue organisms spans generations, contributing significantly to the foundations of biological knowledge and classification. Rooted in historical scientific literature, the practice of representing organisms through textual descriptions acts as a bridge connecting past and present scientific communities [1, 2]. As the digital age dawns, traditional methods merge with contemporary technology [2, 3]. In the present day, taxonomists and systematists often resort to familiar text editors, like Microsoft (MS) Word, to meticulously craft their descriptions. While some practitioners venture into spreadsheets for structured data [4], rapid technological advancements unveil new avenues for documentation and data organization.

Amidst this evolving landscape, untapped potential arises through cutting-edge methodologies. While digital tools have significantly streamlined numerous research tasks, a notable gap persists between these contemporary solutions and their widespread acceptance within the scientific community. In this context, our exploration delves into the symbiotic relationship between object-oriented programming (OOP) and document-oriented databases (DOD). Through this lens, we foresee a paradigm shift propelling biodiversity research into an era of efficiency, collaboration, and innovation.

TypeScript (TS), an extension of JavaScript (JS), is a robust choice for intricate and organized systems [5]. Combining OOP principles with TS creates a powerful development ecosystem, facilitating the building of a JS Object Notation (JSON) database. Moreover, it allows multiple layers of data validation to be incorporated, establishing a highly reliable database.

The JSON format emerges as crucial within DOD, standing out for data structuring [6]. Unlike spreadsheets, JSON is versatile and hierarchical and accommodates varied data types, making it ideal for housing diverse biological data and annotations. This aligns with complex domains like systematics, chemistry, ecology, reproduction, genomics, and proteomics, which are often better represented through nested hierarchies.

In parallel, another approach to biological data management involves ontologies like Gene Ontology [7, 8] and Plant Ontology [9, 10]. These structured vocabularies connect biological concepts intricately. Yet both their complexity and their representation demand specialized expertise, which makes JSON's simplicity appealing to a wider community.

Background

JS and Node.js environment

JS [12] serves as the foundation for a wide range of modern programming endeavours, including the management of biological data [13–15].
Its versatility and widespread adoption have catalysed the development of tools and platforms that harness its capabilities.

Node.js, a runtime environment built on Chrome's V8 JS engine, extends the potential of JS beyond the confines of web browsers [16]. It enables the execution of JS code outside of browsers, facilitating server-side scripting. This is particularly advantageous for tasks involving data processing, handling API requests, and managing databases [15]. Moreover, Node.js offers access to a wide array of libraries and packages, expediting the development of databases while enhancing their overall functionality.

JS and Node.js are powerful tools in the field of biological data management. Their capabilities contribute to the development of efficient, dynamic, and scalable databases, facilitating advancements in biodiversity research.

OOP and TS

In the ever-evolving realm of biological data management, the fusion of OOP principles with TS, a powerful programming language extension, marks a significant leap forward. OOP, a paradigm in software development, revol (...truncated)
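
The combination described in the Introduction (OOP classes written in TS, nested features reached through dot notation, an extra validation layer, and a JSON-formatted result) can be illustrated with a minimal sketch. It is not the code of the TypeTaxonScript package: the class names `Leaf` and `Taxon`, the function `validateTaxon`, all field names, and the species "Mimosa exemplaris" with its values are hypothetical placeholders chosen only for illustration.

```typescript
// Hypothetical sketch: none of these names come from the TypeTaxonScript package.

// A nested morphological feature; optional fields stand for data not yet recorded.
class Leaf {
  pinnaePairs?: { min: number; max: number };
  indumentum?: string;
}

// A taxon record whose features are reached with dot notation.
class Taxon {
  genus: string;
  specificEpithet: string;
  leaf: Leaf = new Leaf();

  constructor(genus: string, specificEpithet: string) {
    this.genus = genus;
    this.specificEpithet = specificEpithet;
  }

  // Computed on access; getters live on the prototype, so JSON.stringify skips them.
  get fullName(): string {
    return `${this.genus} ${this.specificEpithet}`;
  }
}

// A runtime validation layer that complements compile-time type checking.
function validateTaxon(t: Taxon): string[] {
  const problems: string[] = [];
  if (!/^[A-Z][a-z]+$/.test(t.genus)) {
    problems.push("genus must start with a capital letter");
  }
  if (!/^[a-z-]+$/.test(t.specificEpithet)) {
    problems.push("specific epithet must be lowercase");
  }
  const pairs = t.leaf.pinnaePairs;
  if (pairs && pairs.min > pairs.max) {
    problems.push("pinnaePairs: min exceeds max");
  }
  return problems;
}

// Placeholder species and values, not real measurements.
const example = new Taxon("Mimosa", "exemplaris");
example.leaf.pinnaePairs = { min: 2, max: 5 };
example.leaf.indumentum = "glabrous";

// Only records that pass validation enter the array serialized as JSON.
const problems = validateTaxon(example);
const database: Taxon[] = problems.length === 0 ? [example] : [];
const json = JSON.stringify(database, null, 2);
console.log(json);
```

In this sketch the compiler rejects a misspelled property such as `example.leaf.pinaePairs` before the script runs, while `validateTaxon` catches values that are well typed but implausible; this is one way to read the "multiple layers of data validation" mentioned above.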


Article home page: https://academic.oup.com/biomethods/article/9/1/bpae017/7628625

Jordão, Lucas Sá Barreto, Morim, Marli Pires, Baumgratz, José Fernando A, Simon, Marcelo Fragomeni, Eppinghaus, André L C, Calfo, Vicente A. TypeTaxonScript: sugarifying and enhancing data structures in biological systematics and biodiversity research, Biology Methods and Protocols, 2024, Volume 9, Issue 1, DOI: 10.1093/biomethods/bpae017