Mathématiques et Informatique Appliquées
du Génome à l'Environnement

 

 

Monday, 19 October 2020

Seminar
Speaker's organization (or team, for internal seminars)
Icahn School of Medicine at Mount Sinai, Karr Lab
Speaker
Jonathan Karr
Title
Making biochemical data more accessible, reusable, and composable
Abstract

A more comprehensive understanding of cellular biochemistry will likely be critical to the advancement of precision medicine and synthetic biology. For example, computer-aided design tools based on biochemical models could help bioengineers design synthetic genomes for a wide range of applications. Understanding biochemistry requires multiple types of data about different cellular subsystems. Despite the development of various formats, ontologies, and repositories, obtaining, reusing, and composing data remain three of the biggest bottlenecks to integrative biochemical research. For example, most supplementary materials remain difficult to reuse, most data sets do not provide enough metadata to understand exactly what was measured, and the data that is publicly available is scattered across numerous databases.

To make it easier to find the data needed for integrative biochemical research, we have developed Datanator (https://datanator.info), an integrated database of several key types of molecular data, together with tools for finding the data relevant to a given project, that is, data about specific species and reactions in particular organisms and environmental conditions. We assembled much of the content in Datanator from ad hoc spreadsheets published as supplementary materials to articles. To make it easier to reuse supplementary tables, we have developed ObjTables (https://objtables.org), a toolkit that helps authors create high-quality spreadsheets and helps other investigators reuse them. Because modifications to DNA, RNA, and proteins, as well as macromolecular complexes, are important, Datanator captures measurements of non-canonical proteins and complexes. To concretely describe these molecules, we have developed BpForms (https://bpforms.org) and BcForms (https://bcforms.org). BpForms generalizes IUPAC/IUBMB/FASTA to encompass canonical and modified nucleic and amino acids, crosslinks, nicks, and bonds that form circular molecules. BcForms enables concrete descriptions of complexes, which can include modified polymers and inter-subunit crosslinks. Together, we anticipate that Datanator, ObjTables, BpForms, and BcForms will facilitate integrative biochemical research.

Location
Meeting room 142, building 210 (by videoconference)