Mathématiques et Informatique Appliquées du Génome à l'Environnement

ZHU Xingyu

Type
PhD student
Subject
Design of Information Extraction tools to characterize molecules produced or degraded by microbes and applications to plant-fermented food ecosystems
Start date
End date
Supervisor(s)
Robert Bossy
Team(s)
Bibliome
Research contract
FAIROmics
Doctoral school (for theses)
STIC
Thesis director (for theses)
Claire Nédellec
School/university (for theses and internships)
Université Paris-Saclay
Description/summary

The Ph.D. project aims to develop information extraction (IE) methods that automatically build a knowledge graph of the microbial biology involved in the transformation or preservation of plant-based foods. The knowledge graph will formalize which molecules microorganisms produce and degrade during fermentation.
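The description does not specify a graph schema. As an illustration only, the sketch below assumes the graph is stored as (subject, relation, object) triples linking a microorganism to a molecule through two hypothetical relation labels, `produces` and `degrades`; the class and method names are likewise hypothetical. The example facts (yeast degrading glucose into ethanol and carbon dioxide during alcoholic fermentation) are textbook illustrations, not output of the project.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the graph: (subject, relation, object)."""
    subject: str   # a microorganism, e.g. a taxon name
    relation: str  # hypothetical relation label: "produces" or "degrades"
    obj: str       # a molecule name


class KnowledgeGraph:
    """Minimal in-memory triple store, indexed by subject for fast lookup."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self._by_subject: dict[str, set[Triple]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        # Storing triples in sets makes repeated extractions idempotent.
        t = Triple(subject, relation, obj)
        self.triples.add(t)
        self._by_subject[subject].add(t)

    def molecules(self, microbe: str, relation: str) -> list[str]:
        """Return the molecules linked to `microbe` by `relation`, sorted."""
        # .get() avoids creating empty entries in the defaultdict.
        linked = self._by_subject.get(microbe, set())
        return sorted(t.obj for t in linked if t.relation == relation)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add("Saccharomyces cerevisiae", "degrades", "glucose")
    kg.add("Saccharomyces cerevisiae", "produces", "ethanol")
    kg.add("Saccharomyces cerevisiae", "produces", "carbon dioxide")
    print(kg.molecules("Saccharomyces cerevisiae", "produces"))
    # prints ['carbon dioxide', 'ethanol']
```

A production graph would more likely use a standard model such as RDF with ontology identifiers as nodes; plain strings keep the sketch self-contained.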

The IE methods will cover named-entity recognition, normalization of entities against semantic references, and relation extraction. They will build on recent deep learning approaches that train language models from few or no annotated examples, either by transfer learning or by exploiting existing structured information, i.e. knowledge bases and ontologies used for distant or weak supervision. These resources will be selected according to the needs of the dedicated FAIROmics use cases (e.g. NCBI Taxonomy for taxa, FoodEx2 for food, ChEBI for molecules, KEGG for pathways). Existing annotated corpora (e.g. CHEMDNER, Pathway Curation, Bacteria Biotope) will serve as a starting point for training.
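To make the idea of distant or weak supervision concrete, the sketch below projects an ontology-derived dictionary onto raw text: every greedy longest match becomes a noisy BIO label for named-entity recognition and is normalized at the same time to the concept identifier stored in the dictionary. This is a generic illustration of the technique, not the project's method. The lexicon, its placeholder identifiers (`TAXON:0001`, `CHEM:0001`, ...), and the function `weak_label` are hypothetical, and whitespace tokenization is a simplifying assumption.

```python
from typing import NamedTuple

# Hypothetical lexicon: lowercased token tuple -> (entity type, concept ID).
# A real one would be compiled from ontologies such as NCBI Taxonomy or ChEBI;
# the IDs below are placeholders, not real identifiers.
LEXICON: dict[tuple[str, ...], tuple[str, str]] = {
    ("saccharomyces", "cerevisiae"): ("Taxon", "TAXON:0001"),
    ("lactic", "acid"): ("Molecule", "CHEM:0001"),
    ("ethanol",): ("Molecule", "CHEM:0002"),
}
MAX_LEN = max(len(k) for k in LEXICON)  # longest entry, in tokens


class Mention(NamedTuple):
    start: int        # index of the first token
    end: int          # index one past the last token
    label: str        # entity type
    concept_id: str   # normalized identifier from the lexicon


def weak_label(tokens: list[str]) -> tuple[list[str], list[Mention]]:
    """Greedy longest-match dictionary tagging.

    Returns BIO tags (weak NER labels) and the matched mentions,
    each already normalized to its lexicon concept ID.
    """
    tags = ["O"] * len(tokens)
    mentions: list[Mention] = []
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        # Try the longest possible span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            entry = LEXICON.get(tuple(lowered[i:i + n]))
            if entry is not None:
                label, concept_id = entry
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                mentions.append(Mention(i, i + n, label, concept_id))
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return tags, mentions


if __name__ == "__main__":
    tokens = "Saccharomyces cerevisiae converts sugars into ethanol".split()
    tags, mentions = weak_label(tokens)
    print(tags)
    # prints ['B-Taxon', 'I-Taxon', 'O', 'O', 'O', 'B-Molecule']
    print(mentions)
```

Labels produced this way are incomplete and noisy (missed synonyms, ambiguous names), which is why weak supervision typically uses them to train or fine-tune a language model rather than as final output.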

The project will rely on existing tools and resources on microbial biology developed by MaIAGE partners (e.g. the Omnicrobe application, the Ontobiotope ontology, the extraction workflow).