ZHU Xingyu

Type

Ph.D. student

Topic

Design of Information Extraction tools to characterize molecules produced or degraded by microbes and applications to plant-fermented food ecosystems

Start date

Sun, 1 December 2024 - 12:00

End date

Fri, 31 December 2027 - 12:00

Supervisor(s)

Robert Bossy

Host unit

The Ph.D. student is hosted by and attached to the MaIAGE unit.

Team(s)

Bibliome

Research contract

FAIROmics

Doctoral school (for theses)

STIC

Director (for theses)

Claire Nédellec

School/university (for theses and internships)

Université Paris-Saclay

Description/summary

The Ph.D. project aims to develop information extraction (IE) methods that automatically build a knowledge graph describing the biology of microbes involved in plant-based food transformation or preservation. The knowledge graph will formalize the molecules produced and degraded by microorganisms during fermentation.

The IE methods will involve named-entity recognition, entity normalization with respect to semantic references, and relationship extraction. They will be based on recent deep learning approaches that train language models with few or no training examples, either through transfer learning or through distant or weak supervision that exploits existing structured information, i.e. knowledge bases and ontologies. These resources will be selected according to the needs of the dedicated FAIROmics use cases (e.g. NCBI Taxonomy for taxa, FoodEx2 for food, ChEBI for molecules, KEGG for pathways). Existing annotated corpora will serve as a starting point for training (e.g. CHEMDNER, Pathway Curation, Bacteria Biotope).
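As a purely illustrative sketch of how the three stages fit together (recognition, normalization, relationship extraction), the toy example below uses a hand-written lexicon and trigger words instead of the deep learning models the project targets. The lexicon, the trigger list, the function names and the identifiers (`TAXON:0001`, `MOL:0001`, ...) are all hypothetical placeholders, not real NCBI Taxonomy or ChEBI entries, and the lookup is a stand-in technique rather than the project's method.

```python
import re
from typing import NamedTuple

# Hypothetical mini-lexicon: surface form -> (entity type, placeholder identifier).
# These identifiers are invented for illustration; they are NOT real reference IDs.
LEXICON = {
    "lactobacillus plantarum": ("Taxon", "TAXON:0001"),
    "l. plantarum": ("Taxon", "TAXON:0001"),
    "lactic acid": ("Molecule", "MOL:0001"),
    "glucose": ("Molecule", "MOL:0002"),
}

# Hypothetical trigger words mapped to relation labels.
TRIGGERS = {"produces": "produces", "degrades": "degrades"}
TRIGGER_RE = re.compile(r"\b(" + "|".join(TRIGGERS) + r")\b", re.IGNORECASE)


class Mention(NamedTuple):
    start: int
    end: int
    text: str
    etype: str
    ref_id: str


def recognize_and_normalize(sentence):
    """Stages 1 and 2: find lexicon entries and attach their identifiers."""
    mentions = []
    for surface, (etype, ref_id) in LEXICON.items():
        for m in re.finditer(re.escape(surface), sentence, flags=re.IGNORECASE):
            mentions.append(Mention(m.start(), m.end(), m.group(), etype, ref_id))
    return sorted(mentions)  # ordered by position in the sentence


def extract_relations(sentence):
    """Stage 3: link a taxon to each later molecule via the nearest preceding trigger."""
    mentions = recognize_and_normalize(sentence)
    triples = []
    for taxon in (m for m in mentions if m.etype == "Taxon"):
        for mol in (m for m in mentions if m.etype == "Molecule"):
            if mol.start < taxon.end:
                continue  # only consider molecules mentioned after the taxon
            between = sentence[taxon.end:mol.start]
            hits = [(t.start(), TRIGGERS[t.group().lower()])
                    for t in TRIGGER_RE.finditer(between)]
            if hits:
                # Keep the trigger closest to the molecule.
                triples.append((taxon.ref_id, max(hits)[1], mol.ref_id))
    return triples
```

Called on the sentence "Lactobacillus plantarum produces lactic acid and degrades glucose.", `extract_relations` returns two (subject, relation, object) triples, one per molecule. Triples of this shape are what a knowledge graph stores as edges. A learned model would replace the lexicon and the trigger rule while keeping the same inputs and outputs.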

The project will rely on existing tools and resources on microbe biology developed by MaIAGE partners (e.g. Omnicrobe application, Ontobiotope ontology, extraction workflow).

Mathématiques et Informatique Appliquées du Génome à l'Environnement
