Mathématiques et Informatique Appliquées
du Génome à l'Environnement

 

Characterisation of Knowledge Graph through network tools: application to the Omnicrobe knowledge graph (DC11)

Duration
36 months
Start date
Application deadline

Offer description
In brief:
We are looking for one Doctoral Candidate (DC) with a master’s degree in a relevant discipline (Physics, Computer science, Mathematics, Statistics, Engineering or related fields) to join our project, which is hosted at multiple sites in the EU. The candidate should be interested in learning and developing Network and Machine Learning tools for the analysis and construction of Knowledge Graphs related to bacterial communities used in food processing.

FAIROmics project:
The FAIROmics initiative is an interdisciplinary research programme that gathers universities, research centres and private companies. It aims to enable the FAIRification of omics data and the interoperability of databases, and to develop knowledge graphs that support data-driven decision-making for the rational design of microbial communities that impart desirable characteristics to plant-based fermented foods, in the context of open science and its regulations. The FAIROmics training programme aims to develop doctoral candidates’ skills at the interface between artificial intelligence, life sciences, humanities, and social sciences.

Scientific context:
Plant-based dairy and meat alternatives have grown in popularity in recent years for various reasons, including sustainability and health benefits, as well as lifestyle trends and dietary restrictions. However, plant-based food products can be nutritionally unbalanced, and their flavour profiles may limit their acceptance by consumers. Microorganisms have been used in making food products for millennia. Nevertheless, the diversity of microbial communities driving plant-based fermentations, as well as their key genetic and phenotypic traits and potential synergies among community members, remain poorly characterised. Much data exists, but it is scattered across the scientific and grey literature or, at best, across separate databases. It is often not reusable, because it is difficult to find and access and because databases are not systematically interoperable.

Objectives:
The objective is to understand the network structure of Knowledge Graphs (KGs) in order to develop and test algorithms for their characterisation and optimisation. In particular, spectral approaches based on the network Laplacian operator and node-embedding techniques derived from AI (e.g. DeepWalk, node2vec, Transformer Networks or Autoencoders) will be tested, providing an interpretation of KG elements useful for manifold learning or geometric deep learning. Moreover, tools from network analysis theory, such as community structure characterisation or identification of key elements (nodes, links, pathways), will be studied and applied to the available cases.
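As a rough illustration of the spectral approach mentioned above, the sketch below builds the combinatorial Laplacian L = D - A of a small undirected graph and approximates its Fiedler vector (the eigenvector of the second-smallest eigenvalue), whose signs suggest a two-way split of the nodes. The toy graph, the function names and the use of plain power iteration instead of a library eigensolver are assumptions made for this example only; they are not the project's method. The sketch also assumes a connected graph.

```python
def laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L


def fiedler_vector(L, iters=300):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Runs power iteration on (c*I - L), where c = 2 * max degree is an upper
    bound on the largest Laplacian eigenvalue, and projects out the constant
    vector (the eigenvector of eigenvalue 0) at every step.
    """
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # remove the constant component
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


# Hypothetical toy graph: two triangles (0,1,2) and (3,4,5) joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
L = laplacian(6, edges)
x = fiedler_vector(L)
side = [v > 0 for v in x]  # the sign pattern gives a two-way partition
print(side)
```

On this toy graph the sign pattern separates the two triangles. For graphs of realistic size, a sparse eigensolver would be the more sensible choice.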

In the FAIROmics project framework, the nodes of the graph represent biological entities, e.g. bacteria, food matrices, food ingredients, metabolites, genes, the functions of these genes, etc. The edges represent relationships between these entities, e.g. bacteria growing in a food matrix. Moreover, the entities are themselves linked to reference classes defined in knowledge graphs such as ontologies (e.g. bacteria taxa in the NCBI taxonomy, food matrices in FoodEx2).
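The node, edge and reference-class structure described above can be sketched as a list of (subject, predicate, object) triples. All entity names and predicate labels below (`grows_in`, `produces`, `instance_of`) are hypothetical placeholders chosen for illustration; they are not taken from Omnicrobe, the NCBI taxonomy or FoodEx2.

```python
from collections import defaultdict

# Hypothetical triples: entity-to-entity relationships plus links from
# entities to reference classes (here via a made-up "instance_of" predicate).
triples = [
    ("bacterium_A", "grows_in", "food_matrix_X"),
    ("bacterium_B", "grows_in", "food_matrix_X"),
    ("bacterium_A", "produces", "metabolite_M"),
    ("bacterium_A", "instance_of", "taxon_T1"),
    ("food_matrix_X", "instance_of", "food_class_F1"),
]


def build_index(triples):
    """Index outgoing edges by subject for simple lookups."""
    out_edges = defaultdict(list)
    for subject, predicate, obj in triples:
        out_edges[subject].append((predicate, obj))
    return out_edges


def objects(index, subject, predicate):
    """Return all objects linked to `subject` through `predicate`."""
    return [o for p, o in index.get(subject, []) if p == predicate]


def subjects(triples, predicate, obj):
    """Return all subjects linked to `obj` through `predicate`."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


index = build_index(triples)
habitats = objects(index, "bacterium_A", "grows_in")
residents = subjects(triples, "grows_in", "food_matrix_X")
print(habitats, residents)
```

Keeping the reference-class links as ordinary triples means that the same graph tools can be applied to entities and to their ontology classes alike.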

Specific data will be analysed and produced within the FAIROmics project. As a starting case study, the Omnicrobe knowledge graph, which contains bacteria habitats and phenotypes, will be analysed to characterise element similarity at different levels (nodes, paths, modules, communities), making it possible to check the network structure and to detect possible inconsistencies and missing or hidden relationships.

Expected results:
The project is expected to yield a deeper understanding of Knowledge Graph structures and of how to query and manipulate them. This should improve the interpretability and usability of KGs, for example by 1) identifying hidden relationships through link imputation and analysis of node embedding similarity; 2) extracting relevant outliers or anomalies (regarding ontologies/nodes and/or relationships/links) that may correspond to wrong elements within KGs; 3) clustering KG elements through network community algorithms; 4) identifying "knowledge modules" through network diffusion algorithms. The developed network tools are also useful in a wide range of contexts, from biological networks (e.g. protein interaction, gene regulation, microbial community ecology) to social networks (structure and dynamics of social networks, sentiment analysis, node classification, etc.).
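To make the idea of link imputation concrete, the sketch below ranks pairs of nodes that are not yet connected by the Jaccard overlap of their neighbourhoods. This neighbourhood-overlap score is used here only as a simple stand-in for the node embedding similarity mentioned above; the toy graph and all function names are hypothetical.

```python
from itertools import combinations


def neighbours(edges):
    """Map each node of an undirected graph to the set of its neighbours."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_missing_links(edges):
    """Score every non-adjacent node pair, highest similarity first."""
    nbrs = neighbours(edges)
    scored = []
    for u, v in combinations(sorted(nbrs), 2):
        if v not in nbrs[u]:
            scored.append((jaccard(nbrs[u], nbrs[v]), u, v))
    # Sort by descending score, then by node names for a stable order.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
candidates = rank_missing_links(edges)
for score, u, v in candidates:
    print(f"{u}-{v}: {score:.3f}")
```

In a real knowledge graph the score would more plausibly come from learned embeddings, and the ranked candidates would be checked against ground truth before being treated as missing relationships.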

Location and planned secondments:
The secondment will take place at INRAE MaIAGE (DrProf. H. Chiapello) at Month 14 (12-month period). Its purpose is to apply the developed algorithms to the Omnicrobe Knowledge Graph (https://maiage.inrae.fr/fr/node/2694), for which ground truth information is available, in order to train and test the models and tools designed for the analysis of the FAIROmics KG.

Enrolment in Doctoral degree
1st-degree awarding organisation: Alma Mater Studiorum – University of Bologna, Bologna IT https://www.unibo.it/en/homepage
2nd-degree awarding organisation: University Paris-Saclay https://www.universite-paris-saclay.fr/en

Required skills/qualifications
- Master's degree in Physics, Computer science, Mathematics, Statistics, Engineering or related fields, giving access to a PhD school. Candidates must NOT already hold any kind of PhD degree. Previous research experience (which must be no longer than four years) is appreciated but not mandatory.
- Networking and good communication skills (writing and presentation skills).
- Willingness to travel abroad for the purpose of research, training and dissemination.
- Good programming skills in high-level languages such as Python, R or Matlab (not mandatory but highly recommended for the development and use of network tools).

Eligibility criteria
- Any nationality
- Doctoral Candidate (DC): The applicant must not have been awarded a doctoral degree.
- Mobility rule: The DC must not have resided or carried out their main activity (work, studies, etc.) in the country of their host organisation for more than 12 months* in the 3 years immediately prior to the date of selection in the same appointing international organisation.
* EXCLUDED: short stays such as holidays, compulsory national services such as mandatory military service, and procedures for obtaining refugee status under the Geneva Convention.
- Language: Applicants must demonstrate fluent reading, writing and speaking abilities in English (B2).

Supervisory team
The lead supervisor is D. Remondini, full professor at the Department of Physics and Astronomy at Alma Mater Studiorum – University of Bologna. He works on the application of mathematical models in Biology, such as Network Theory for the study of Complex Systems, and on the development of innovative algorithms for the analysis of high-dimensional biological, biomedical and virological data (multiple omics, NGSeq, Neuroimaging, text data) with Machine Learning and AI techniques. He currently leads a group with 4 PhD students (1 ITN PhD student), 3 PostDocs and 3 Research Assistants, and has supervised >50 Undergraduate and Master Thesis students in Physics. The co-supervisor is E. Giampieri (Assistant Professor), with expertise in scientific computing, data management, analysis and modelling, who has supervised >20 Undergraduate and Master Thesis students in Physics.

During the secondment at INRAE-Paris Saclay University, two MaIAGE teams will be involved in the PhD supervision: the StatInfOmics team (https://maiage.inrae.fr/en/statinfomics) and the Bibliome team (https://maiage.inrae.fr/en/bibliome):
- Hélène Chiapello (StatInfOmics): Microbial bioinformatics, omics data;
- Sandra Dérozier (StatInfOmics): Microbial bioinformatics, software engineering;
- Robert Bossy (Bibliome): Natural Language Processing and application to microbiology, software engineering;
- Claire Nédellec (Bibliome): Natural Language Processing and application to microbiology, knowledge representation and ontology.

Host institutions description
The project will take place at the Laboratory of Applied Physics and Systems Biophysics, Department of Physics and Astronomy (DIFA) of the Alma Mater Studiorum - University of Bologna, Italy. DIFA is a Department of the Science School and one of the most scientifically productive Physics Departments in Italy. DIFA has a large computing facility, available to the Biophysics group (14-core HPC, 2 GPU server with >1 TB RAM and 2 NVIDIA A100, mirrored storage server with >100 TB storage) and to the whole Department (OPH HPC facility, >200 cores). Prof. Daniel Remondini is the director of the lab, with specific expertise in biomedical data analysis (Machine Learning, Deep Learning), complex network theory and its applications to BioMedicine. Dr Enrico Giampieri has specific expertise in scientific computing, including networks, stochastic processes and statistics in High-Performance Computing environments. All the lab members are involved in several national and EU projects (Precision Medicine, Epidemiology, Public Health, Food Production).

The secondment will take place at INRAE MaIAGE. INRAE is Europe’s top agricultural research institute and the world’s number two center for the agricultural sciences. Its scientists are working towards solutions for society’s major challenges. RU1404 MaIAGE gathers mathematicians, computer scientists, bioinformaticians and biologists to tackle problems from biology, agronomy and ecology. Our research concerns processes at various levels, ranging from molecular, cellular or multicellular levels to organisms, populations, and entire ecosystems.

We offer
- A comprehensive, interactive and international training programme covering the broader aspects and interface between life science, data science, artificial intelligence and humanities and social sciences, as well as transferable skills
- An enthusiastic team of professionals to co-operate with
- Personal Career Development Plan (PCDP) to prepare young researchers for their future careers
- Each DC will undergo individual training at individual institutes according to the PCDP description
- An attractive compensation package in accordance with the MSCA-DN programme regulations for doctoral candidates. The exact salary will be confirmed and will be based on a living allowance of 3400€/month (correction factor to be applied per country) + mobility allowance of 600€/month. Additionally, researchers may also qualify for a family allowance* of 660€/month, depending on the family situation. Taxation and social (including pension) contribution deductions based on national and company regulations will apply. 
*family = be married/be in a relationship with equivalent status to a marriage recognised by the legislation of the country or region where it was formalised/have dependent children who are being maintained by the researcher.

Selection process

  1. Candidates apply for a position using the online application form (accessible here).
  2. The FAIROmics Project Manager performs a first screening of the written applications to check the eligibility of each candidate and forwards the eligible applications to the DC supervisors.
  3. The DC supervisors will select the best candidates based on CV, academic records, recommendation and motivation letters and adequate skill set. To better assess the best candidate, the shortlisted candidates might be asked to write an abstract of provided scientific documents relevant to the research subject.
  4. The selected applicants will be interviewed through an online meeting by the Selection Committee (two main supervisors and two representatives of a beneficiary or associated partner, with at least one person external to the DC’s project).
  5. The best candidates will be chosen by the main supervisors. The European Project Manager will communicate the successful candidates to the Consortium and Partners.

Contact
Hélène Chiapello, helene.chiapello@inrae.fr
Sandra Dérozier, sandra.derozier@inrae.fr