Mathématiques et Informatique Appliquées
du Génome à l'Environnement

PASSERI Iacopo

Type
PhD student
Subject
Statistical analysis of methylation patterns from S. meliloti
Start date
End date
Supervisor(s)
G. Kon Kam King, G. Gautreau, H. Chiapelo
Team(s)
StatInfOmics
Doctoral school (for theses)
University of Florence ComBo
Thesis director (for theses)
Alessio Mengoni
School/university (for theses and internships)
University of Florence
Description/summary

Our collaboration will focus primarily on the statistical analysis of methylation data derived from PacBio-sequenced DNA. The overarching goal is to develop a robust machine learning and statistical model that provides mathematical insight into the biological processes reflected in the data.

More specifically, the project will involve a comprehensive analysis of methylation data obtained with PacBio sequencing technology. This technique offers a high-resolution view of DNA methylation patterns and so provides a wealth of information about epigenetic modifications. The ultimate objective is to unravel the relationships between methylation patterns (i.e., methylation of DNA motifs) and biological processes in the symbiotic nitrogen-fixing alphaproteobacterium Sinorhizobium meliloti. Strains of this species have a multipartite genome comprising a chromosome, a chromid, and a megaplasmid. Their pronounced genomic and phenotypic variation makes them good models for investigating evolutionary hypotheses about the interplay between epigenomic signatures, genome structure evolution, and phenotypic transitions. Moreover, because it can perform symbiotic nitrogen fixation when interacting with legume hosts such as Medicago plants, S. meliloti is of strong interest to the agritech field and to applications of green revolution technologies.

 

1. Data Collection and Preprocessing:

Acquisition of PacBio sequencing data from S. meliloti.

Quality control and preprocessing steps to ensure data integrity.
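
As an illustration of the quality control step above, here is a minimal Python sketch that discards modification calls with low read coverage or low call quality. The record fields (`replicon`, `position`, `strand`, `coverage`, `quality`), the thresholds, and the toy records are all assumptions made for this example; they are not the project's actual data format or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModCall:
    """One putative methylated base (hypothetical record layout)."""
    replicon: str    # e.g. "chromosome", "chromid", "megaplasmid"
    position: int    # 0-based coordinate on the replicon
    strand: str      # "+" or "-"
    coverage: int    # number of reads supporting the call
    quality: float   # call confidence score (higher is better)


def filter_calls(calls, min_coverage=25, min_quality=20.0):
    """Keep only calls that meet both thresholds (illustrative defaults)."""
    return [
        c for c in calls
        if c.coverage >= min_coverage and c.quality >= min_quality
    ]


# Toy records, invented for the example.
calls = [
    ModCall("chromosome", 120, "+", coverage=60, quality=45.0),
    ModCall("chromosome", 980, "-", coverage=12, quality=50.0),   # low coverage
    ModCall("chromid", 431, "+", coverage=40, quality=8.0),       # low quality
    ModCall("megaplasmid", 77, "-", coverage=25, quality=20.0),   # on threshold
]

kept = filter_calls(calls)
for call in kept:
    print(call.replicon, call.position, call.strand)
```

In practice the thresholds would need to be chosen from the observed coverage and score distributions rather than fixed in advance.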

 

2. Feature Selection and Extraction:

Implement quality control measures to eliminate noise and irrelevant information.

Develop methods for extracting meaningful features from the raw data.

Identify relevant features that contribute significantly to the methylation patterns (MeStudio software).
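
One simple feature of the kind described above is the fraction of occurrences of a DNA motif that are methylated. The sketch below computes it for a single strand of a toy sequence. The motif GANTC, the offset of the methylated base, the sequence, and the methylated positions are illustrative assumptions; this is not the MeStudio implementation.

```python
import re

# Only the IUPAC codes needed for this example are handled.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Build a regex for the motif; the lookahead allows overlapping matches."""
    body = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + body + "))")


def methylated_fraction(sequence, motif, methylated_positions, offset):
    """Return (number of motif sites, methylated sites, fraction).

    `offset` is the 0-based index, within the motif, of the base that can be
    methylated. Only the forward strand is scanned in this sketch.
    """
    pattern = motif_to_regex(motif)
    sites = [m.start() + offset for m in pattern.finditer(sequence)]
    if not sites:
        return 0, 0, None
    hits = sum(1 for site in sites if site in methylated_positions)
    return len(sites), hits, hits / len(sites)


# Toy data, invented for the example (0-based coordinates).
sequence = "TTGAATCAAGACTCGGGATTCAA"
methylated_positions = {3, 17, 21}

n_sites, n_methylated, fraction = methylated_fraction(
    sequence, "GANTC", methylated_positions, offset=1
)
print(n_sites, n_methylated, fraction)
```

Computing this fraction per motif and per replicon (chromosome, chromid, megaplasmid) would give a small feature table that downstream models can use.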

 

3. Model Development:

Utilize machine learning techniques to build a predictive model.

Implement statistical methods to quantify the relationships between methylation patterns and biological factors.

Validate and refine the model through iterative testing and optimization.
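
As a hedged example of the statistical side of this step, the sketch below tests whether methylation status is independent of the replicon with a Pearson chi-square test on a 3 x 2 contingency table. The counts are invented for illustration, and this test is only one possible starting point, not the model the project will necessarily use. For 2 degrees of freedom the chi-square survival function has the closed form exp(-x/2), which avoids any external dependency.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D table."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected
    return statistic, (n_rows - 1) * (n_cols - 1)


def p_value_two_dof(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 dof."""
    return math.exp(-statistic / 2)


# Invented counts of (methylated, unmethylated) motif sites per replicon.
counts = {
    "chromosome": [90, 10],
    "chromid": [80, 20],
    "megaplasmid": [70, 30],
}

statistic, dof = chi_square_independence(list(counts.values()))
assert dof == 2  # the closed-form p-value below is only valid for 2 dof
p_value = p_value_two_dof(statistic)
print(f"chi2 = {statistic:.2f}, dof = {dof}, p = {p_value:.4f}")
```

A small p-value would suggest that the methylated fraction differs between replicons. It says nothing about why, which is where the predictive modelling and biological interpretation come in.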

 

4. Mathematical Insight and Interpretation:

Derive mathematical insights from the developed model.

Interpret the findings in the context of biological processes and phenomena.

Collaborate closely with the scientific group to ensure the biological relevance of the mathematical insights.

 

5. Documentation and Reporting:

Maintain detailed documentation of the entire process, including methodologies and code.

Provide regular updates to the scientific group on progress, challenges, and potential solutions.

Generate a comprehensive final report summarizing the methodology, results, and implications of the study.