Mathématiques et Informatique Appliquées
du Génome à l'Environnement


SULTAN Ibrahim

Statistical modeling of bacterial promoter sequences for regulatory motif discovery
Start date
End date
P. Nicolas (MaIAGE, INRA Jouy en Josas)

Transcription factors play a key role in mediating the adaptation of bacteria to environmental conditions. Powerful algorithms and approaches have been developed for the discovery of their binding sites, but automatic de novo identification of the main regulons of a bacterium from genome and transcriptome data remains a challenge. The approach that we propose to address this task is based on a probabilistic model of the DNA sequence that can make use of precise information on the position of transcription start sites and of condition-dependent transcription profiles. The model has two main novelties: it allows overlaps between motif occurrences, and it incorporates covariates summarising transcription profiles into the probability that a motif occurs in a given promoter region. Each covariate may correspond to the coordinate of the gene on an axis (e.g. obtained by PCA or ICA) or to its position in a tree (e.g. obtained by hierarchical clustering). All parameters are estimated in a Bayesian framework using a dedicated trans-dimensional MCMC algorithm. This makes it possible to adjust simultaneously, for many motifs and many transcription covariates, the width of the corresponding position weight matrices, the number of parameters describing motif positions relative to the transcription start site, and the set of relevant covariates.
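
To make two of the ingredients above concrete, the following minimal Python sketch scores a promoter sequence with a position weight matrix and maps transcription covariates to an occurrence probability. It is an illustration under stated assumptions, not the model of the thesis: the toy matrix, the uniform background, the function names, and in particular the logistic link between covariates and occurrence probability are all hypothetical choices, and the sketch covers neither overlapping occurrences nor the trans-dimensional MCMC estimation.

```python
import math

BASES = "ACGT"

def pwm_log_odds(pwm, background, window):
    """Log-likelihood ratio of `window` under the PWM versus the background."""
    return sum(math.log(col[b] / background[b]) for col, b in zip(pwm, window))

def scan_promoter(pwm, background, seq):
    """Score every window of the PWM's width along the promoter sequence."""
    width = len(pwm)
    return [pwm_log_odds(pwm, background, seq[i:i + width])
            for i in range(len(seq) - width + 1)]

def occurrence_probability(intercept, weights, covariates):
    """Assumed logistic link from covariates (e.g. PCA coordinates) to P(occurrence)."""
    eta = intercept + sum(w * x for w, x in zip(weights, covariates))
    return 1.0 / (1.0 + math.exp(-eta))

# Toy width-3 PWM favouring the word "TAT" (illustrative values only).
pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
background = {b: 0.25 for b in BASES}  # uniform background, an assumption

scores = scan_promoter(pwm, background, "GGTATCC")
best = max(range(len(scores)), key=scores.__getitem__)  # index of best window
```

In this toy example the highest-scoring window is the one that starts at index 2 and reads "TAT". In a full model of the kind described above, the matrix width and the choice of covariates would themselves be inferred rather than fixed in advance.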

Doctoral school (for theses)
Thesis director (for theses)
S. Schbath (MaIAGE, INRA Jouy en Josas)
Year of defense (for theses or internships)
Date of defense (for theses)
School/university (for theses and internships)
Université Paris-Saclay