Incorporating prior biological knowledge into statistical and machine learning models raises fundamental questions about how phenotypes are defined and inferred from data. Rather than being fixed targets, phenotypes can be viewed as model-dependent objects shaped by data representation, constraints, and external knowledge sources. This seminar will explore this perspective through two types of biological data: protein sequences and gut microbiota profiles. We will examine how Transformer-based models for sequence analysis and knowledge-informed representations of microbiota data incorporate prior information at different stages of the modeling pipeline. These examples will ground a discussion of how prior knowledge affects interpretability, robustness, and phenotype inference.
Mathématiques et Informatique Appliquées du Génome à l'Environnement