The complex microbial community living in the human gastrointestinal tract, known as the gut microbiota, performs many important functions for its host and is now recognized as a crucial factor in maintaining health. It has therefore been suggested that it could serve as a medical tool for diagnosis, prognosis, and even prediction of the response to treatment.
However, the specific structure of gut microbiota data (notably sparse, compositional, and hierarchical) has so far been poorly taken into account. Inspired by the Poisson-Log-Normal (PLN) approach developed to model dependent count data, we introduce the PLN-Tree model, specifically designed for modeling hierarchical count data. Building on structured variational inference techniques, we propose an adapted training procedure and establish identifiability results. Experimental evaluations on synthetic datasets as well as real-world microbiome data demonstrate the benefit of accounting for the tree structure of the data to capture complex dependencies.
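For reference, the standard PLN model, in its simplest form without offsets or covariates, draws a latent Gaussian vector Z ~ N(μ, Σ) and then, conditionally on Z, independent Poisson counts Y_j ~ P(exp(Z_j)); all dependence between counts comes from Σ. The sketch below samples from that standard model and then sums leaf counts into parent nodes, illustrating what hierarchical count data looks like: each internal node's count is the total of its children's counts. It is not the PLN-Tree model itself, whose tree-structured latent layer is not detailed here. The taxonomy (`genus_A`, `genus_B`), parameter values, and function names are hypothetical choices made for illustration.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T == sigma (sigma assumed symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_pln(mu, sigma, n, rng):
    """Draw n count vectors: Z ~ N(mu, sigma), then Y_j | Z ~ Poisson(exp(Z_j))."""
    L = cholesky(sigma)
    d = len(mu)
    samples = []
    for _ in range(n):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Correlated Gaussian latent vector: z = mu + L @ eps
        z = [mu[i] + sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(d)]
        samples.append([sample_poisson(math.exp(zj), rng) for zj in z])
    return samples


def aggregate(counts, children):
    """Sum leaf counts into parent nodes (one level up a hypothetical taxonomy)."""
    return {parent: sum(counts[i] for i in idx) for parent, idx in children.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [1.0, 1.5, 0.5]
    sigma = [[0.5, 0.3, 0.0], [0.3, 0.5, 0.0], [0.0, 0.0, 0.5]]
    children = {"genus_A": [0, 1], "genus_B": [2]}  # hypothetical two-level tree
    for y in sample_pln(mu, sigma, 3, rng):
        print(y, aggregate(y, children))
```

Here the positive off-diagonal entry of Σ makes the two leaves under `genus_A` co-vary, while the parent counts are deterministic sums of their children; any model of such data has to respect this summation constraint across levels of the tree.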
Mathématiques et Informatique Appliquées du Génome à l'Environnement