The internship project aims to identify microorganism fluxes between ecosystems. This work is part of TANDEM, an INRAE flagship project that brings together 10 teams to better understand microorganism fluxes in an agri-food cheese production chain. Metagenomic samples have been collected from 10 compartments, from grass to cheese, including milk, cow bedding, and rumen, among others. Our idea is to identify the species present in the metagenomic samples and to detect whether some strains are shared between samples, in order to infer flows between the compartments. A workflow was developed for a previous project, based on mapping metagenomic reads to a dedicated catalog of reference genomes. Nucleotide polymorphisms shared across samples are then used to identify strain fluxes with various statistical techniques.
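To make the idea of shared polymorphisms concrete, here is a minimal Python sketch. It is not the existing workflow's statistical model: it assumes each sample has already been reduced to a set of (contig, position, allele) variants for one species, and it uses a simple Jaccard index as a stand-in sharing score. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def shared_snp_fraction(snps_a, snps_b):
    """Jaccard index between two sets of (contig, position, allele) variants."""
    union = snps_a | snps_b
    if not union:
        return 0.0
    return len(snps_a & snps_b) / len(union)


def pairwise_sharing(samples):
    """Score every pair of samples; keys are name pairs in alphabetical order."""
    return {
        (a, b): shared_snp_fraction(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Toy data (invented for illustration): variants called for one species.
samples = {
    "milk": {("contig1", 120, "A"), ("contig1", 305, "T"), ("contig2", 48, "G")},
    "cheese": {("contig1", 120, "A"), ("contig1", 305, "T"), ("contig2", 48, "C")},
    "grass": {("contig1", 120, "G")},
}

sharing = pairwise_sharing(samples)
for pair, score in sharing.items():
    print(pair, round(score, 2))
```

In this toy case, milk and cheese share two of four distinct variants, which would be a hint that the same strain is present in both compartments; a real analysis would also have to account for sequencing depth and coverage.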
The objective of the internship is to adapt the workflow to the specificities of the TANDEM samples, analyze the results, and improve the statistical model used to identify strain fluxes. The work will require programming in Python 3, R, and Snakemake, using Git and R notebooks for reproducible analysis, and running computations on the cluster of the Migale platform.