The aim of this thesis is to develop and analyse Bayesian nonparametric models for exploring diversity in metagenomic data. This involves 1) furthering our understanding of the fundamental properties of existing Bayesian nonparametric processes, 2) using these processes as building blocks to develop flexible models for noisy and high-dimensional data, and 3) designing efficient and scalable inference algorithms through parallelisation, optimisation and/or careful approximations.