Mathématiques et Informatique Appliquées
du Génome à l'Environnement

Monday, 1 June 2026

Title
Mining biodiversity literature for arthropod organismal traits with machine learning
Speaker
Joseph Cornelius
Speaker's affiliation (or team, for internal seminars)
University of Applied Sciences and Arts of Southern Switzerland; invited by Robert B
Venue
Meeting room 142, building 210
Date
Abstract

The fields of taxonomy and biodiversity research have seen exponential growth in published literature, which holds vast information on the diverse biological traits of organisms and their ecologies. However, accessing and extracting relevant data from this extensive resource remains challenging. Advances in text and data mining (TDM) and natural language processing (NLP) techniques offer new opportunities for liberating such information, and testing these approaches to annotate articles in machine-actionable formats is necessary to enable the exploitation of existing knowledge in biology, ecology and evolution research. Here, we explore the potential of these methods to annotate and extract organismal trait data for the most diverse animal group on Earth, the arthropods. Using manually curated trait dictionaries with trained NLP models, we evaluate performance in entity recognition, normalisation and relationship extraction, highlighting several important technical challenges. The results are made available through the ArTraDB Arthropod Trait Database. These methodological explorations provide a framework that could be extended beyond the arthropods, where TDM and NLP approaches will greatly facilitate data synthesis studies, the identification of knowledge gaps and biases, and the data-informed investigation of ecological and evolutionary trends and patterns.