TANG Anfu

PhD student

Subject

Extraction of relational information from text in specific domain - adaptability and scalability

Start date

Thu 01/10/2020 - 12:00

End date

Sat 30/09/2023 - 12:00

Supervisor(s)

C. Nédellec, L. Deléger, P. Zweigenbaum

Bibliome

Research contract

DigiCosme

Description/abstract

This thesis addresses the extraction of relational information from scientific documents in the Life Sciences, i.e. the transformation of unstructured text into machine-readable structured information. Extracting the semantic relationships between entities detected in text makes the underlying structures explicit and formalizes them. Current state-of-the-art methods rely on supervised machine learning. Supervised learning, and recent deep learning methods even more so, requires many training examples that are costly to produce, all the more so in specific domains such as the Life Sciences. We hypothesize that combining the information and knowledge available in specific domains with the latest deep learning word embedding models can offset the absence or limited amount of annotated training data. To this end, the thesis will design a rich representation of texts that draws on both linguistic information obtained from syntactic parsing and domain knowledge obtained from knowledge graphs such as ontologies. Integrating ontologies into the information extraction process will also facilitate the integration of the extracted information with other data, such as experimental or analytical data.

Doctoral school (for theses)

ED STIC

Director (for theses)

A. Denise

Defense year (for theses or internships)

2023

Defense date (for theses)

Wed 06/12/2023 - 12:00

School/university (for theses and internships)

Université Paris-Saclay

Mathématiques et Informatique Appliquées du Génome à l'Environnement