Project title
General Open Miner for Knowledge Extraction and Interaction
Call for proposals
PRCI
Funding agency
ANR
Status
Shortlisted
Submission year
2024
ANR challenge/axis
Artificial intelligence and data science
Team(s)
Bibliome
StatInfOmics
Coordinator
A. Ferré
MaIAGE participants
L. Deléger, S. Derozier
Partners (outside MaIAGE)
RALI (Montréal, Canada)
Start year - End year
2025-2027
Project end date
Abstract
This project aims to develop an Open Information Extraction platform capable of extracting structured knowledge from unstructured texts, in both French and English, without being limited to any specific domain of application. The platform will be designed to adapt to any extraction task, regardless of the objectives or types of texts being processed, thereby facilitating the integration and exploitation of knowledge across various sectors (e.g. microbiology).
The value of such a platform in the age of conversational assistants like ChatGPT lies in its ability to produce directly exploitable databases. While a conversational assistant can provide responses to natural language queries, it remains limited when it comes to structuring and managing large quantities of factual data. An extraction platform would generate enriched knowledge bases, allowing professionals to directly manipulate the extraction results. This ability to structure information beyond mere conversational interaction is crucial in fields such as social sciences or life sciences, where extracted data must be reused, shared, or interfaced with other analytical processes or information systems.
With this platform, non-expert users will be able to automatically launch extraction methods tailored to their specific needs. The interface, designed to be user-friendly and accessible, will allow them to focus on manipulating and interpreting the extracted data, while the system will automatically adapt and execute extraction pipelines based on recent technologies, such as language models and neural classifiers.
Current technologies, especially large language models and deep learning algorithms, have reached a level of maturity that enables the development of generic and robust solutions. This maturity is what allows the platform to be applied across any domain while maintaining relatively high performance, regardless of the extraction task involved.