Classification of business documents to aid extraction: the use of typed and weighted lexical-semantic relations

October 3, 2023

Natural Language Processing


I’ve been a member of the Berger-Levrault research team since 2020. I’m working on a CIFRE thesis on methods and models for the automated construction of multi-domain ontology databases, in collaboration with the Montpellier Laboratory of Computer Science, Robotics and Microelectronics (LIRMM). This work was presented at PFIA 2023 (Plate-forme Intelligence Artificielle) in Strasbourg in July 2023.

Legend has it that it all began with the story of BerLo, a knight who had accumulated numerous ancient books containing tales, legends and observations on mythical creatures! But the information was scattered, unstructured and very difficult to exploit… One day, his path crossed with the great sage Librarian LITEX, who explained to him the importance of organizing raw texts into structured knowledge.

The knight BerLo convinced the wise man to send his apprentice Camilléa to help him build an organized, usable knowledge base on the Kingdom’s mythical creatures. The young Camilléa went to the Royal Library and spent long hours poring over books and articles on mythical creatures. Then she set about transforming these texts into organized, interrelated information. She began by identifying key entities such as creature names, characteristics, powers and associated stories. She then determined the creatures’ regions and filtered this initial knowledge base by dividing the creatures according to their natural habitat: forest, mountain and sea. Thanks to Camilléa, the knight BerLo found himself equipped with a map that was both rich in knowledge and easy to exploit, enabling him to advance in his quest for knowledge of the world while targeting his research.

BerLo actually stands for Berger-Levrault. As part of our day-to-day commitment to supporting local authorities and their users in the digital transformation of society, we use our database of legal and practical texts. However, this database is so extensive that it remains difficult to use. To solve this problem, I have been conducting applied doctoral research for just over two years.

To improve the quality of the relations in the knowledge base, we have chosen to determine the domain of the texts containing these relations from among the eight Berger-Levrault domains, namely: Civil Status and Cemeteries, Elections, Public Procurement, Urban Planning, Local Accounting and Finance, Territorial Human Resources, Justice, and Health.
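As a rough illustration of the idea, and not the method actually used in the thesis, the sketch below assigns a text to a domain by summing the weights of typed relations that link its terms to domains. Everything in it is an assumption made for the example: the terms, the relation types ("domain", "associated"), the weights, the per-type factors and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical typed, weighted relations: (term, relation_type, domain, weight).
# Terms, relation types and weights are invented for illustration only.
RELATIONS = [
    ("ballot", "domain", "Elections", 0.9),
    ("polling station", "domain", "Elections", 0.8),
    ("tender", "domain", "Public Procurement", 0.9),
    ("contract", "associated", "Public Procurement", 0.4),
    ("contract", "associated", "Territorial Human Resources", 0.3),
    ("building permit", "domain", "Urban Planning", 0.9),
]

# Hypothetical multipliers: how much each relation type counts as evidence.
TYPE_FACTOR = {"domain": 1.0, "associated": 0.5}


def score_domains(text):
    """Sum weight * type factor for every known term found in the text."""
    lowered = text.lower()
    scores = defaultdict(float)
    for term, rel_type, domain, weight in RELATIONS:
        # Naive substring match; a real system would tokenize and lemmatize.
        if term in lowered:
            scores[domain] += weight * TYPE_FACTOR[rel_type]
    return dict(scores)


def classify(text):
    """Return the best-scoring domain, or None if no known term appears."""
    scores = score_domains(text)
    if not scores:
        return None
    return max(scores, key=scores.get)
```

In this toy setup, a strongly typed relation such as "domain" counts for more than a loose "associated" one, so an ambiguous term like "contract" only tips the decision when no stronger evidence is present.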

This work is part of a large-scale, unbiased evaluation project designed to compare the various models available on the market!
