Classification of business documents to aid extraction: the use of typed and weighted lexical-semantic relations

Ontology network.

I’ve been a member of the Berger-Levrault research team since 2020. I’m working on a CIFRE thesis on methods and models for the automated construction of multi-domain ontology databases, in collaboration with the Montpellier Laboratory of Computer Science, Robotics and Microelectronics (LIRMM). This work was presented at the PFIA (Plate-forme Intelligence Artificielle) 2023 in Strasbourg last July.

Legend has it that it all began with the story of BerLo, a knight who had accumulated numerous ancient books containing tales, legends and observations on mythical creatures! But the information was scattered, unstructured and very difficult to exploit… One day, his path crossed with the great sage Librarian LITEX, who explained to him the importance of organizing raw texts into structured knowledge.

Chevalier Berlo convinced the wise man to send his apprentice Camilléa to help him build up an organized, usable knowledge base on the Kingdom’s mythical creatures. The young Camilléa went to the Royal Library and spent long hours poring over books and articles on mythical creatures. Then she set about transforming her texts into organized, interrelated information. She began by identifying key entities such as creature names, characteristics, powers and associated stories. Then she determined the creatures’ regions, filtering this initial knowledge base by dividing them according to their natural habitat: forest, mountain and sea. Thanks to Camilléa, the Chevalier BerLo finds himself equipped with a cartography both rich in knowledge and easily exploitable, enabling him to advance in his quest for knowledge of the world while targeting his research.

BerLo actually stands for Berger-Levrault. As part of our day-to-day commitment to supporting local authorities and their users in the digital transformation of society, we use our database of legal and practical texts. However, this database is so extensive that it remains difficult to use. To solve this problem, I have been conducting doctoral applied research for just over 2 years.

To improve the quality of the relations in the knowledge base, we have chosen to determine the domain of the texts containing these relations from among the eight Berger-Levrault domains, namely: Civil Status and Cemeteries, Elections, Public Procurement, Urban Planning, Local Accounting and Finance, Territorial Human Resources, Justice and Health.

    This work is part of a large-scale, unbiased evaluation project designed to compare the various models available on the market!

            More ...

            Scroll to Top