The EGC conference (Extraction et Gestion des Connaissances, or Knowledge Extraction and Management) is an annual event that brings together researchers and practitioners in data science and knowledge-related fields (e.g., machine learning, the semantic web and open data). This year's 22nd edition was held in Blois, France, from January 24th to January 28th.
The event comprised two school days on the main topic, “Human in the loop of data mining and learning”, aimed at students and Ph.D. candidates who wanted to deepen their knowledge of the domain, followed by three conference days.
Ikram Boukharouba, Camille Gosset and Nicolas Ringuet, Ph.D. students at the DRIT, attended the event to represent Berger-Levrault and took part in both the school days and the conference days.
The school days consisted of introductory tutorials on several research problems, core topics and new techniques:
- “How to model a clustering task in order to obtain a result as close as possible to what the user wants?”, presented by Professor Christel Vrain from Orléans University.
- Following on from this, Professor Bruno Crémilleux from Caen Normandie University addressed the problem of exhaustive pattern search in data mining and the usefulness of constraint-based patterns for keeping the process user-centric.
- Vanessa Murdock, Applied Science Research Director for Amazon’s product “Alexa”, explained how Amazon detects “offensive contexts” when recommending products to customers.
- Patrick Marcel and Veronika Peralta (LIFAT, Tours University) spoke about “Exploratory data analysis: from insights to storytelling”.
- Sarah Cohen-Boulakia (LISN, Paris-Saclay University) explained why reproducibility matters in research through her presentation “Reproducibility of bioinformatics data analysis pipelines”.
- Sihem Amer-Yahia (LIG, Grenoble Alpes University) – “Humans in Online Labor Markets”
During the conference days, the authors of the published papers presented their work.
Camille Gosset presented her paper, “Creation and validation of vector representations for lexico-semantic relationships: application to the identification/classification of relationships from a business corpus”, by Camille Gosset, Mokhtar Boumedyen Billami, Mathieu Lafourcade, Christophe Bortolaso and Mustapha Derras. It is a comparative study of vector representations for lexico-semantic relationships in two systems: (1) ours, where relation embeddings are derived from term embeddings, and (2) RELATIVE (Camacho-Collados et al., 2019), a state-of-the-art system where relation embeddings are learned. We propose embedding vectors for key terms (which may span several words or expressions) rather than for single words only. If you want to learn more about it, follow this link.
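To make the idea of deriving a relation vector from term vectors more concrete, here is a minimal Python sketch. It is not the method from the paper: the composition rules (averaging word vectors for a multi-word key term, then concatenating the two term vectors with their difference), the function names and the toy vectors are all assumptions chosen for illustration.

```python
import numpy as np

# Toy 3-dimensional word vectors, invented for illustration only.
WORD_VECTORS = {
    "town": np.array([0.9, 0.1, 0.0]),
    "hall": np.array([0.7, 0.3, 0.1]),
    "public": np.array([0.2, 0.8, 0.1]),
    "building": np.array([0.6, 0.2, 0.3]),
}


def term_embedding(term: str) -> np.ndarray:
    """Embed a key term of one or more words.

    Assumption: the term vector is the mean of its word vectors.
    """
    words = term.lower().split()
    return np.mean([WORD_VECTORS[w] for w in words], axis=0)


def relation_embedding(source: str, target: str) -> np.ndarray:
    """Derive a relation vector from two term vectors, with no training.

    Assumption: concatenate source, target and their difference.
    """
    s = term_embedding(source)
    t = term_embedding(target)
    return np.concatenate([s, t, t - s])


# Hypothetical usage: a relation between two multi-word key terms.
rel = relation_embedding("town hall", "public building")
print(rel.shape)  # (9,)
```

A learned approach such as RELATIVE would instead fit relation vectors from corpus data, so the two kinds of representation can then be compared on the same identification or classification task.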
As is often the case, machine learning was approached along two lines: supervised methods and unsupervised methods.
One of the areas covered was textual data, in which Berger-Levrault is increasingly interested. The aim is to build systems that can process texts written in different styles (e.g., tweets, legal texts).
The other main fields explored were temporal data (how to process data that arrive over time), interaction and multimodality (how to automatically recognize emotions), and emerging AI applications (e.g., Bitcoin).
The keynotes were presented by:
- Sihem Amer-Yahia: Research Director at CNRS, Grenoble
- Vanessa Murdock: Applied Science Research Director for Amazon
- Sonia Ben Mokhtar: Research Director at CNRS, LIRIS Laboratory
- Sarah Cohen-Boulakia: Professor at Paris-Saclay University
- Frédérique Segond: Director of the Defense and Security Mission at Inria and associate researcher in the ERTIM team at the National Institute for Oriental Languages and Civilizations (INALCO)
- Maya Ramanath: Associate Professor, Department of Computer Science and Engineering, IIT Delhi
As you can see, all of the conference’s invited speakers were women, at a time when inclusion and gender equality are pressing societal topics. A round table was also held on the subject, “Are we diverse and inclusive”, exploring how we can be more diverse and inclusive.