Dear Project Managers, imagine if you could automatically group thousands of maintenance tickets by topic in just one click. That would be nice, wouldn't it? It would have tremendous implications for prioritising bug fixes, improvement requests, new features, and so on, saving a significant amount of time and supporting better decisions. Well, this is what we tried to achieve with the MOTUS project, and thanks to Natural Language Processing techniques it works pretty well.
The term Automatic Natural Language Processing (ANLP) encompasses all research and development aimed at modeling and reproducing, with the help of machines, the human capacity to produce and understand linguistic statements for communication purposes. The concepts and techniques that ANLP draws on sit at the crossroads of multiple disciplinary fields: "traditional" AI, theoretical computer science, linguistics, and also statistics.
Distributional word representation models are the predominant paradigm for modelling word meaning. This paradigm rests on the assumption that "words with similar distributions are semantically close". As a result, words that frequently appear together in similar contexts, even when they come from different texts, can be grouped into clusters of semantically related terms.
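To make the idea concrete, here is a minimal sketch of the distributional hypothesis in Python: the toy sentences, the sentence-wide context window and the cosine measure are illustrative assumptions, not elements of the MOTUS pipeline.

```python
# Minimal sketch of the distributional hypothesis: words that share
# contexts end up with similar vectors. Toy corpus, not MOTUS data.
import numpy as np
from itertools import combinations

corpus = [
    "the printer does not print the report",
    "the printer fails to print the document",
    "the user cannot open the document",
]

# Build a word-word co-occurrence matrix (context window = whole sentence).
vocab = sorted({w for sent in corpus for w in sent.split()})
index = {w: i for i, w in enumerate(vocab)}
cooc = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    words = sent.split()
    for w1, w2 in combinations(words, 2):
        cooc[index[w1], index[w2]] += 1
        cooc[index[w2], index[w1]] += 1

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

# "report" and "document" share contexts (printer, print), so their
# vectors are closer to each other than to the vector of "user".
print(cosine(cooc[index["report"]], cooc[index["document"]]))
print(cosine(cooc[index["report"]], cooc[index["user"]]))
```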
This project concerns the analysis of the "technical support and assistance" case study. In this work, we were interested in extracting topics from the textual comments of Berger-Levrault's support requests. The corpus of requests analyzed is that of the citizen relationship management tool. This corpus is unformatted and poorly structured, with several speakers involved (the citizen and one or more support technicians). We conducted an experimental study based on two systems: the first applies an LDA (Latent Dirichlet Allocation), while the second combines an LDA with the k-Means algorithm. We compared the results obtained with a sample of this corpus annotated by a domain expert. Our results show that, with the LDA/k-Means combination, we obtain a good-quality classification, comparable to the one performed manually by the expert. Figure 1 describes the architecture of our approach.
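The snippet below is a minimal sketch of the two systems compared in the study, built with scikit-learn; the toy comments, the number of topics and the number of clusters are placeholder assumptions, not the parameters used on the real corpus.

```python
# Sketch of the two systems: (1) LDA alone, (2) LDA followed by k-Means.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

comments = [
    "cannot print the election card for the citizen",
    "error when generating the electoral roll",
    "the application crashes when opening a request",
    "login fails after the latest update",
]

# Bag-of-words representation of the support comments.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(comments)

# System 1: LDA alone, each comment is described by a topic distribution.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)

# System 2: k-Means applied on top of the LDA topic distributions,
# grouping comments whose topic profiles are similar.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(doc_topics)

for comment, cluster in zip(comments, clusters):
    print(cluster, comment)
```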
An overview of the words in our corpus is given below: Figure 2 presents a WordCloud analysis, while Figure 3 shows a visualization of the most frequent words.
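For readers who want to reproduce this kind of overview, here is a minimal sketch using the wordcloud and matplotlib packages; the comments list is a placeholder for the actual support-request corpus.

```python
# Sketch of a corpus overview: a word cloud (as in Figure 2) and a bar
# chart of the most frequent words (as in Figure 3). Toy comments only.
from collections import Counter
import matplotlib.pyplot as plt
from wordcloud import WordCloud

comments = [
    "cannot print the election card",
    "error when generating the electoral roll",
    "the application crashes when opening a request",
]
text = " ".join(comments)

# Word cloud over the whole corpus.
cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(cloud, interpolation="bilinear")
plt.axis("off")
plt.show()

# Bar chart of the most frequent words.
counts = Counter(text.split()).most_common(10)
words, freqs = zip(*counts)
plt.bar(words, freqs)
plt.xticks(rotation=45)
plt.show()
```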
Capturing higher-level semantic topics is essential for a better understanding of texts (comments, in our case). Formally, a topic is a group of keywords that can intuitively be considered as representing a latent semantic theme described in a text; these topics are computed from probability distributions over the words in the texts. Figure 4 shows the map returned by pyLDAvis on the corpus data.
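As an illustration, here is a minimal sketch of how such a map can be produced with pyLDAvis from a scikit-learn LDA model; the toy comments and the topic count are assumptions, and the module name pyLDAvis.lda_model applies to recent pyLDAvis releases (older ones expose the same prepare() function through pyLDAvis.sklearn).

```python
# Sketch of producing a pyLDAvis topic map like the one in Figure 4.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import pyLDAvis
import pyLDAvis.lda_model  # pyLDAvis >= 3.4; older versions: pyLDAvis.sklearn

comments = [
    "cannot print the election card for the citizen",
    "error when generating the electoral roll",
    "the application crashes when opening a request",
    "login fails after the latest update",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(comments)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each topic is a probability distribution over the vocabulary; pyLDAvis
# lays the topics out on a 2D map together with their top keywords.
panel = pyLDAvis.lda_model.prepare(lda, X, vectorizer)
pyLDAvis.save_html(panel, "lda_topics.html")
```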
Finally, the study presented here was carried out on support requests for the e.elections product, but it should be noted that our topic extraction system can also be applied to support requests related to other products, such as e.magnus or even e.sedit RH.