Fraudulent behaviors can be detected from application traces. In this work, we used application traces from BL.Enfance, one of our software products dedicated to the billing and management of children’s activities. We specifically focused on detecting the modification of variables such as the family quotient (Q-CAF) and the billing period, which can be modified to change invoice amounts.
BL.Enfance Fraud Case & Trace Structure
Among the possible fraud cases in the BL.Enfance application, « fraud over the CAF quotient » is the easiest to handle. It consists of altering a CAF quotient over an already completed billing period (for the same payer). This fraud scenario is generally carried out through the following three use cases:
- UC1: « assigning a Q-CAF value for a specific period »
- UC2: « bill calculation for the same period »
- UC3: « modification of the assigned Q-CAF for the previously calculated period »
To detect the execution of each of these three use cases, we relied on four specific events:
- CREATION_QUOTIENT: The modification of a CAF quotient
- SUPPRESSION_QUOTIENT: The deletion of a CAF quotient
- FACTURATION_CALCUL_FACTURE_INDIVIDUELLE: The calculation of an individual bill
- FACTURATION_SUPPRESSION_FACTURE_INDIVIDUELLE: The deletion of an individual bill
Fraud Detection as a Time-Series Analysis
Before being processed by the proposed fraud detection prototype, the traces go through two preliminary phases:
Trace extraction phase: events whose “action” field has one of the following values:
The events are extracted from the MongoDB database.
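The extraction step can be sketched as below, assuming the wanted “action” values are the four events listed earlier (the exact value list used for the extraction is not reproduced here). `build_action_filter` returns a standard MongoDB `$in` query document; `filter_traces` is a hypothetical in-memory equivalent, useful for testing without a database.

```python
from typing import Any, Iterable

# Assumption: the action values of interest are the four events listed above.
EVENTS_OF_INTEREST = [
    "CREATION_QUOTIENT",
    "SUPPRESSION_QUOTIENT",
    "FACTURATION_CALCUL_FACTURE_INDIVIDUELLE",
    "FACTURATION_SUPPRESSION_FACTURE_INDIVIDUELLE",
]


def build_action_filter(actions: list[str]) -> dict[str, Any]:
    """MongoDB query document selecting traces whose "action" field is in `actions`."""
    return {"action": {"$in": list(actions)}}


def filter_traces(traces: Iterable[dict[str, Any]], actions: list[str]) -> list[dict[str, Any]]:
    """Hypothetical in-memory equivalent of the query above (keeps the input order)."""
    wanted = set(actions)
    return [t for t in traces if t.get("action") in wanted]
```

With pymongo, the filter would be passed as `db["traces"].find(build_action_filter(EVENTS_OF_INTEREST))`, where `db` and the `traces` collection name are placeholders rather than the actual BL.Enfance names.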
Anonymization phase: the anonymization concerns the following fields:
A SHA-256 hash function (SHA-2 family) is used, as it is reproducible (the same input always yields the same hash) and non-invertible.
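A minimal sketch of the anonymization phase using Python’s standard `hashlib` module. The list of anonymized fields is not given above, so the field names used in the example (`payer`, `user`) are hypothetical.

```python
import hashlib
from typing import Any


def sha256_hex(value: Any) -> str:
    """Deterministic SHA-256 digest (hex) of the value's string form."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def anonymize(trace: dict[str, Any], fields: list[str]) -> dict[str, Any]:
    """Return a copy of `trace` in which each listed field (if present and non-null) is hashed."""
    out = dict(trace)  # leave the original trace untouched
    for field in fields:
        if out.get(field) is not None:
            out[field] = sha256_hex(out[field])
    return out
```

For example, `anonymize(trace, ["payer", "user"])`. Because the hash is unsalted and deterministic, all events of the same payer still share the same identifier after anonymization, which the later steps rely on; the flip side is that low-entropy identifiers could be recovered by hashing candidate values, which a secret salt would mitigate.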
We also need to take into account the user who initiated each action, to determine whether, in a suspicious case, the alteration of the family quotient and the invoicing were performed by the same user. We also consider the time elapsed between actions: in a suspicious case, the delay between the alteration of the family quotient and the billing should be short.
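The user and delay checks can be sketched as follows. The `Event` record, its fields, and the one-hour default threshold in `is_short_delay` are hypothetical choices for illustration; the text above only states that the delay should be short.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    action: str
    user: str           # (anonymized) user who initiated the action
    timestamp: datetime


def same_user_and_delay(quotient_event: Event, billing_event: Event) -> tuple[bool, timedelta]:
    """Whether both actions come from the same user, and the absolute time between them."""
    same_user = quotient_event.user == billing_event.user
    delay = abs(billing_event.timestamp - quotient_event.timestamp)
    return same_user, delay


def is_short_delay(delay: timedelta, threshold: timedelta = timedelta(hours=1)) -> bool:
    """Hypothetical rule: a delay at or below `threshold` counts as short."""
    return delay <= threshold
```

The absolute value is used because the quotient change may come before or after the billing action.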
A Simple 5-Step Algorithm
Step 1: Reading the anonymized data and building a correspondence table: payer => ordered list of events related to this payer.
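Step 1 can be sketched as a grouping followed by a chronological sort. The trace keys `payer` and `timestamp` are assumed field names, not necessarily the ones stored in the BL.Enfance traces.

```python
from collections import defaultdict
from typing import Any


def events_by_payer(traces: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Correspondence table: payer -> that payer's events, sorted by timestamp."""
    table: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for trace in traces:
        table[trace["payer"]].append(trace)
    for events in table.values():
        # Stable sort: events with equal timestamps keep their original order.
        events.sort(key=lambda e: e["timestamp"])
    return dict(table)
```

Timestamps only need to be mutually comparable (e.g. all `datetime` objects, or all ISO 8601 strings in the same format).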
Step 2: Building the invoiced activity periods from the FACTURATION_CALCUL_FACTURE_INDIVIDUELLE and FACTURATION_SUPPRESSION_FACTURE_INDIVIDUELLE events.
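A possible sketch of Step 2, assuming each billing event carries the billed period under a hypothetical `period` key holding an inclusive `(start, end)` pair of dates. Replaying one payer’s ordered events, a bill calculation adds its period and a bill deletion removes it.

```python
from datetime import date
from typing import Any

Period = tuple[date, date]  # hypothetical representation: (start, end), both inclusive

CALC = "FACTURATION_CALCUL_FACTURE_INDIVIDUELLE"
DELETE_BILL = "FACTURATION_SUPPRESSION_FACTURE_INDIVIDUELLE"


def invoiced_periods(events: list[dict[str, Any]]) -> set[Period]:
    """Periods that are still invoiced after replaying one payer's ordered events."""
    periods: set[Period] = set()
    for e in events:
        if e["action"] == CALC:
            periods.add(e["period"])
        elif e["action"] == DELETE_BILL:
            periods.discard(e["period"])  # no error if the period was not invoiced
    return periods
```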
Step 3: Building the periods covered by Q-CAF quotients from the CREATION_QUOTIENT and SUPPRESSION_QUOTIENT events.
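Step 3 can be sketched the same way, assuming each quotient event carries hypothetical `period` and `value` keys. A creation assigns the quotient value to its period (a later creation for the same period overwrites the earlier value) and a suppression removes it.

```python
from datetime import date
from typing import Any

Period = tuple[date, date]  # hypothetical representation: (start, end), both inclusive


def quotient_periods(events: list[dict[str, Any]]) -> dict[Period, Any]:
    """Period -> Q-CAF value currently in force, after replaying one payer's ordered events."""
    periods: dict[Period, Any] = {}
    for e in events:
        if e["action"] == "CREATION_QUOTIENT":
            periods[e["period"]] = e["value"]
        elif e["action"] == "SUPPRESSION_QUOTIENT":
            periods.pop(e["period"], None)  # ignore unknown periods
    return periods
```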
Step 4: Overlaying the invoiced periods and the Q-CAF periods.
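Step 4 amounts to intersecting intervals. A sketch, assuming both kinds of periods are inclusive `(start, end)` date pairs: two periods overlap when the later of the two starts is not after the earlier of the two ends.

```python
from datetime import date
from typing import Optional

Period = tuple[date, date]  # hypothetical representation: (start, end), both inclusive


def overlap(a: Period, b: Period) -> Optional[Period]:
    """Intersection of two inclusive periods, or None if they are disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


def overlay(invoiced: list[Period], quotient: list[Period]) -> list[tuple[Period, Period, Period]]:
    """All (invoiced period, quotient period, intersection) triples that overlap."""
    result = []
    for inv in invoiced:
        for q in quotient:
            common = overlap(inv, q)
            if common is not None:
                result.append((inv, q, common))
    return result
```

The pairwise comparison is quadratic, which stays cheap when each payer has only a handful of events.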
Step 5: Detecting suspicious contexts from the order in which actions were performed in the software (for the same payer).
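One possible rule for Step 5, sketched under a hypothetical event layout (`action` and `period` keys, one payer’s events in chronological order): a quotient creation or deletion is flagged when it touches a period that is currently invoiced, i.e. a bill was calculated for it and not deleted since. This mirrors the UC1, UC2, UC3 sequence described earlier but is not necessarily the exact rule implemented in the prototype.

```python
from typing import Any

CALC = "FACTURATION_CALCUL_FACTURE_INDIVIDUELLE"
DELETE_BILL = "FACTURATION_SUPPRESSION_FACTURE_INDIVIDUELLE"
QUOTIENT_ACTIONS = {"CREATION_QUOTIENT", "SUPPRESSION_QUOTIENT"}


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two inclusive (start, end) periods share at least one day."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def suspicious_events(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Quotient changes made on an already invoiced period (one payer, time-ordered events)."""
    invoiced: set = set()  # periods invoiced and not deleted so far
    flagged = []
    for e in events:
        if e["action"] == CALC:
            invoiced.add(e["period"])
        elif e["action"] == DELETE_BILL:
            invoiced.discard(e["period"])
        elif e["action"] in QUOTIENT_ACTIONS:
            if any(overlaps(e["period"], p) for p in invoiced):
                flagged.append(e)
    return flagged
```

Deleting the bill before changing the quotient (and then recalculating) is not flagged under this rule, since the period is no longer invoiced when the quotient changes.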
Results and statistics
Below are some quick figures from the results on the BL.Enfance production database.
- Total number of traces: 35466
- Number of payers: 25204
- Number of activities per payer: 1.41 activities/payer
- Frequency of payer appearance: the number of payers having X activities, for each value of X
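The activities-per-payer figure matches the ratio of the two counts above, 35466 / 25204 ≈ 1.41. The sketch below reproduces that arithmetic and computes the appearance frequency (how many payers appear in exactly X traces) from a list of payer identifiers, one entry per trace; this input format is an assumption.

```python
from collections import Counter


def activities_per_payer(total_traces: int, n_payers: int) -> float:
    """Average number of traces per payer."""
    return total_traces / n_payers


def payer_frequency(payers: list[str]) -> dict[int, int]:
    """Map X -> number of payers appearing in exactly X traces (one list entry per trace)."""
    per_payer = Counter(payers)                     # payer -> number of traces
    return dict(sorted(Counter(per_payer.values()).items()))
```

For example, `round(activities_per_payer(35466, 25204), 2)` gives the 1.41 reported above.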
Conclusion and Future Work
- We still have to work on another type of billing offered by BL.Enfance: « grouped billing ».
- This is the type of billing where more fraud cases can be expected in practice.
- Once this second step is completed, we plan to generalize the fraud detection prototype using machine learning techniques or graph navigation tools.
- However, we have not yet been given access to the labeled traces that would allow us to continue this work.
While waiting for these traces, we investigated other “historical” fields. In the “CAF data” and “additional accounting data” forms, we identified 8 fields that are candidate fraud detection variables: