Following the cooperation of three institutions: the University of Montpellier, the Centre National de Recherche Scientifique, and the Berger-Levrault company, the SoftScanner project is now underway. SoftScanner is a Model-Driven Approach for the Automatic Generation of Optimal Anchoring Configurations for Software System Execution Tracing Operations.
Context and Problem
Developers write “logging instructions” in the source code to extract useful information related to the behavior of the system at runtime. A logging statement such as alert/warn usually consists of an event recorded using static text and variables related to the context of the event (e.g., warn (“Unable to access hive” + root-path)). At runtime, invocation of these logging instructions generates logs that can be used for various software development/maintenance activities such as bug fixes, anomaly detection, test analysis results, and system monitoring.
In addition, the increasing size of applications and the growing need for these logs due to their great utility are leading developers to integrate large amounts of logging instructions into their source code. For example, the OpenSSH server contains 3407 logging instructions in its code base. This trend is further encouraged by the availability of log processing infrastructures such as Splunk and ELK stack which facilitate systematic analysis of collected log files.
An important issue that has arisen in relation to this need for logging is the difficulty in identifying an exact correspondence between the coverage of tracing operations and the logging objectives. An inaccurate match occurs in the following cases :
- An oversized blanket. This can be an oversized number of tracing instructions executed or an oversized amount of data collected.
- Partial coverage. Partial coverage returns only a sub-set of the data set needed to achieve the logging objective. Partial coverage may be the result of an insufficient set of tracing instructions or tracing operations returning only a subset of the required data set.
- Incorrect coverage. This type of coverage does not return the correct data needed (adapted) to meet the set logging objective. A coverage may be poorly adapted with respect to the tracing instructions chosen (not the right instructions) or with respect to the data collected (not the right data).
A second major problem with logging is the lack of flexibility in coverage. For example, a coverage put in place to satisfy one objective is difficult to adapt to satisfy another objective (e.g. if the original objectives change or if the objectives are not the same at different periods).
These logging problems can considerably reduce the usefulness of newspapers and/or make their use complex and require considerable costs (human expertise, set-up time, operating time, etc.). For example, oversized coverage leads to a deterioration in the performance of the application being logged; missing logging instructions in critical parts of the source code can prevent developers from having sufficient knowledge of how the system is running; misleading textual description in logging statements can lead to erroneous decisions being made by system operators; and the large amounts of information in the logs would prevent developers from identifying important information.
Proposal
Our objective for this proposal is to define a general approach and framework for the control of the software logging process. This includes 5 axes :
- Comprehensive tracing configuration definition
- Static generation of optimal run trace configurations.
- Dynamic generation of optimal runtime tracing configurations.
- Model-driven generation of trace configurations.
- Interaction via the visualization for the adaptation of tracing configurations.
For this project, we will only focus on Axis 1: Comprehensive tracing configuration definition.
The objective of this axis is to automate the process of generating run-time tracing configurations according to a given objective. A tracing configuration represents a list of pairs made up of a type of tracing instruction (examples: warn, info, error, etc) and a position in the source code where this instruction is to be anchored (add). To do this, it is necessary to identify the execution tracing objectives and to define the exhaustive set of tracing operations necessary to satisfy one or more objectives. The exhaustive set here means all the pairs (tracing instruction, tracing position) that allow to meet the need and without any reduction in the number of these pairs based on heuristics/techniques for optimizing their number. This goes through:
- The proposal of a model for refining the tracing objectives. This refinement will be based on the software quality feature refinement models proposed in the literature such as the ISO/IEC 9126 model, which proposes to successively refine the features into subfeatures, properties and metrics. As an example, among the objectives to be refined, we can cite the detection of security incidents, anomaly analysis or profiling (to capture user behavior).
- The objective-tracing association. It is a question of proposing for each sub-configuration of tracing allowing to answer the need for evaluation (measurement) for each sub-objective intervening in the refinement of a global objective such as the test or the analysis of anomaly. A tracing sub-configuration is a set of data collected by a set of tracing operations anchored in determined positions in the source code.
It should be noted that the tracing objective refinement model (inspired by the ISO/IEC 9126 characteristics model) successively refines each objective into sub objectives up to a level where the objective can be satisfied by a set of data that can be extracted by a tracing sub configuration. The set of tracing subconfigurations defined by successive refinement of a given tracing objective constitutes the exhaustive configuration to be associated with that objective.
As for the other axes, they will be the subject of another contract (an a priori thesis).