Publications

You can also find my articles on my Google Scholar profile and on HAL.

CasinoLimit: An Offensive Dataset Labeled with MITRE ATT&CK Techniques

Published in 28th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2025), 2025

Cybersecurity exercises are a common way to train and evaluate the skills of cybersecurity professionals. These exercises also provide a unique opportunity to generate datasets with realistic attack traces on non-sensitive systems. Nevertheless, the collected logs are unlabeled, and deciding which logs are related to pentesters is a difficult problem. In this paper, we present a novel methodology to label efficiently both system and network logs using MITRE ATT&CK techniques. To demonstrate the effectiveness of our approach, we introduce CL, a dataset generated from a pentest exercise that has been played by 114 participants where we collected 540 Go of attack data. We apply our methodology to accurately label these logs with a semiautomatic approach: labels are inferred from the shell sessions and propagated to the network sessions, and eventually corrected by a junior analyst. An expert analyst has manually reviewed all the labels that have been computed to ensure the quality of the labeling process. The results of the pentest exercise are deeply discussed. We show the variability of players’ behaviors and that players can be distinguished by their command line habits. In addition, the high level of granularity of labels coupled with the number of participants enables multiple other applications. With this paper, we release the full dataset and the associated labeling tool, MANATEE, which can be used to browse the logs and labels. To support the generalization of our approach, we made it possible to load other datasets with this tool.

Recommended citation: Sebastian Killian, et al.. CasinoLimit: An Offensive Dataset Labeled with MITRE ATT&CK Techniques. The 28th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2025)

ROSCA: Robust and Scalable Security Alert Correlation and Prioritisation using the MITRE ATT&CK Framework

Published in First International Workshop on Analytics, Telemetry, and Cybersecurity for HPCC (High Performance Computing and Communications) (WATCH2025), 2025

In large organisations and complex infrastructures, the overwhelming volume of security alerts often results in analyst fatigue, delayed responses, and missed attacks. Moreover, Security Operations Centers (SOCs) typically rely on commercial solutions that operate as black boxes, offering limited transparency into their alert classification mechanisms and lacking the flexibility for in-house adaptation or re-implementation. To address these limitations and improve situational awareness, this paper proposes ROSCA, an efficient alert prioritisation method grounded in the MITRE ATT&CK kill chain model. The proposed approach automatically aggregates and correlates alerts based on their shared attributes, enabling the construction of contextualised cases. Each case is then assigned a score reflecting its threat relevance and urgency, before being presented to analysts within a prioritised queue. Our method emphasises the detection of multi-stage attack patterns and supports rapid processing through a robust, noise-tolerant scoring mechanism designed for interpretability and operational integration. We validate the effectiveness of ROSCA using real-world alert data and compare it against the MATE framework, demonstrating superior prioritisation accuracy and more reliable identification of critical alerts.

Recommended citation: Garcia, R., Lahmadi, A., Gimenez, P. F. & Sala, C. ROSCA: Robust and Scalable Security Alert Correlation and Prioritisation using the MITRE ATT&CK Framework. First International Workshop on Analytics, Telemetry, and Cybersecurity for HPCC (High Performance Computing and Communications) (WATCH2025)

Superviz25-SQL: High-Quality Dataset to Empower Unsupervised SQL Injection Detection Systems

Published in ESORICS 2025 International Workshops, 2025

The digitalization of public and private services led to more sophisticated and serious cybersecurity threats. Among them, SQL injection attacks leverage user inputs to remotely execute malicious actions on a database, such as data exfiltration and deletion, or privilege escalation. They are regularly classified as one of the most prominent threats to web services. Intrusion detection systems are widely used to detect such injection attacks and react to them, but it is difficult to assess their actual effectiveness and compare them because of a lack of high-quality datasets. Current SQL injection datasets lack structural diversity, are poorly documented, and generated samples are not representative of real-world infrastructures. This article presents a new dataset, called S25-SQL, whose design is structured around four quality dimensions: realism, diversity, benchmarking capabilities and the presence of good documentation. Nine novelty-detection baselines leveraging standard feature extraction techniques and machine learning algorithms are provided, serving as reference points for future SQL injection detection techniques.

Recommended citation: Quetel, G., Alata, A., Gimenez, P. F., Pautet, L. & Robert, T., (2025). SuperviZ25-SQL: High-Quality Dataset to Empower Unsupervised SQL Injection Detection Systems. In the ESORICS 2025 International Workshops.

Synthetic Network Traffic Generation for Intrusion Detection Systems: a Systematic Literature Review

Published in ESORICS 2025 International Workshops, 2025

Network data can be difficult to collect due to privacy and confidentiality reasons. For these reasons, network datasets are typically created with controlled environments called testbeds. However, these datasets are regularly criticized for their limited size, class imbalance, obsolescence, and lack of actual user activity. Following the rapid development of generative artificial intelligence, new methods have been applied to synthetic network traffic generation without emulation or simulation. This systematic literature review assesses the current state of synthetic network traffic generation: why these data are generated, what AI models are used, what kind of data is generated, and how these generated methods are evaluated. We derive from this review a set of open problems and recommendations for researchers and industry practitioners towards fast and high-quality network data.

Recommended citation: Gimenez, P. F., (2025). Synthetic Network Traffic Generation for Intrusion Detection Systems: a Systematic Literature Review. In the ESORICS 2025 International Workshops.

Certifiably robust malware detectors by design

Published in 40th International Conference on ICT Systems Security and Privacy Protection (IFIPSEC25), 2025

Malware analysis involves analyzing suspicious software to detect malicious payloads. Static malware analysis, which does not require software execution, relies increasingly on machine learning techniques to achieve scalability. Although such techniques obtain very high detection accuracy, they can be easily evaded with adversarial examples where a few modifications of the sample can dupe the detector without modifying the behaviour of the software. Unlike other domains, such as computer vision, creating an adversarial example for malware without altering its functionality requires specific transformations. We propose a new model architecture that can lead to certifiably robust malware detection by design. In addition, we show that every robust detector can be decomposed into a specific structure, which can be applied to learn empirically robust malware detectors, even on fragile features. Our framework ERDALT is based on this structure. We compare and validate these approaches with machine-learning-based malware detection methods, allowing for robust detection with limited reduction in detection performance.

Recommended citation: Gimenez, P. F., Sivaprasad, S. & Fritz, M., (2025 May). Certifiably robust malware detectors by design. In the 40th International Conference on ICT Systems Security and Privacy Protection (IFIPSEC25)

Towards programming languages free of injection-based vulnerabilities by design

Published in The 11th Workshop on Language-theoretic Security and Applications (LangSec25), 2025

Many systems are controlled via commands built upon user inputs. For systems that deal with structured commands, such as SQL queries, XML documents, or network messages, such commands are generally constructed in a “fill-in-the-blank” fashion: the user input is concatenated with a fixed part written by the developer (the template). However, the user input can be crafted to modify the command’s semantics intended by the developer and lead to the system’s malicious usages. Such an attack, called an injection-based attack, is considered one of the most severe threat to web applications. Solutions to prevent such vulnerabilities exist but are generally ad hoc and rely on the developer’s expertise and diligence. Our approach addresses these vulnerabilities from the formal language theory’s point of view. We formally define two new security properties. The first one, “intent-equivalence”, guarantees that a developer’s template cannot lead to malicious injections. The second one, “intent-security”, guarantees that every possible template is intent-equivalent, and therefore that the programming language itself is secure. From these definitions, we show that new design patterns can help create programming languages that are secure by design.

Recommended citation: Alata, E. & Gimenez, P. F., (2025 May). Towards programming languages free of injection-based vulnerabilities by design. In the 11th Workshop on Language-theoretic Security and Applications (LangSec25).

TADAM: Learning Timed Automata From Noisy Observations

Published in SIAM International Conference on Data Mining (SDM25), 2025

Timed Automata (TA) are formal models capable of representing regular languages with timing constraints, making them well-suited for modeling systems where behavior is driven by events occurring over time. Most existing work on TA learning relies on active learning, where access to a teacher is assumed to answer membership queries and provide counterexamples. While this framework offers strong theoretical guarantees, it is impractical for many real-world applications where such a teacher is unavailable. In contrast, passive learning approaches aim to infer TA solely from sequences accepted by the target automaton. However, current methods struggle to handle noise in the data, such as symbol omissions, insertions, or permutations, which often result in excessively large and inaccurate automata. In this paper, we introduce TADAM, a novel approach that leverages the Minimum Description Length (MDL) principle to balance model complexity and data fit, allowing it to distinguish between meaningful patterns and noise. We show that TADAM is significantly more robust to noisy data than existing techniques, less prone to overfitting, and produces concise models that can be manually audited. We further demonstrate its practical utility through experiments on real-world tasks, such as network flow classification and anomaly detection.

Recommended citation: Cornanguer, L. & Gimenez, P. F., (2025 May). TADAM: Learning Timed Automata From Noisy Observations. In the SIAM International Conference on Data Mining (SDM25).

Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation

Published in arXiv, 2025

The literature and multiple experts point to many potential risks from large language models (LLMs), but there are still very few direct measurements of the actual harms posed. AI risk assessment has so far focused on measuring the models’ capabilities, but the capabilities of models are only indicators of risk, not measures of risk. Better modeling and quantification of AI risk scenarios can help bridge this disconnect and link the capabilities of LLMs to tangible real-world harm. This paper makes an early contribution to this field by demonstrating how existing AI benchmarks can be used to facilitate the creation of risk estimates. We describe the results of a pilot study in which experts use information from Cybench, an AI benchmark, to generate probability estimates. We show that the methodology seems promising for this purpose, while noting improvements that can be made to further strengthen its application in quantitative AI risk assessment.

Recommended citation: Malcolm Murray, Henry Papadatos, Otter Quarks, Pierre-François Gimenez, Simeon Campos. Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation. https://doi.org/10.48550/arXiv.2503.04299

FlowChronicle: Synthetic Network Flow Generation through Pattern Set Mining

Published in 20th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2024

Network traffic datasets are regularly criticized, notably for the lack of realism and diversity in their attack or benign traffic. Generating synthetic network traffic using generative machine learning techniques is a recent area of research that could complement experimental test beds and help assess the efficiency of network security tools such as network intrusion detection systems. Most methods generating synthetic network flows disregard the temporal dependencies between them, leading to unrealistic traffic. To address this issue, we introduce FlowChronicle, a novel synthetic network flow generation tool that relies on pattern mining and statistical models to preserve temporal dependencies. We empirically compare our method against state-of-the-art techniques on several criteria, namely realism, diversity, compliance, and novelty. This evaluation demonstrates the capability of FlowChronicle to achieve high-quality generation while significantly outperforming the other methods in preserving temporal dependencies between flows. Besides, in contrast to deep learning methods, the patterns identified by FlowChronicle are explainable, and experts can verify their soundness. Our work substantially advances synthetic network traffic generation, offering a method that enhances both the utility and trustworthiness of the generated network flows.

Recommended citation: Cüppers, J., Schoen, A., Blanc, G. & Gimenez, P. F., (2024, December). FlowChronicle: Synthetic Network Flow Generation through Pattern Set Mining Generation. In the 20th International Conference on emerging Networking EXperiments and Technologies (CoNEXT). https://dl.acm.org/doi/10.1145/3696407

Learning Conditional Preference Networks: an Approach Based on the Minimum Description Length Principle

Published in IJCAI International Joint Conference on Artificial Intelligence, 2024

CP-nets are a very expressive graphical model for representing preferences over combinatorial spaces. They are particularly well suited for settings where an important task is to compute the optimal completion of some partially specified alternative; this is, for instance, the case of interactive configurators, where preferences can be used at every step of the interaction to guide the decision maker towards a satisfactory configuration. Learning CP-nets is challenging when the input data has the form of pairwise comparisons between alternatives. Furthermore, this type of preference data is not commonly stored: it can be elicited but this puts an additional burden on the decision maker. In this article, we propose a new method for learning CP-nets from sales history, a kind of data readily available in many e-commerce applications. The approach is based on the minimum description length (MDL) principle. We show some theoretical properties of this learning task, namely its sample complexity and its NP-completeness, and we experiment with this learning algorithm in a recommendation setting with real sales history from a car maker.

Recommended citation: Gimenez, P. F., & Mengin, J. (2024). Learning Conditional Preference Networks: an Approach Based on the Minimum Description Length Principle. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).

A Tale of Two Methods: Unveiling the limitations of GAN and the Rise of Bayesian Networks for Synthetic Network Traffic Generation

Published in 9th International Workshop on Traffic Measurements for Cybersecurity (WTMC 2024), 2024

The evaluation of network intrusion detection systems requires a sufficient amount of mixed network traffic, i.e., composed of both malicious and legitimate flows. In particular, obtaining realistic legitimate traffic is hard. Synthetic network traffic is one of the tools to respond to insufficient or incomplete real-world datasets. In this paper, we only focus on synthetically generating high-quality legitimate traffic and we do not delve into malicious traffic generation. For this specific task, recent contributions make use of advanced machine learning-driven approaches, notably through Generative Adversarial Networks (GANs). However, evaluations of GAN-generated data often disregards pivotal attributes, such as protocol adherence. Our study addresses the gap by proposing a comprehensive set of metrics that assess the quality of synthetic legitimate network traffic. To illustrate the value of these metrics, we empirically compare advanced network-oriented GANs with a simple and yet effective probabilistic generative model, Bayesian Networks (BN). According to our proposed evaluation metrics, BN-based network traffic generation outperforms the state-of-the-art GAN-based opponents. In our study, BN yields substantially more realistic and useful synthetic benign traffic and minimizes the computational costs simultaneously.

Recommended citation: Schoen, A., Blanc, G., Gimenez, P. F., Han, Y., Majorczyk, F., & Mé, L. (2024). A Tale of Two Methods: Unveiling the limitations of GAN and the Rise of Bayesian Networks for Synthetic Network Traffic Generation. In Proceedings of the 9th International Workshop on Traffic Measurements for Cybersecurity (WTMC 2024).

A Parser-Based Data Collector for Intrusion Detection

Published in RESSI 2024-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information, 2024

Intrusion detection systems often struggle to identify attacks directed at applications. A contributing factor is the various syntactical forms these attacks can take. This paper introduces a methodology to design and adapt applicative data collectors (DCs) to software projects by integrating them into the application’s parsers. This data collector aims to enhance applications’ security by providing semantic information to intrusion detection mechanisms.

Recommended citation: Grégor Quétel, Eric Alata, Pierre-François Gimenez, Laurent Pautet, Thomas Robert. A Parser-Based Data Collector for Intrusion Detection. Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information, May 2024, Eppe-Sauvage, France. https://hal.science/hal-04871463

Survey on system-level graph-based and anomaly-based intrusion detection

Published in RESSI 2024-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information, 2024

Intrusion Detection Systems (IDS) are tools for monitoring a system, in order to identify potential malicious activities within it. This survey presents an analysis of graph-based anomaly-based IDS at system level. We also present open issues regarding those IDS, and propose a taxonomy of suitable features to compare them.

Recommended citation: Dijoud, F., Gimenez, P. F., Hurfin, M., Majorczyk, F., & Pilastre, B. (2024, May). Survey on system-level graph-based and anomaly-based intrusion detection. In RESSI 2024-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information (pp. 1-2). https://inria.hal.science/hal-04714325/document

Towards Understanding Alerts raised by Semi-supervised Network Intrusion Detection Systems

Published in 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2023), 2023

The use of Machine Learning for anomaly detection in cyber security-critical applications, such as intrusion detection systems, has been hindered by the lack of explainability. Without understanding the reason behind anomaly alerts, it is too expensive or impossible for human analysts to verify and identify cyber-attacks. Our research addresses this challenge and focuses on semi-supervised network intrusion detection, where only benign network traffic is available for training the detection model. We propose a novel post-hoc explanation method, called AE-pvalues, which is based on the p-values of the reconstruction errors produced by an Auto-Encoder-based anomaly detection method. Our work identifies the most informative network traffic features associated with an anomaly alert, providing interpretations for the generated alerts. We conduct an empirical study using a large-scale network intrusion dataset, CICIDS2017, to compare the proposed AE-pvalues method with two state-of-the-art baselines applied in the semi-supervised anomaly detection task. Our experimental results show that the AE-pvalues method accurately identifies abnormal influential network traffic features. Furthermore, our study demonstrates that the explanation outputs can help identify different types of network attacks in the detected anomalies, enabling human security analysts to understand the root cause of the anomalies and take prompt action to strengthen security measures.

Recommended citation: Maxime Lanvin, Pierre-François Gimenez, Yufei Han, Frédéric Majorczyk, Ludovic Mé, et al.. Towards Understanding Alerts raised by Unsupervised Network Intrusion Detection Systems. The 26th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2023) https://dl.acm.org/doi/abs/10.1145/3607199.3607247

Conditionally Acyclic CO-Networks for Efficient Preferential Optimization

Published in 26th European Conference on Artificial Intelligence (ECAI 2023), 2023

This paper focuses on graphical models for modelling preferences in combinatorial space and their use for item optimisation. The preferential optimisation task seeks to find the preferred item containing some defined values, which is useful for many recommendation settings in e-commerce. We show that efficient (i.e., with polynomial time complexity) preferential optimisation is achieved with a subset of cyclic CP-nets called conditional acyclic CP-net. We also introduce a new graphical preference model, called Conditional-Optimality networks (CO-networks), that are more concise than conditional acyclic CP-nets and LP-trees but have the same expressiveness with respect to optimisation. Finally, we empirically show that preferential optimisation can be used for encoding alternatives into partial instantiations and vice versa, paving the way towards CO-nets and CP-nets unsupervised learning with the minimal description length (MDL) principle.

Recommended citation: Gimenez, Pierre-François & Mengin, Jérôme. (2023). Conditionally Acyclic CO-Networks for Efficient Preferential Optimization. 10.3233/FAIA230352. https://www.researchgate.net/publication/374310696

BAGUETTE: Hunting for evidence of malicious behavior in dynamic analysis reports

Published in 20th International conference on security and cryptography SECRYPT 2023, 2023

Malware analysis consists of studying a sample of suspicious code to understand it and producing a representation or explanation of this code that can be used by a human expert or a clustering/classification/detection tool. The analysis can be static (only the code is studied) or dynamic (only the interaction between the code and its host during one or more executions is studied). The quality of the interpretation of a code and its later detection depends on the quality of the information contained in this representation. To date, many analyses produce voluminous reports that are difficult to handle quickly. In this article, we present BAGUETTE, a graph-based representation of the interactions of a sample and the resources offered by the host system during one execution. We explain how BAGUETTE helps automatically search for specific behaviors in a malware database and how it efficiently assists the expert in analyzing samples.

Recommended citation: Vincent Raulin, Pierre-François Gimenez, Yufei Han, Valérie Viet Triem Tong. BAGUETTE: Hunting for Evidence of Malicious Behavior in Dynamic Analysis Reports. 20th International conference on security and cryptography SECRYPT 2023, Jul 2023, Rome, Italy. https://hal.science/hal-04102144/

Errors in the CICIDS2017 dataset and the significant differences in detection performances it makes

Published in CRiSIS 2022 17th International Conference on Risks and Security of Internet and Systems, 2022

Among the difficulties encountered in building datasets to evaluate intrusion detection tools, a tricky part is the process of labelling the events into malicious and benign classes. The labelling correctness is paramount for the quality of the evaluation of intrusion detection systems but is often considered as the ground truth by practitioners and is rarely verified. Another difficulty lies in the correct capture of the network packets. If it is not the case, the characteristics of the network flows generated from the capture could be modified and lead to false results. In this paper, we present several flaws we identified in the labelling of the CICIDS2017 dataset and in the traffic capture, such as packet misorder, packet duplication and attack that were performed but not correctly labelled. Finally, we assess the impact of these different corrections on the evaluation of supervised intrusion detection approaches.

Recommended citation: Lanvin, M., Gimenez, P. F., Han, Y., Majorczyk, F., Mé, L., & Totel, E. (2022, December). Errors in the CICIDS2017 dataset and the significant differences in detection performances it makes. In CRiSIS 2022-International Conference on Risks and Security of Internet and Systems. https://link.springer.com/chapter/10.1007/978-3-031-31108-6_2

Explainable artificial intelligence for cybersecurity: a literature survey

Published in Annals of Telecommunications, 2022

Artificial intelligence (AI) and machine learning (ML) are increasingly becoming essential in the development of cybersecurity solutions, with deep learning (DL) algorithms being extensively applied in recent years, e.g., for detecting Android malware or vulnerable source code. However, sharing the same fundamental limitation with other DL application domains, such as computer vision and natural language processing, AI-based cybersecurity solutions lack the capability of justifying the results (ranging from detection and prediction to reasoning and decision-making) and making them human-understandable. As a result, explainable AI (XAI) has emerged as a paramount topic addressing the related challenges of making AI models explainable or interpretable to human users. It is particularly relevant in the cybersecurity domain, in that XAI may allow security operators, who are overwhelmed with tens of thousands of security alerts per day (most of which are false positives) per day, to better assess the potential threats and reduce alert fatigue. With such a strong motivation, we conduct an extensive literature review on the intersection between XAI and cybersecurity. Particularly, we investigate the academic literature from two perspectives: the applications of XAI to cybersecurity (e.g., intrusion detection, malware classification, etc.) and the application of cybersecurity to XAI (e.g., attacks on XAI pipelines, potential countermeasures, etc.). We characterize the security of XAI with several security properties that have been discussed in the literature. We also formulate open questions that are either left out or not properly addressed in the literature and provide tentative answers.

Recommended citation: Fabien Charmet, Harry Chandra Tanuwidjaja, Solayman Ayoubi, Pierre-François Gimenez, Yufei Han, Houda Jmila, Gregory Blanc, Takeshi Takahashi & Zonghua Zhang. Explainable artificial intelligence for cybersecurity: a literature survey. Ann. Telecommun. (2022). https://doi.org/10.1007/s12243-022-00926-7

The complexity of unsupervised learning of lexicographic preferences

Published in M-PREF 2022 13th Multidisciplinary Workshop on Advances in Preference Handling, 2022

This paper considers the task of learning users’ preferences on a combinatorial set of alternatives, as generally used by online configurators, for example. In many settings, only a set of selected alternatives during past interactions is available to the learner. Fargier et al. [2018] propose an approach to learn, in such a setting, a model of the users’ preferences that ranks previously chosen alternatives as high as possible; and an algorithm to learn, in this setting, a particular model of preferences: lexicographic preferences trees (LP-trees). In this paper, we study complexity-theoretical problems related to this approach. We give an upper bound on the sample complexity of learning an LP-tree, which is logarithmic in the number of attributes. We also prove that computing the LP tree that minimises the empirical risk can be done in polynomial time when restricted to the class of linear LP-trees.

Recommended citation: Fargier, H., Gimenez, P. F., Mengin, J., & Le Nguyen, B. N. (2022, July). The complexity of unsupervised learning of lexicographic preferences. In 13th Multidisciplinary Workshop on Advances in Preference Handling (M-pref 2022)@ IJCAI 2022 https://hal.inria.fr/hal-03784693/document

Debiasing Android Malware Datasets: How can I trust your results if your dataset is biased?

Published in IEEE Transactions on Information Forensics and Security, 2022

Android security has received a lot of attention over the last decade, especially malware investigation. Researchers attempt to highlight applications’ security-relevant characteristics to better understand malware and effectively distinguish malware from benign applications. The accuracy and the completeness of their proposals are evaluated experimentally on malware and goodware datasets. Thus, the quality of these datasets is of critical importance: if the datasets are outdated or not representative of the studied population, the conclusions may be flawed. We specify different types of experimental scenarios. Some of them require unlabeled but representative datasets of the entire population. Others require datasets labeled with valuable characteristics that may be difficult to compute, such as malware datasets. We discuss the irregularities of datasets used in experiments, questioning the validity of the performances reported in the literature. This article focuses on providing guidelines for designing debiased datasets. First, we propose guidelines for building representative datasets from unlabeled ones. Second, we propose and experiment a debiasing algorithm that, given a biased labeled dataset and a target representative dataset, builds a representative and labeled dataset. Finally, from the previous debiased datasets, we produce datasets for experiments on Android malware detection or classification with machine learning algorithms. Experiments show that debiased datasets perfom better when classifying with machine learning algorithms.

Recommended citation: Miranda, T. C., Gimenez, P. F., Lalande, J. F., Tong, V. V. T., & Wilke, P. (2022). Debiasing Android Malware Datasets: How can I trust your results if your dataset is biased?. IEEE Transactions on Information Forensics and Security. https://ieeexplore.ieee.org/abstract/document/9787514

Towards a Representation of Malware Execution Traces for Experts and Machine Learning

Published in RESSI 2022-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information, 2022

Dynamic analysis is a common technique to analyze the run-time behavior of software and identify malware (malicious software). Execution traces typically contain the list of system calls with their parameters, the list of accessed files, etc. Several representations have been proposed to organize these data better and help both human experts and automated tools analyze them effectively. This paper reviews these representations and identifies four research problems that the first author plans to investigate during his Ph.D.

Recommended citation: Raulin, V., Gimenez, P. F., Han, Y., & Tong, V. V. T. (2022). Towards a Representation of Malware Execution Traces for Experts and Machine Learning. RESSI 2022-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information. https://hal.archives-ouvertes.fr/hal-03675366/document

Detecting APT through graph anomaly detection

Published in RESSI 2022-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information, 2022

Despite fruitful achievements made by unsupervised machine learning-based anomaly detection for network intrusion detection systems, they are still prone to the issue of high false alarm rates, and it is still difficult to reach very high recalls. In 2020, Leichtnam et al. proposed Sec2graph, an unsupervised approach applied to security objects graphs that exhibited interesting results on single-step attacks. The graph representation and the embedding allowed for better detection since it creates qualitative features. In this paper, we present new experiments to assess the performances of this approach for detecting APT attacks. We achieve better detection performances than the original work’s baseline detection methods on the DAPT2020 dataset. This work is realised in the context of the Ph.D. thesis of Maxime Lanvin, which started in October 2021.

Recommended citation: Lanvin, M., Gimenez, P. F., Han, Y., Majorczyk, F., Mé, L., & Totel, É. (2022, May). Detecting APT through graph anomaly detection. In RESSI 2022-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information. https://hal.archives-ouvertes.fr/hal-03675346/document

Towards generic quality assessment of synthetic traffic for evaluating intrusion detection systems

Published in RESSI 2022-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information, 2022

Network Intrusion Detection Systems (NIDSes) evaluation requires background traffic. However, real background traffic is hard to collect. We hence rely on synthetic traffic generated especially for this task. The quality of the generated traffic has to be evaluated according to some clearly defined criteria. In this paper, we show how to adapt the quality assessment solutions proposed for different fields of data generation such as image or text generation to network traffic. We summarize our study by discussing the criteria that evaluate the quality of a generated network traffic and by proposing functions to evaluate these criteria. This is the first contribution in the context of the Ph.D. thesis of Adrien Schoen.

Recommended citation: Schoen, A., Blanc, G., Gimenez, P. F., Han, Y., Majorczyk, F., & Mé, L. (2022, May). Towards generic quality assessment of synthetic traffic for evaluating intrusion detection systems. In RESSI 2022-Rendez-Vous de la Recherche et de l'Enseignement de la Sécurité des Systèmes d'Information. https://hal.archives-ouvertes.fr/hal-03675359/document

GUI-Mimic, a cross-platform recorder and fuzzer of Graphical User Interface

Published in 9th GreHack Conference, 2021

In program analysis, a fuzzing toolset is needed to automatically trigger software operations in a natural while efficient way. Especially in dynamic analysis of malware, such a toolset can help execute the suspicious files to unveil their malicious payloads hidden by other benign-looking behaviors. In the fields of software testing, this tool is necessary for triggering and testing the programmed functionalities. Nevertheless, there has not yet been an easy-to-use tool that works on Windows for the purpose of generating activity through the Graphical User Interface (GUI). To meet this requirement, in our work, we develop GUI-Mimic. It is designed to integrate some useful features for stimulating different types of software – mouse and keyboard recording, random mouse and keyboard inputs, editing, trimming, randomization, transformations – to deliver an easy-to-deploy GUI fuzzer over different Operating Systems.

Recommended citation: Raulin, V., Gimenez, P. F., Han, Y., Viet Triem Tong, V., Ouairy, L. (2021). GUI-Mimic, a cross-platform recorder and fuzzer of Graphical User Interface. 9th GreHack Conference https://hal.archives-ouvertes.fr/hal-03449827/document

RIDS: Radio intrusion detection and diagnosis system for wireless communications in smart environment

Published in ACM Transactions on Cyber-Physical Systems, 2021

The expansion of the Internet-of-Things (IoT) market is visible in homes, factories, public places, and smart cities. While the massive deployment of connected devices offers opportunities to improve quality of life and to develop new services, the impact of such devices on the security of the users in a context where the level of malicious threat continues to increase is a major concern. One of the challenges is the heterogeneity and constant evolution of wireless technologies and protocols used. To overcome this problem, we propose RIDS, a Radio Intrusion Detection System that is based on the monitoring and profiling of radio communications at the physical layer level using autoencoder neural networks. RIDS is independent of the wireless protocols and modulation technologies used. Besides, it is designed to provide a threefold diagnosis of the detected anomalies: temporal (start and end date of the detected anomaly), frequential (main frequency of the anomaly), and spatial (location of the origin of the anomaly). To demonstrate the relevance and the efficiency of our approach, we collected a large dataset of radio-communications recorded with three different probes deployed in an experimental room. Multiple real-world attacks involving a wide variety of communication technologies are also injected to assess the detection and diagnosis efficiency. The results demonstrate the efficiency of RIDS in detecting and diagnosing anomalies that occurred in the 400–500 Mhz and 800–900 Mhz frequency bands. It is noteworthy that compromised devices and attacks using these communication bands are generally not easily covered by traditional solutions.

Recommended citation: Gimenez, P. F., Roux, J., Alata, E., Auriol, G., Kaâniche, M., & Nicomette, V. (2021). RIDS: Radio intrusion detection and diagnosis system for wireless communications in smart environment. ACM Transactions on Cyber-Physical Systems, 5(3), 1-1. https://dl.acm.org/doi/10.1145/3441458

Hardware-Performance-Counters-based anomaly detection in massively deployed smart industrial devices

Published in IEEE 19th International Symposium on Network Computing and Applications (NCA), 2020

Energy providers are massively deploying devices to manage distributed resources or equipment. These devices are used for example to manage the energy of smart factories efficiently or to monitor the infrastructure of smart-grids. By design, they typically exhibit homogeneous behavior, with similar software and hardware architecture. Unfortunately, these devices are also of interest to attackers aiming to develop botnets or compromise companies’ security. This paper presents a new protection approach based on Hardware Performance Counters (HPC) to detect anomalies in massively deployed devices. These HPC are processed using outlier detection algorithms. Compared to existing solutions, we propose a lightweight approach based on a comparative analysis of devices’ HPC without relying on the modeling of the software applications running on the devices. To assess the relevance and the effectiveness of the approach, a thorough experimental analysis is carried out in a representative industrial-type environment, sampling the data from 100 Raspberry Pi to simulate about 10,000 devices deployed simultaneously. The results show high detection and performance efficiency under different software profiles and attack payloads. Moreover, the calibration of the approach depends primarily on the hardware rather than the application software running on the devices. It should ease its deployment in an operational environment.

Recommended citation: Bourdon, M., Gimenez, P. F., Alata, E., Kaaniche, M., Migliore, V., Nicomette, V., & Laarouchi, Y. (2020, November). Hardware-Performance-Counters-based anomaly detection in massively deployed smart industrial devices. In 2020 IEEE 19th International Symposium on Network Computing and Applications (NCA) (pp. 1-8). IEEE. https://ieeexplore.ieee.org/document/9306726

On-board Diagnosis: A First Step from Detection to Prevention of Intrusions on Avionics Applications

Published in IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 2020

Nowadays, air travel is one of the safest transportation means. While safety is historically well integrated into avionics systems, it is becoming increasingly important to take into account the security of such systems for the future. In particular, Host-based Intrusion Detection Systems (HIDS) are commonly used in traditional information systems to improve their security. The adaptation of such systems for deployment inside an aircraft has been studied in another work and has shown to be effective in detecting anomalous behavior in an avionic application. However, the detection itself is not sufficient to provide an on-board reaction, and to prevent such intrusion. This paper proposes to improve such HIDS by introducing a signature-based system capable of providing a first diagnosis after the detection of an anomalous behavior. The proposed diagnosis approach is based on the definition of the signature of an alert, and its comparison with a knowledge database that is regularly updated throughout aircraft lifetime. This approach has been implemented on a real avionic computer and yielded good results in terms of classification accuracy and resources consumption.

Recommended citation: Damien, A., Gimenez, P. F., Feyt, N., Nicomette, V., Kaâniche, M., & Alata, E. (2020, October). On-board Diagnosis: A First Step from Detection to Prevention of Intrusions on Avionics Applications. In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE) (pp. 358-368). IEEE. https://ieeexplore.ieee.org/document/9251064

Experimental evaluation of three value recommendation methods in interactive configuration

Published in Journal of Universal Computer Science, 2020

The present work deals with the recommendation of values in interactive configuration, with no prior knowledge about the user, but given a list of products previously configured and bought by other users (“sales histories”). The basic idea is to recommend, for a given variable at a given step of the configuration process, a value that has been chosen by other users in a similar context, where the context is defined by the variables that have already been decided, and the values that the current user has chosen for these variables. From this point, two directions have been explored. The first one is to select a set of similar configurations in the sales history (typically, the k closest ones, using a distance measure) and to compute the best recommendation from this set – this is the line proposed by [Coster et al., 2002]. The second one, that we propose here, is to learn a model from the entire sample as representation of the users’ preferences, and to use it to recommend a pertinent value; three families of models are experimented: the Bayesian networks, the naive Bayesian networks and the lexicographic preferences trees.

Recommended citation: H Fargier, PF Gimenez, J Mengin - Journal of Universal Computer Science, 2020 https://www.jucs.org/jucs_26_3/experimental_evaluation_of_three.html

Apprentissage de préférences en espace combinatoire et application à la recommandation en configuration interactive

Published in Université Paul Sabatier - Toulouse III, 2018

The analysis and the exploitation of preferences occur in multiple domains, such as economics, humanities and psychology. E-commerce got interested in the subject a few years ago with the surge of product personalisation. Our study deals with the representation and the learning of preferences on objects described by a set of attributes. These combinatorial spaces are huge, which makes the representation of an ordering in extenso intractable. That’s why preference representation languages have been built: they can represent preferences compactly on these huge spaces. In this dissertation, we study preference representation languages and preference learning.

Recommended citation: Pierre-François Gimenez. Apprentissage de préférences en espace combinatoire et application à la recommandation en configuration interactive. Intelligence artificielle [cs.AI]. Université Paul Sabatier - Toulouse III, 2018. Français. ⟨NNT : 2018TOU30182⟩. ⟨tel-02303275⟩ https://www.theses.fr/2018TOU30182

Learning lexicographic preference trees from positive examples

Published in AAAI Conference on Artificial Intelligence, 2018

This paper considers the task of learning the preferences of users on a combinatorial set of alternatives, as it can be the case for example with online configurators. In many settings, what is available to the learner is a set of positive examples of alternatives that have been selected during past interactions. We propose to learn a model of the users’ preferences that ranks previously chosen alternatives as high as possible. In this paper, we study the particular task of learning conditional lexicographic preferences. We present an algorithm to learn several classes of lexicographic preference trees, prove convergence properties of the algorithm, and experiment on both synthetic data and on a real-world bench in the domain of recommendation in interactive configuration.

Recommended citation: Fargier, H., Gimenez, P. F., & Mengin, J. (2018, April). Learning lexicographic preference trees from positive examples. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 32, No. 1). https://ojs.aaai.org/index.php/AAAI/article/download/11808/11667

Recommendation for product configuration: an experimental evaluation (Best paper award)

Published in 18th International Configuration Workshop, 2016

The present work deals with the the recommendation of values in interactive configuration, with no prior knowledge about the user, but given a list of products previously configured and bought by other users (“sale histories”). The basic idea is to recommend, for a given variable at a given step of the configuration process, a value that has been chosen by other users in a similar context, where the context is defined by the variables that have already been decided, and the values that the current user has chosen for these variables. From this point, two directions have been explored. The first one is to select a set of similar configurations in the sale history (typically,the k closest ones, using a distance measure) and to compute the best recommendation from this set - this is the line proposed by [9]. The second one, that we propose here, is to learn a Bayesian network from the entire sample as model of the users’ preferences, and to use it to recommend a pertinent value.

Recommended citation: Fargier, H., Gimenez, P. F., & Mengin, J. (2016, September). Recommendation for product configuration: an experimental evaluation. In 18th International Configuration Workshop (CWS 2016) within CP 2016: 22nd International Conference on Principles and Practice of Constraint Programming (pp. pp-9). https://hal.archives-ouvertes.fr/hal-01445239/file/fargier_17217.pdf