Pierre-François Gimenez

Fos-R aims to generate realistic and diverse network and system activity from patterns learned from actual data.

Download binary

You can download the latest compiled binaries from https://github.com/Fos-R/Fos-R/releases/latest.

Compile from source

You can also compile Fos-R directly from source. Fos-R is distributed through crates.io. First, install Rust with rustup. Then install Fos-R with:

cargo install fosr

You can check the installation by running:

fosr

Library

Fos-R also includes a Rust library that exposes the main part of the software. Its documentation is here.

Real-world uses

BreizhCTF 2025

Fos-R generated background network traffic during the BreizhCTF 2025 hacking competition, which had about 600 participants. During that competition, Fos-R was deployed on 750 virtual machines for a total of 23,000 cumulative hours. Fos-R has also been used in smaller CTF competitions, such as TCE CTF and RESSI CTF.

URSID

Fos-R is integrated as an option in URSID.

TADAM

Published: May 01, 2025

Authors: Lénaïg Cornanguer and Pierre-François Gimenez

License: BSD-3-Clause

TADAM is a probabilistic timed automata learner that works from noisy observations, made in the context of the SecGen associate team between Inria and CISPA. It is an implementation of the learning algorithm described in TADAM: Learning Timed Automata From Noisy Observations. PyPI page. Source code.

Fos-R

Authors: Pierre-François Gimenez et al.

License: GPLv3

Check https://pfgimenez.fr/fosr/ for more information.

FlowChronicle: network flow data miner and generator

Published: December 09, 2024

Authors: Joscha Cüppers, Pierre-François Gimenez and Adrien Schoen

License: MIT

FlowChronicle is a network flow record miner and generator, made in the context of the SecGen associate team between Inria and CISPA. It is an implementation of the system described in FlowChronicle: Synthetic Network Flow Generation through Pattern Set Mining. PyPI page. Source code.

TADAM: Learning Timed Automata From Noisy Observations

Published in SIAM International Conference on Data Mining (SDM25), 2025

Timed Automata (TA) are formal models capable of representing regular languages with timing constraints, making them well-suited for modeling systems where behavior is driven by events occurring over time. Most existing work on TA learning relies on active learning, where access to a teacher is assumed to answer membership queries and provide counterexamples. While this framework offers strong theoretical guarantees, it is impractical for many real-world applications where such a teacher is unavailable. In contrast, passive learning approaches aim to infer TA solely from sequences accepted by the target automaton. However, current methods struggle to handle noise in the data, such as symbol omissions, insertions, or permutations, which often result in excessively large and inaccurate automata. In this paper, we introduce TADAM, a novel approach that leverages the Minimum Description Length (MDL) principle to balance model complexity and data fit, allowing it to distinguish between meaningful patterns and noise. We show that TADAM is significantly more robust to noisy data than existing techniques, less prone to overfitting, and produces concise models that can be manually audited. We further demonstrate its practical utility through experiments on real-world tasks, such as network flow classification and anomaly detection.

Recommended citation: Cornanguer, L. & Gimenez, P. F., (2025, May). TADAM: Learning Timed Automata From Noisy Observations. In the SIAM International Conference on Data Mining (SDM25).
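
The trade-off that the Minimum Description Length principle makes between model complexity and data fit can be illustrated on a toy problem. The sketch below is not TADAM's algorithm and does not involve timed automata: it only compares two hypothetical models of a symbol sequence (a memoryless distribution and a first-order Markov chain) using a two-part code, where the parameter cost of 0.5 * log2(n) bits per free parameter is a common approximation chosen here as an assumption. All function names are illustrative.

```python
import math
from collections import Counter


def data_cost_bits(counts):
    """Bits needed to encode the observations with their maximum-likelihood
    categorical distribution (sum of -log2 p over all observations)."""
    total = sum(counts.values())
    return -sum(c * math.log2(c / total) for c in counts.values() if c > 0)


def model_cost_bits(n_free_params, n_obs):
    """Approximate bits needed to encode the model parameters
    (assumption: 0.5 * log2(n_obs) bits per free parameter)."""
    return 0.5 * n_free_params * math.log2(n_obs)


def mdl_memoryless(seq, alphabet):
    """Total description length when symbols are treated as independent."""
    return model_cost_bits(len(alphabet) - 1, len(seq)) + data_cost_bits(Counter(seq))


def mdl_markov(seq, alphabet):
    """Total description length with a first-order Markov chain:
    the first symbol is sent uniformly, the rest given the previous symbol."""
    k = len(alphabet)
    per_context = {}
    for prev, cur in zip(seq, seq[1:]):
        per_context.setdefault(prev, Counter())[cur] += 1
    data = math.log2(k) + sum(data_cost_bits(c) for c in per_context.values())
    return model_cost_bits(k * (k - 1), len(seq) - 1) + data


def best_model(seq, alphabet):
    """Return the name of the model with the shorter total description."""
    scores = {
        "memoryless": mdl_memoryless(seq, alphabet),
        "markov": mdl_markov(seq, alphabet),
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Strong sequential structure: the richer model pays for itself.
    print(best_model("ab" * 100, "ab"))
    # Little structure to exploit: the extra parameters are not worth their cost.
    print(best_model("aabb" * 50, "ab"))
```

In this toy setting, the richer model is selected only when the bits it saves on the data exceed the bits spent describing its extra parameters, which is the same reasoning that lets an MDL-based learner avoid growing a model to fit noise.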

FlowChronicle: Synthetic Network Flow Generation through Pattern Set Mining

Published in 20th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2024

Network traffic datasets are regularly criticized, notably for the lack of realism and diversity in their attack or benign traffic. Generating synthetic network traffic using generative machine learning techniques is a recent area of research that could complement experimental test beds and help assess the efficiency of network security tools such as network intrusion detection systems. Most methods generating synthetic network flows disregard the temporal dependencies between them, leading to unrealistic traffic. To address this issue, we introduce FlowChronicle, a novel synthetic network flow generation tool that relies on pattern mining and statistical models to preserve temporal dependencies. We empirically compare our method against state-of-the-art techniques on several criteria, namely realism, diversity, compliance, and novelty. This evaluation demonstrates the capability of FlowChronicle to achieve high-quality generation while significantly outperforming the other methods in preserving temporal dependencies between flows. Besides, in contrast to deep learning methods, the patterns identified by FlowChronicle are explainable, and experts can verify their soundness. Our work substantially advances synthetic network traffic generation, offering a method that enhances both the utility and trustworthiness of the generated network flows.

Recommended citation: Cüppers, J., Schoen, A., Blanc, G. & Gimenez, P. F., (2024, December). FlowChronicle: Synthetic Network Flow Generation through Pattern Set Mining. In the 20th International Conference on emerging Networking EXperiments and Technologies (CoNEXT). https://dl.acm.org/doi/10.1145/3696407

A Tale of Two Methods: Unveiling the limitations of GAN and the Rise of Bayesian Networks for Synthetic Network Traffic Generation

Published in 9th International Workshop on Traffic Measurements for Cybersecurity (WTMC 2024), 2024

The evaluation of network intrusion detection systems requires a sufficient amount of mixed network traffic, i.e., composed of both malicious and legitimate flows. In particular, obtaining realistic legitimate traffic is hard. Synthetic network traffic is one of the tools to respond to insufficient or incomplete real-world datasets. In this paper, we only focus on synthetically generating high-quality legitimate traffic and we do not delve into malicious traffic generation. For this specific task, recent contributions make use of advanced machine learning-driven approaches, notably through Generative Adversarial Networks (GANs). However, evaluations of GAN-generated data often disregard pivotal attributes, such as protocol adherence. Our study addresses the gap by proposing a comprehensive set of metrics that assess the quality of synthetic legitimate network traffic. To illustrate the value of these metrics, we empirically compare advanced network-oriented GANs with a simple and yet effective probabilistic generative model, Bayesian Networks (BN). According to our proposed evaluation metrics, BN-based network traffic generation outperforms the state-of-the-art GAN-based opponents. In our study, BN yields substantially more realistic and useful synthetic benign traffic and minimizes the computational costs simultaneously.

Recommended citation: Schoen, A., Blanc, G., Gimenez, P. F., Han, Y., Majorczyk, F., & Mé, L. (2024). A Tale of Two Methods: Unveiling the limitations of GAN and the Rise of Bayesian Networks for Synthetic Network Traffic Generation. In Proceedings of the 9th International Workshop on Traffic Measurements for Cybersecurity (WTMC 2024).
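
To give an intuition of how a Bayesian network can generate flow attributes, here is a minimal sketch of ancestral sampling on a two-node network (protocol, then destination port given the protocol). It is not the model from the paper: the structure, the attribute names, and every probability below are invented for illustration. It only shows that sampling each attribute conditionally on its parent keeps dependent fields consistent, which relates to the protocol adherence criterion mentioned in the abstract.

```python
import random

# Hypothetical toy network: protocol -> dst_port.
# All values and probabilities are made up for this example.
P_PROTOCOL = {"TCP": 0.7, "UDP": 0.3}
P_PORT_GIVEN_PROTOCOL = {
    "TCP": {80: 0.5, 443: 0.4, 22: 0.1},
    "UDP": {53: 0.8, 123: 0.2},
}


def draw(dist, rng):
    """Draw one value from a {value: probability} table."""
    values = list(dist)
    weights = list(dist.values())
    return rng.choices(values, weights=weights, k=1)[0]


def sample_flow(rng):
    """Ancestral sampling: sample the parent first, then the child given it."""
    protocol = draw(P_PROTOCOL, rng)
    dst_port = draw(P_PORT_GIVEN_PROTOCOL[protocol], rng)
    return {"protocol": protocol, "dst_port": dst_port}


def sample_flows(n, seed=0):
    """Generate n synthetic flow records with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_flow(rng) for _ in range(n)]


if __name__ == "__main__":
    for flow in sample_flows(5):
        print(flow)
```

Because the destination port is always drawn from the table of the sampled protocol, the generator cannot emit a combination that has no entry in that table, such as port 53 over TCP in this toy example.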

Scientific contributors

Inria: Pierre-François Gimenez, Yufei Han, Ludovic Mé, Adrien Schoen
CISPA: Lénaïg Cornanguer, Joscha Cüppers
Télécom SudParis: Grégory Blanc
DGA: Frédéric Majorczyk

Software contributors

Inria: Pierre-François Gimenez, Adrien Schoen
CISPA: Lénaïg Cornanguer, Joscha Cüppers
CentraleSupélec: Evan Morin, Florentin Labelle

Links

https://github.com/Fos-R