Network traffic generation: a non-technical look at its characteristics and stakes


Intrusion detection is an essential mechanism in information systems security, and machine learning has been successfully applied to it. These techniques depend on the data used to train the detection model. That data generally comes from public datasets, which are often generated in a more or less automated way and may therefore contain experimental inaccuracies. Worse, so few datasets exist that their diversity is questionable, and their aging is problematic. Synthetic data generation is one solution to these problems: it would be free of experimental inaccuracies, could be easily updated, and could alleviate class imbalance by generating more data for rare classes. We plan to generate benign data only, as attacks are easier to generate with dedicated tools. This talk highlights the characteristics of network traffic generation, along with its issues and opportunities, from a non-technical point of view suited to data mining practitioners.