Synthetic Network Traffic Generation for Intrusion Detection Systems: A Systematic Literature Review

Network data can be difficult to collect because of privacy and confidentiality constraints. As a result, network datasets are typically created in controlled environments called testbeds. However, these datasets are regularly criticized for their limited size, class imbalance, obsolescence, and lack of actual user activity. Following the rapid development of generative artificial intelligence, new methods have been applied to generate synthetic network traffic without emulation or simulation. This systematic literature review assesses the current state of synthetic network traffic generation: why these data are generated, which AI models are used, what kind of data is generated, and how these generative methods are evaluated. From this review, we derive a set of open problems and recommendations for researchers and industry practitioners towards fast, high-quality network data generation.