The CERT Dataset Decade: A Systematic Review of Methodological Evolution and Performance Bias
Purpose:
The purpose of this paper is to identify methodological biases and limitations in machine learning–based insider threat detection using the Computer Emergency Response Team [CERT] dataset, in order to guide the development of more realistic, robust, and operationally relevant detection approaches.
Design/Methods/Approach:
The objectives are achieved through a systematic literature analysis of 131 peer-reviewed studies published between 2013 and 2025 that apply machine learning to insider threat detection using the CERT dataset. Study selection follows a Preferred Reporting Items for Systematic Reviews and Meta-Analyses [PRISMA]-guided process, and a structured comparative framework is used to examine dataset versions, feature engineering strategies, model architectures, and evaluation metrics from a methodological and empirical perspective.
Findings:
The analysis shows that most studies rely on the less realistic CERT v4.2 dataset, resulting in inflated performance estimates that do not generalize to operational settings. It also finds that feature engineering is a stronger determinant of detection performance than model complexity, while inconsistent evaluation practices hinder meaningful comparison across studies.
Research Limitations / Implications:
The study is limited by its reliance on published research using a single synthetic dataset, which constrains generalization to real-world environments.
Practical Implications:
The findings indicate that practitioners should be cautious when adopting models validated on simplified benchmark settings and should instead prioritize solutions tested under extreme class imbalance. Emphasis should be placed on robust feature engineering, unsupervised or hybrid detection approaches, and evaluation metrics appropriate for rare-event detection rather than accuracy alone.
Originality/Value:
This paper provides the first large-scale, methodologically focused analysis of insider threat detection research that explicitly exposes performance inflation caused by dataset version bias and evaluation inconsistency, offering concrete, evidence-based guidance for improving the realism, comparability, and operational value of future studies in the field.
UDC: 004.056
Keywords: insider threat detection, CERT dataset, machine learning, anomaly detection, dataset bias, evaluation metrics