Comparative Analysis of SQLi Detection Models
Time: 29 Dec 2025, 18:30-18:45
Session:
[S3] Track 3: Privacy, Security for Networks
Type: Online
Abstract:
SQL injection (SQLi) remains a common and persistent threat to web applications. Although many SQLi detection techniques have been proposed, most studies evaluate them on a single dataset, which limits how well their conclusions generalize across data conditions and obscures how model performance varies with dataset scale and distribution. This study compares machine learning (ML) and deep learning (DL) models on two publicly available SQLi datasets that differ in size and composition.
The ML pipelines use a hybrid representation that combines character-level TF-IDF, word-level TF-IDF produced by a SQL-aware tokenizer, and numeric behavioral indicators. The DL branch applies placeholder-based normalization and token-sequence modeling, covering recurrent networks (LSTM and GRU), attention-based variants, and a Transformer architecture.
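As a rough illustration of the hybrid ML representation, the sketch below combines the three feature views in a scikit-learn pipeline with a LinearSVC classifier. The tokenizer regex, the behavioral indicators, and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

import re
import numpy as np
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Assumed SQL-aware tokenizer: keywords/identifiers, numbers, comment markers,
# comparison operators, and quote/punctuation characters.
SQL_TOKEN = re.compile(r"[A-Za-z_]+|\d+|--|/\*|\*/|[=<>!]+|['\";,()]")

def sql_tokenize(query: str):
    """Split a query into SQL-aware tokens."""
    return SQL_TOKEN.findall(query.lower())

def behavioral_indicators(queries):
    """Hand-crafted numeric features (illustrative): quote, comment, and keyword counts."""
    keywords = ("union", "select", "or", "and", "sleep", "benchmark")
    rows = []
    for q in queries:
        ql = q.lower()
        rows.append([
            ql.count("'") + ql.count('"'),        # quote characters
            ql.count("--") + ql.count("/*"),      # SQL comment markers
            sum(ql.count(k) for k in keywords),   # suspicious keyword hits
            len(q),                               # raw query length
        ])
    return np.array(rows, dtype=float)

features = FeatureUnion([
    ("char_tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("word_tfidf", TfidfVectorizer(tokenizer=sql_tokenize, token_pattern=None)),
    ("numeric", FunctionTransformer(behavioral_indicators)),
])

model = Pipeline([
    ("features", features),
    ("clf", LinearSVC(C=1.0)),  # LinearSVC baseline named in the abstract
])

# Usage: model.fit(train_queries, train_labels); model.predict(test_queries)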
Empirical results show that dataset scale plays a significant role in the relative performance of DL models. On the smaller corpus, the LSTM model with multi-head attention performs best among the DL architectures, while several ML models perform at a comparable or higher level. On the larger and more heterogeneous corpus, the Transformer model attains the highest macro F1 score, reaching 0.9946. Linear Support Vector Classification (LinearSVC) remains a robust ML baseline on both datasets. Overall, ML models lead on the smaller dataset but are surpassed by the top-performing DL model once the data become larger and more diverse.
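For reference, the reported macro F1 score averages the per-class F1 values so that both classes (SQLi and benign) contribute equally regardless of class imbalance:

\[
\mathrm{F1}_{\text{macro}} = \frac{1}{C} \sum_{c=1}^{C} \frac{2\, P_c R_c}{P_c + R_c},
\]

where $P_c$ and $R_c$ are the precision and recall of class $c$, and $C$ is the number of classes.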
Keywords:
SQL injection detection, machine learning, deep learning, LinearSVC, Transformer, TF-IDF, tokenization, web application security
Speaker: