Comparative Analysis of SQLi Detection Models
ID:224 View protection:Participant Only Updated time:2025-12-28 11:13:49 Views:219 Online

Start Time:2025-12-29 18:30

Duration:15min

Session:[S3] Track 3: Privacy, Security for Networks

Abstract
SQL injection (SQLi) remains a common and persistent threat to web applications. Although many SQLi detection techniques have been proposed, most studies still evaluate them on a single dataset, so their conclusions cannot be verified across data conditions and performance differences across dataset scales and distributions remain hidden. This study compares and evaluates machine learning (ML) and deep learning (DL) models on two publicly available SQLi datasets that differ in size and composition.

The ML pipelines use a hybrid representation that combines character-level TF-IDF, word-level TF-IDF obtained from a SQL-aware tokenizer, and numeric behavioral indicators. The DL branch uses placeholder-based normalization and token-sequence modeling, covering recurrent networks (LSTM and GRU), attention-based variants, and a Transformer architecture.
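
As a rough illustration of the preprocessing described above, the following Python sketch shows one way placeholder-based normalization, a SQL-aware tokenizer, and a few numeric indicators could look. The placeholder names (`<STR>`, `<HEX>`, `<NUM>`), the regular expressions, the keyword list, and the chosen indicators are all assumptions made for illustration; the abstract does not specify them, and the authors' actual implementation may differ.

```python
import re

# Hypothetical placeholder scheme: literals are replaced by generic tokens
# so that models see query structure rather than specific values.
STRING_RE = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")
HEX_RE = re.compile(r"\b0x[0-9a-f]+\b")
NUM_RE = re.compile(r"\b\d+(?:\.\d+)?\b")

# Hypothetical SQL-aware token pattern (applied to the normalized query).
TOKEN_RE = re.compile(
    r"<[A-Z]+>"            # placeholders such as <STR>
    r"|--|/\*|\*/"         # comment markers
    r"|[a-z_][a-z_0-9]*"   # keywords and identifiers
    r"|[<>!]=|<>|\|\|"     # multi-character operators
    r"|\S"                 # any other single non-space character
)

# Illustrative keyword list, not taken from the source.
SQL_KEYWORDS = {"select", "union", "insert", "update", "delete",
                "drop", "or", "and", "where", "from"}


def normalize(query: str) -> str:
    """Lowercase the query and replace literals with placeholders."""
    q = query.lower()
    q = STRING_RE.sub("<STR>", q)
    q = HEX_RE.sub("<HEX>", q)
    q = NUM_RE.sub("<NUM>", q)
    return q


def tokenize(query: str) -> list[str]:
    """Split a normalized query into SQL-aware tokens."""
    return TOKEN_RE.findall(normalize(query))


def indicators(query: str) -> dict[str, int]:
    """Compute a few example numeric indicators for a raw query."""
    q = query.lower()
    return {
        "length": len(query),
        "single_quotes": query.count("'"),
        "comment_markers": q.count("--") + q.count("/*") + q.count("#"),
        "keyword_hits": sum(tok in SQL_KEYWORDS for tok in tokenize(query)),
    }


print(normalize("SELECT * FROM users WHERE id = 1 OR '1'='1'"))
# select * from users where id = <NUM> or <STR>=<STR>
```

In a pipeline of the kind the abstract describes, token lists like these could feed a word-level TF-IDF vectorizer or a sequence model, while the indicator values would be appended as extra numeric features.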

The results show that dataset scale plays a significant role in the relative performance of DL models. On the smaller corpus, the Long Short-Term Memory (LSTM) model with multi-head attention performs best among the DL architectures, while several ML models perform at a comparable or higher level. On the larger and more heterogeneous corpus, the Transformer model attains the highest macro-averaged F1 score, 0.9946. Linear Support Vector Classification (LinearSVC) is one of the robust ML baselines on both datasets. Overall, ML models lead on the smaller dataset but are surpassed by the top-performing DL model once the dataset becomes larger and more diverse.
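
For readers unfamiliar with the reported metric, the sketch below shows how a macro-averaged F1 score is computed: F1 is calculated separately for each class and the per-class values are averaged with equal weight, so the minority class counts as much as the majority class. The labels are toy values chosen for illustration, not data from the study, and `macro_f1` is a hypothetical helper name.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: 1 = injection, 0 = benign (illustrative labels only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

Here the benign class scores 0.8 and the injection class about 0.667, giving a macro average of about 0.733.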
Keywords
SQL injection detection, machine learning, deep learning, LinearSVC, Transformer, TF–IDF, tokenization, web application security
Speaker
Gegentana Altanhuyag
Mongolian University of Science and Technology; Mongolia

Important Dates
  • Conference date: 12-29 2025 - 12-31 2025
  • 12-30 2025: Presentation submission deadline
  • 02-10 2026: Draft paper submission deadline
  • 02-10 2026: Registration deadline

Sponsored By

United Societies of Science

Organized By

Zarqa University
