RaP-ProtoViT: Efficient Dual-Head Transformers for Robust Gastric Endoscopy Classification and Generalizable Clinical Deployment - 2025 Asian Conference on Communication and Networks - United Societies of Science

Presentation details

RaP-ProtoViT: Efficient Dual-Head Transformers for Robust Gastric Endoscopy Classification and Generalizable Clinical Deployment

ID: 164
View protection: Participant Only
Updated time: 2025-12-23 13:29:05
Views: 103
Online

Start Time: 2025-12-30 13:45

Duration: 15 min

Session: [S7] Track 7: Pattern Recognition, Computer Vision and Image Processing; [S7-2] Track 7: Pattern Recognition, Computer Vision and Image Processing


Abstract

We introduce RaP-ProtoViT, an end-to-end dual-head transformer for 8-class gastrointestinal (GI) endoscopy image classification on Kvasir-v2. A margin head (ArcFace/AM-Softmax) enforces angular separation between classes, while a prototype head aggregates the top-k token–prototype similarities, with M trainable prototypes per class; a lightweight input-adaptive MLP fuses the two heads. A leakage-aware data pipeline (pHash deduplication plus GroupKFold) keeps near-duplicate images from crossing fold boundaries. Training uses AdamW (+SAM) with cosine warm-up, DropPath, label smoothing, SWA, and post-hoc temperature scaling. A two-stage hyperparameter optimization (HPO) procedure (MOTPE+ASHA, followed by qEHVI) selects operating points under a Latency@224 ≤ 200 ms constraint and memory constraints. On Kvasir-v2 the model attains 99.1% accuracy, Macro-F1 = 0.991, Macro-AUPRC = 0.997, AUROC = 0.998, and ECE ≈ 0.9%, with per-class F1 tightly clustered in 0.988–0.994 and stable results across folds (±0.2 pp accuracy, ±0.002 Macro-F1). In ablations, the margin-only and prototype-only variants reduce Macro-F1 to 0.967 and 0.975 and raise ECE to 2.8% and 2.2%, respectively; removing adaptive fusion drops Macro-F1 to 0.984. The proposed HPO converges 2–3× faster and yields better final Macro-F1, AUPRC, and ECE than Bayesian TPE or Random+ASHA. The prototype head provides localized, intrinsically interpretable evidence that complements the margin head's discrimination, within a single-model deployment footprint. By advancing robust, interpretable, and computationally efficient AI for gastric endoscopy, our approach can improve early detection of gastrointestinal disease and enable reliable clinical deployment across diverse healthcare settings.
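The prototype head described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure or how per-prototype scores are combined, so the sketch assumes cosine similarity, a mean over the top-k token similarities for each prototype, and a mean over the M prototypes of each class. The function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def prototype_scores(tokens, prototypes, k=2):
    """Score each class from token-prototype similarities.

    tokens:     list of token embeddings for one image (non-empty)
    prototypes: prototypes[c] is the list of M prototype vectors of class c
    k:          number of best-matching tokens kept per prototype
    """
    scores = []
    for class_protos in prototypes:
        per_proto = []
        for p in class_protos:
            # Similarity of every token to this prototype, best first.
            sims = sorted((cosine(t, p) for t in tokens), reverse=True)
            top = sims[:k]
            # Assumption: average the top-k similarities.
            per_proto.append(sum(top) / len(top))
        # Assumption: average over the class's M prototypes.
        scores.append(sum(per_proto) / len(per_proto))
    return scores

# Toy example: three 2-D tokens, two classes with one prototype each (M = 1).
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
prototypes = [
    [[1.0, 0.0]],  # class 0
    [[0.0, 1.0]],  # class 1
]
scores = prototype_scores(tokens, prototypes, k=2)
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

Because only the k best-matching tokens contribute to each prototype score, the tokens that drove a prediction can be traced back to image patches, which is one plausible reading of the "localized" evidence the abstract mentions.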

Keywords

Endoscopy classification, Vision transformer, Prototype learning, Hyperparameter optimization

Speaker

Mohamadreza Khosravi

Shiraz University of Medical Sciences


Important Dates

Conference date: 2025-12-29 to 2025-12-31
12-30 2025

Presentation submission deadline
02-10 2026

Draft paper submission deadline
02-10 2026

Registration deadline

Organized By

Zarqa University

Contact info

Miss AsianComNet
[email protected]

Previous Conferences

2024 Asian Conference on Communication and Networks

2025 Asian Conference on Communication and Networks (ASIANComNet 2025)
