Authors: Suresh Arthi, St. Joseph's College Of Engineering; Manimaran Dharsha, St. Joseph's College Of Engineering; Jeniffer Dr. J. Thresa, St. Joseph's College Of Engineering
Tuberculosis (TB) remains a major global health issue that calls for fast, accurate, and affordable diagnostic tools, especially in low- and middle-income countries (LMICs). To address this need, we present a Tri-Modal Ensemble Framework that combines three complementary data streams (chest X-ray (CXR) images, cough audio, and clinical text reports) to provide robust and explainable TB diagnosis. Each modality is processed by a dedicated deep learning model: a Feature Map Normalization (FMN) Convolutional Neural Network (CNN) for radiographic feature extraction, a Capsule Network (CapsNet) that preserves spatial and temporal hierarchies in cough spectrograms, and BioBERT embeddings that capture semantic and contextual information from clinical narratives. The predictions of these models are combined through an ensemble weighting scheme driven by Mayfly Optimization (MFO), which dynamically adjusts the contribution of each modality according to its predictive confidence and reliability. The framework improves diagnostic sensitivity, reduces false negatives, and supports scalable, modular deployment across varied healthcare infrastructures. The system demonstrates that tri-modal integration and optimization notably improve diagnostic robustness compared with unimodal and traditional ensemble baselines.
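The weighted fusion step described in the abstract can be illustrated with a minimal sketch. It assumes each modality model already outputs a TB probability and that fusion is a normalized weighted average. The weight search here is a plain random search over the weight simplex, used as a simple stand-in and not as the Mayfly Optimization itself. The function names, the validation samples, and the log-loss objective are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
import random


def fuse(probs, weights):
    """Weighted average of per-modality TB probabilities (late fusion)."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total


def log_loss(weights, samples):
    """Mean binary cross-entropy of the fused prediction on labelled samples."""
    eps = 1e-9
    loss = 0.0
    for probs, label in samples:
        # Clamp to avoid log(0) on extreme predictions.
        p = min(max(fuse(probs, weights), eps), 1 - eps)
        loss -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return loss / len(samples)


def search_weights(samples, n_trials=2000, seed=0):
    """Random search over normalized weights (assumed stand-in for MFO)."""
    rng = random.Random(seed)
    best_w = [1 / 3, 1 / 3, 1 / 3]  # start from equal weighting
    best_loss = log_loss(best_w, samples)
    for _ in range(n_trials):
        raw = [rng.random() for _ in range(3)]
        s = sum(raw)
        w = [r / s for r in raw]  # normalize so the weights sum to 1
        loss = log_loss(w, samples)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


# Hypothetical validation data: ((p_cxr, p_cough, p_text), label)
validation = [
    ((0.90, 0.60, 0.80), 1),
    ((0.20, 0.55, 0.30), 0),
    ((0.85, 0.40, 0.70), 1),
    ((0.10, 0.65, 0.25), 0),
]

weights, loss = search_weights(validation)
fused = fuse((0.80, 0.50, 0.75), weights)  # fused probability for a new case
```

Because the search starts from equal weights and only accepts improvements, the returned weights never score worse on the validation set than a uniform average. A population-based optimizer such as MFO would replace `search_weights` while keeping the same `fuse` and objective interface.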
Index Terms— Tri-Modal Ensemble, Tuberculosis Diagnosis, Mayfly Optimization, Capsule Networks, CNN, BioBERT, Weighted Fusion, Deep Learning.
Keywords: Tri-Modal Ensemble, Tuberculosis Diagnosis, Mayfly Optimization, Capsule Networks, CNN, BioBERT, Weighted Feature Fusion, Deep Learning
Published in: 2024 Asian Conference on Communication and Networks (ASIANComNet)
Publisher: IEEE