AI-Driven Acoustic Biomarker Extraction and Deep Learning for Automated Cough-Based Respiratory Disease Classification
Time: 01 Jan 1970, 08:00
Session: [S1] Day-1 (06/12/2025) » [S1-2] Technical Sessions 1
Type: Oral Presentation
Abstract:
Respiratory illnesses remain a substantial global health challenge, and timely, accurate diagnosis is vital for effective treatment. Manual diagnosis of respiratory diseases such as COVID-19, asthma, and chronic obstructive pulmonary disease is time-consuming, resource-intensive, and often infeasible in resource-limited settings. This research proposes an automated machine learning system that classifies cough sounds for contactless, non-invasive screening of respiratory diseases. The system applies established acoustic feature extraction methods, including Mel-Frequency Cepstral Coefficients (MFCCs), Spectral Centroid, and Zero-Crossing Rate, to capture subtle acoustic signatures in cough sounds that are not readily perceptible to the human ear. It uses the crowdsourced COUGHVID V3 dataset, one of the largest publicly available collections of cough sounds, which contains over 25,000 geographically diverse cough recordings annotated by expert physicians. Several classification models, including Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and Random Forests, are trained to discriminate among coughs annotated as healthy, COVID-like, and asthma-related. The system covers the complete pipeline for the task: data preprocessing (noise removal, normalization, and silence trimming), feature engineering using the Librosa library, supervised model training, and real-time classification with confidence scoring. Initial results show classification accuracy of 85% to 90% on test data, and the system is extensible to web-based interfaces, mobile applications, and telemedicine use. This lightweight, Python-based framework addresses an important need for accessible health AI by providing a scalable, inexpensive method for early respiratory disease screening without requiring costly hardware or clinical facilities.
The system shows considerable potential as a deployable option for healthcare providers, diagnostic centers, telemedicine services, and remote health monitoring programs, especially in underserved populations where traditional diagnostic tools are not widely available. Beyond the detection of COVID-19, this research describes a reproducible machine learning pipeline that could be applied to broader biomedical audio analysis, including breath sound profiling and speech-based diagnostics, thereby contributing to the field of audio-based digital health diagnostics.
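As an illustration of two of the acoustic features named in the abstract, the following is a minimal sketch of Zero-Crossing Rate and Spectral Centroid computed on a single frame. It is not the authors' implementation: the abstract states that features are extracted with the Librosa library, whereas this sketch uses only the Python standard library and a naive DFT so that it stays self-contained. The function names, the 8 kHz sample rate, the 256-sample frame, and the synthetic 1 kHz test tone are all assumptions chosen for the example.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame.

    Uses a naive O(n^2) DFT over the non-negative frequency bins;
    a real pipeline would use an FFT-based library routine instead.
    """
    n = len(frame)
    half = n // 2 + 1
    magnitudes = []
    for k in range(half):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(acc))
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, report 0 Hz
    freqs = [k * sample_rate / n for k in range(half)]
    return sum(f * m for f, m in zip(freqs, magnitudes)) / total


# Hypothetical stand-in for one frame of a cough recording:
# a 1 kHz sine sampled at 8 kHz, with a small phase offset.
SAMPLE_RATE = 8000
FRAME_LENGTH = 256
tone = [
    math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE + 0.5)
    for t in range(FRAME_LENGTH)
]

zcr = zero_crossing_rate(tone)
centroid = spectral_centroid(tone, SAMPLE_RATE)
print(f"zero-crossing rate: {zcr:.3f}")
print(f"spectral centroid:  {centroid:.1f} Hz")
```

For a pure tone, the centroid sits at the tone's frequency and the zero-crossing rate is roughly twice the frequency divided by the sample rate. A cough frame would typically yield a broader spectrum, and per-frame values like these would be aggregated into a feature vector before classification.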
Keywords:
cough sound detection, machine learning, respiratory disease classification, audio signal processing
Speaker: