
Urban Sound Classification

Hybrid Deep Learning for Audio

CNN · BiLSTM · GRU
92% Accuracy
Jan – Apr 2025

Problem Statement

Urban noise pollution is a growing public health concern, linked to cardiovascular disease, cognitive impairment, and sleep disruption. Effective smart city noise management requires automated classification of urban sound sources — but traditional approaches struggle with the temporal complexity and environmental variability of urban audio.

Solution Architecture

Designed a novel hybrid architecture combining: (1) CNN layers for spectral feature extraction from mel-spectrograms; (2) Bidirectional LSTM for capturing temporal dependencies in audio sequences; (3) GRU layers for efficient sequential modeling. The pipeline processes raw audio → mel-spectrogram conversion → CNN feature maps → BiLSTM temporal encoding → GRU refinement → classification. Trained on the UrbanSound8K benchmark dataset.
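The sketch below shows how such a CNN → BiLSTM → GRU stack might be assembled in Keras. The input shape (128 mel bands × 173 frames), layer widths, and dropout rate are illustrative assumptions rather than the configuration actually used; only the 10-class output follows directly from UrbanSound8K.

```python
# Minimal sketch of a CNN -> BiLSTM -> GRU hybrid in Keras.
# Input shape, layer sizes, and dropout are illustrative assumptions;
# UrbanSound8K has 10 classes, so the output layer uses 10 units.
import tensorflow as tf
from tensorflow.keras import layers, models

N_MELS = 128     # mel bands (assumed)
N_FRAMES = 173   # time frames for a ~4 s clip at default hop length (assumed)
N_CLASSES = 10   # UrbanSound8K classes

def build_hybrid_model():
    inputs = layers.Input(shape=(N_MELS, N_FRAMES, 1))  # mel-spectrogram "image"

    # CNN block: spectral feature extraction from the mel-spectrogram
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)

    # Reshape CNN feature maps into a (time, features) sequence for the RNN branch.
    # After two 2x2 poolings the tensor is (N_MELS/4, N_FRAMES/4, 64);
    # the frame axis becomes the time axis.
    x = layers.Permute((2, 1, 3))(x)
    x = layers.Reshape((N_FRAMES // 4, (N_MELS // 4) * 64))(x)

    # BiLSTM: temporal dependencies in both directions
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)

    # GRU: lighter-weight sequential refinement before classification
    x = layers.GRU(64)(x)

    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(N_CLASSES, activation="softmax")(x)

    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_hybrid_model()
model.summary()
```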

Tech Stack

Python, TensorFlow/Keras, Librosa (audio processing), NumPy, Pandas, scikit-learn, Matplotlib, Seaborn.
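As a rough illustration of the Librosa preprocessing step, the snippet below converts a clip to a fixed-size log-mel-spectrogram. The sample rate, 4-second padding, and mel parameters are assumptions for the sketch, not the project's recorded settings.

```python
# Sketch of the audio -> log-mel-spectrogram step with Librosa.
# Sample rate, clip length, and mel parameters are assumed values;
# UrbanSound8K clips are at most 4 seconds long.
import librosa
import numpy as np

SAMPLE_RATE = 22050
CLIP_SECONDS = 4
N_MELS = 128

def audio_to_log_mel(path: str) -> np.ndarray:
    # Load and resample; pad or truncate to a fixed length so every
    # example yields a spectrogram of the same shape.
    y, sr = librosa.load(path, sr=SAMPLE_RATE)
    y = librosa.util.fix_length(y, size=SAMPLE_RATE * CLIP_SECONDS)

    # Mel-spectrogram, then log (dB) scaling for a perceptually sensible range.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=N_MELS)
    log_mel = librosa.power_to_db(mel, ref=np.max)

    # Per-clip min-max normalization to reduce sensitivity to recording conditions.
    log_mel = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    return log_mel[..., np.newaxis]   # add channel axis for the CNN
```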


Results & Metrics

Achieved 92% overall accuracy on the UrbanSound8K dataset, a 20-percentage-point improvement over the CNN-only baseline (72%). The hybrid architecture performed markedly better on temporally complex sound classes (engine idling, jackhammer), where spatial-only features are insufficient, validating the approach's potential for smart-city noise monitoring.
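Per-class results of this kind are typically inspected with scikit-learn's classification report and confusion matrix; in the sketch below, `model`, `x_test`, and `y_test` are assumed to come from the training pipeline and are not defined here.

```python
# Sketch of per-class evaluation with scikit-learn.
# `model`, `x_test`, and `y_test` are assumed to exist from the training pipeline.
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

CLASS_NAMES = ["air_conditioner", "car_horn", "children_playing", "dog_bark",
               "drilling", "engine_idling", "gun_shot", "jackhammer",
               "siren", "street_music"]   # the 10 UrbanSound8K classes

y_pred = np.argmax(model.predict(x_test), axis=1)
print(classification_report(y_test, y_pred, target_names=CLASS_NAMES))
print(confusion_matrix(y_test, y_pred))
```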

Challenges & Learnings

Maintaining consistent audio preprocessing across different recording conditions required robust normalization. Finding the optimal fusion point between the CNN and RNN branches required an extensive architecture search. Keeping the computational cost of the BiLSTM component low enough for potential real-time deployment required careful layer sizing.