As digital music libraries continue to expand, accurately categorizing musical genres remains a pressing challenge. A study in the International Journal of Information and Communication Technology introduces a deep learning model designed to improve the classification of classical music genres.
By employing multi-channel learning (MCL) and Mel-spectrogram analysis, the model, known as MC-MelNet, offers what the research suggests is a more nuanced and efficient approach to genre identification. Tests carried out by its developer, Lei Zhang of the Henan Academy of Drama Arts at Henan University in Zhengzhou, China, show that it outperforms traditional classification methods.
The ability to classify music automatically has far-reaching implications for streaming services, music recommendation algorithms, and digital archiving. Classical music, with its intricate structures and subtle variations, presents a particular challenge for automated classification. Zhang explains that MC-MelNet addresses these issues by integrating multiple layers of analysis, capturing both the tonal and temporal characteristics of a composition.
At the core of MC-MelNet’s innovation is its multi-channel learning framework, which processes multiple audio features simultaneously. Conventional approaches rely primarily on Mel-spectrograms, which break down an audio signal into different frequency components in a way that mimics human hearing. However, while effective in capturing tonal elements, Mel-spectrograms alone do not fully represent the temporal dynamics of music.
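To make the idea concrete, the sketch below computes a Mel-spectrogram from scratch: a short-time Fourier transform produces a power spectrum, which is then weighted by triangular filters spaced evenly on the Mel scale. The frame size, hop length, and number of Mel bands are illustrative defaults, not parameters from the paper.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale, which mimics human pitch perception."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mel_spectrogram(signal, sr=22050, n_fft=1024, hop=512, n_mels=64):
    """Windowed frames -> power spectrum -> Mel-weighted frequency bands."""
    window = np.hanning(n_fft)
    frames = np.stack([signal[s:s + n_fft] * window
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (time, freq)
    return mel_filterbank(n_mels, n_fft, sr) @ power.T  # (mel bands, time)

# One second of a 440 Hz tone: energy concentrates in a single Mel band
sr = 22050
t = np.arange(sr) / sr
S = mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
print(S.shape)  # (64, 42)
```

Libraries such as librosa provide production-grade versions of this transform; the point here is only that each column of the result describes the tonal content of one short slice of time.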
MC-MelNet overcomes this limitation by incorporating additional audio features such as Mel-frequency cepstral coefficients (MFCC) and Chroma features. MFCCs capture the timbral qualities of a sound, making them useful for distinguishing between different instruments or playing styles. Chroma features, on the other hand, focus on pitch content and harmonic structure. By combining these elements, MC-MelNet creates a richer and more detailed representation of musical compositions, allowing it to distinguish between closely related classical genres with greater accuracy.
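One common way to combine such features, sketched below under assumptions the paper does not spell out, is to compute each one per time frame and stack them as input channels, much like the colour channels of an image. MFCCs are obtained here as a discrete cosine transform of log-Mel bands, and a simple chroma vector is built by folding every frequency bin onto one of 12 pitch classes; the padding-based stacking is one illustrative convention, not necessarily the paper's.

```python
import numpy as np
from scipy.fft import dct

def mfcc_from_mel(mel_power, n_mfcc=13):
    """MFCCs: a DCT of log-Mel bands compresses timbre into a few coefficients."""
    return dct(np.log(mel_power + 1e-10), type=2, axis=0, norm="ortho")[:n_mfcc]

def chroma_from_spectrum(power, freqs):
    """Chroma: fold each frequency bin onto one of 12 pitch classes (A = class 0 here)."""
    chroma = np.zeros((12, power.shape[1]))
    audible = freqs > 20.0  # skip DC and sub-audio bins
    pc = (np.round(12 * np.log2(freqs[audible] / 440.0)) % 12).astype(int)
    for k in range(12):
        chroma[k] = power[audible][pc == k].sum(axis=0)
    return chroma

def stack_channels(*features):
    """Zero-pad each feature matrix to a common height, then stack as input channels."""
    h = max(f.shape[0] for f in features)
    padded = [np.pad(f, ((0, h - f.shape[0]), (0, 0))) for f in features]
    return np.stack(padded)  # (n_channels, h, time)

# Demo on one second of a 440 Hz tone (pitch class A)
sr, n_fft, hop = 22050, 1024, 512
sig = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
win = np.hanning(n_fft)
frames = np.stack([sig[s:s + n_fft] * win
                   for s in range(0, len(sig) - n_fft + 1, hop)])
power = (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T  # (freq, time)
freqs = np.fft.rfftfreq(n_fft, 1 / sr)

mel = power[:64]  # stand-in for a true Mel-filtered spectrogram, for brevity
chroma = chroma_from_spectrum(power, freqs)
x = stack_channels(mel, mfcc_from_mel(mel), chroma)
print(x.shape)  # (3, 64, 42): three feature channels, ready for a multi-channel network
```

The chroma vector for the test tone peaks at pitch class A, while the MFCC channel summarizes its timbre, illustrating how each channel carries complementary information.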
Unlike conventional classification methods, which require manual feature extraction, MC-MelNet uses an end-to-end deep learning approach. It utilizes convolutional neural networks (CNNs) to detect local patterns in the time-frequency representation of the audio and recurrent neural networks (RNNs), specifically long short-term memory (LSTM) networks, to process sequential musical information.
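A minimal PyTorch sketch of this CNN-plus-LSTM pattern is shown below. The layer sizes, channel counts, and the choice to classify from the final time step are illustrative assumptions; the paper's exact architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """A CNN front-end reads local time-frequency patterns from the stacked
    feature channels; an LSTM then models how those patterns evolve over time."""
    def __init__(self, n_channels=3, n_classes=10, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # After two 2x2 poolings, 64 feature rows shrink to 16; 32 maps x 16 rows = 512
        self.lstm = nn.LSTM(input_size=32 * 16, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: (batch, channels, feature_bins, time)
        z = self.conv(x)               # (batch, 32, feature_bins/4, time/4)
        z = z.permute(0, 3, 1, 2)      # time becomes the sequence axis
        z = z.flatten(2)               # (batch, time/4, 512)
        out, _ = self.lstm(z)
        return self.head(out[:, -1])   # classify from the final time step

model = CRNN()
logits = model(torch.randn(2, 3, 64, 128))  # 2 clips, 3 feature channels
print(logits.shape)  # torch.Size([2, 10])
```

Because the whole pipeline is differentiable, the convolutional filters and the recurrent weights are learned jointly from the raw feature maps, which is what "end-to-end" means in practice.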
MC-MelNet might have applications beyond classical music classification. It could, for instance, be adapted for real-time sound processing and audio event detection. Enhancing the model’s generalizability by training it on a more diverse dataset could make it applicable to a wider range of genres, improving automated music classification for commercial streaming platforms.
Zhang, L. (2025) ‘Classification of classical music genres based on Mel-spectrogram and multi-channel learning’, Int. J. Information and Communication Technology, Vol. 26, No. 5, pp.39–53.