Research in the International Journal of Arts and Technology outlines a new approach to recognising the emotional content of music, addressing many of the challenges in this field and opening up the possibility of classifying music more accurately for efficient retrieval. Fanguang Zeng of the Academy of Music at Pingdingshan University, China, explains how his approach, based on multi-data fusion, achieves an accuracy of up to 99% and takes less than 14 seconds per task.
Traditional methods of music emotion recognition have struggled with issues such as low accuracy and lengthy processing times, limiting their effectiveness in music retrieval and recommendation systems. Zeng’s approach uses non-negative matrix decomposition, a technique that separates the multimodal emotional information in a piece of music into distinct audio-based and text-based components, improving accuracy and considerably reducing processing time.
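The article does not spell out the exact factorisation Zeng uses, but non-negative matrix decomposition (factorisation) in general approximates a non-negative feature matrix as the product of two smaller non-negative matrices, yielding a compact set of latent components. The sketch below, using scikit-learn's NMF on synthetic data, illustrates the general technique rather than the paper's specific model.

```python
# Minimal sketch of non-negative matrix factorisation (NMF), the decomposition
# technique named in the paper; the data here is synthetic and purely illustrative.
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative matrix: rows = music tracks, columns = raw
# audio/text emotional features (e.g. pitch statistics, word scores).
X = np.random.rand(100, 40)

# Factorise X ≈ W @ H: W gives each track's weight on a small number of
# latent emotional components, H describes those components in feature space.
model = NMF(n_components=8, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)   # shape (100, 8): track-by-component loadings
H = model.components_        # shape (8, 40): component-by-feature basis

print(W.shape, H.shape, round(model.reconstruction_err_, 3))
```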
The approach can extract emotional features from both the audio and the lyrical content of a music file. Audio features encompass elements such as pitch and intensity, while text features can be analysed with Doc2Vec for particular words and phrases associated with a given emotion. Zeng’s system then weights the various features, fuses and processes them into a multimodal music emotion dataset, and uses a support vector machine to classify the normalised multimodal data.
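In rough outline, such a pipeline might look like the sketch below. This is illustrative rather than Zeng's implementation: it assumes librosa for pitch and intensity features, gensim's Doc2Vec for lyric embeddings, scikit-learn for normalisation and the support vector machine, and arbitrary fusion weights; the file names, lyrics, and labels are hypothetical.

```python
# Illustrative sketch of an audio + lyrics emotion pipeline: extract features,
# weight and fuse them, normalise, then classify with a support vector machine.
# Library choices (librosa, gensim, scikit-learn) and the weights are assumptions,
# not details taken from the paper.
import numpy as np
import librosa
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

def audio_features(path):
    """Pitch and intensity statistics for one track."""
    y, sr = librosa.load(path, mono=True)
    f0 = librosa.yin(y, fmin=65, fmax=2093, sr=sr)   # frame-wise pitch estimate
    rms = librosa.feature.rms(y=y)[0]                # frame-wise intensity (RMS energy)
    return np.array([f0.mean(), f0.std(), rms.mean(), rms.std()])

def lyric_features(lyrics, d2v):
    """Doc2Vec embedding of the lyric text."""
    return d2v.infer_vector(lyrics.lower().split())

def fuse(audio_vec, text_vec, w_audio=0.6, w_text=0.4):
    """Weighted fusion of the two modalities (weights are illustrative only)."""
    return np.concatenate([w_audio * audio_vec, w_text * text_vec])

# --- training on a hypothetical labelled corpus -----------------------------
tracks = [("song1.wav", "dark clouds and rain", "sad"),
          ("song2.wav", "sunshine dancing joy", "happy")]

d2v = Doc2Vec([TaggedDocument(t[1].split(), [i]) for i, t in enumerate(tracks)],
              vector_size=32, min_count=1, epochs=40)

X = np.array([fuse(audio_features(path), lyric_features(lyrics, d2v))
              for path, lyrics, _ in tracks])
y = [label for _, _, label in tracks]

scaler = MinMaxScaler()                              # normalise the fused features
clf = SVC(kernel="rbf").fit(scaler.fit_transform(X), y)

# Classify a new, unseen track from its fused audio and lyric features.
new = fuse(audio_features("query.wav"), lyric_features("lonely night", d2v))
print(clf.predict(scaler.transform(new.reshape(1, -1))))
```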
Zeng reports a significant improvement over conventional approaches, demonstrating markedly better accuracy and a substantial reduction in the time needed to classify a given piece of music. The work thus addresses the need for efficient music retrieval based on the emotional characteristics of music content, in a large streaming system for instance. Swift and accurate recognition would allow a large amount of music to be appropriately tagged so that a listener could home in on a selection of music whose emotional content suits their mood. The same approach would be useful in music recommendation systems, personalised playlists, and music therapy applications. It could also be used by content creators to associate a specific mood with their output, whether a podcast, a photographic montage, or another kind of production.
Zeng, F. (2023) ‘Multimodal music emotion recognition method based on multi data fusion’, Int. J. Arts and Technology, Vol. 14, No. 4, pp.271–282.