A new computer program can automatically extract the vocals from a random collection of MP3 music files and classify each track by whether the singer is male or female and whether the singer is a trained professional, a semi-professional or an amateur. The program can also tell you in which vocal range the artist is singing. The next version might be able to tell you the song’s exact key.
Writing in the International Journal of Signal and Imaging Systems Engineering, computer scientists in the LabGed Laboratory at the University of Annaba, in Algeria, explain how they used a frequency masking tool to separate the vocal parts from the accompaniment in 1200 music samples. They then applied statistical tools known as Gaussian Mixture Models, which can be incorporated into a computer program, to analyse the voices. In parallel, voice experts listened to the samples and classified them into male and female groups and by vocal range: soprano, mezzo-soprano, contralto, tenor, baritone and bass. They also grouped them by quality (professional, semi-professional and amateur).
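To give a rough idea of how Gaussian Mixture Models can be used for this kind of classification, here is a minimal sketch in Python. It is not the researchers' method: the paper uses a Type-2 Fuzzy variant, whereas this sketch fits an ordinary one-dimensional mixture per class and picks the class whose mixture gives the higher likelihood. The single feature (a mean pitch in Hz), the labels "low" and "high", and all of the training values are invented for illustration.

```python
import math
import random

def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, k=2, iters=50, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances).
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, var_floor)
    return weights, means, variances

def log_likelihood(x, model):
    """Log-likelihood of one feature value under a fitted mixture."""
    weights, means, variances = model
    p = sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))

def classify(x, models):
    """Return the label whose mixture assigns x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

# Synthetic training data (invented): one feature value per vocal sample.
rng = random.Random(0)
training = {
    "low": [rng.gauss(110, 10) for _ in range(100)] + [rng.gauss(150, 10) for _ in range(100)],
    "high": [rng.gauss(220, 15) for _ in range(100)] + [rng.gauss(300, 15) for _ in range(100)],
}
models = {label: fit_gmm_1d(values) for label, values in training.items()}
print(classify(130.0, models), classify(250.0, models))
```

A real system would use several features per sample rather than one number, but the principle of one fitted mixture per class stays the same.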
Once the program has separated the vocal from the accompaniment, it can analyse pitch and vibrato. Pitch is the attribute by which we “sort” sounds from lowest to highest. The pitch of a sung note is a function of the fundamental frequency of vibration of the sound source, and the human voice is usually most energetic in the range 200 to 2000 Hertz (vibrations per second). A bass singer might go as low as 80 Hz while a soprano would commonly hold notes at 1400 Hz. In contrast, the frequency of the spoken word is around 400 Hz at its highest. For reference, the musical note “A” above middle C is set at a frequency of 440 Hz.
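As an illustration of what "analysing pitch" can mean in practice, the following Python sketch estimates the fundamental frequency of a short signal from the peak of its autocorrelation and converts a frequency to the nearest note name, taking the A above middle C as 440 Hz. The article does not say which pitch tracker the researchers used, so the autocorrelation approach, the function names, the 80 to 1000 Hz search range and the synthetic 200 Hz test tone are all assumptions made for the example.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def estimate_f0(samples, sample_rate, f_min=80.0, f_max=1000.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    lag_min = int(sample_rate / f_max)  # shortest period considered, in samples
    lag_max = int(sample_rate / f_min)  # longest period considered, in samples
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Similarity between the signal and a copy of itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def note_name(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered note name, with A above middle C (A4) at a4_hz."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

# Synthetic test tone (assumption): a pure 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(2048)]
f0 = estimate_f0(tone, sample_rate)
print(round(f0, 1), note_name(f0), note_name(440.0))
```

Real sung audio is far messier than a pure sine, so a practical pitch tracker would work frame by frame and add safeguards against octave errors.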
Vibrato is how much, and at what rate, the pitch varies around a fundamental sung note, and it is very important in assessing the quality of a singing voice. Vibrato is largely absent from the spoken word but almost ubiquitous in the singing voice, where we perceive it to add character, emotion and other qualities to the voice. The rate at which vibrato makes a sung note deviate from the held fundamental, between 4 and 8 Hz, is too low to be heard as a tone in itself, but the effect is obvious to the listener. The computer program can, of course, analyse the changes in fundamental frequency in the singing voice, along with the vibrato frequency and depth, and use this information to correlate each extracted voice with a measure of quality.
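Vibrato rate and depth can be read off a pitch contour fairly directly. The Python sketch below assumes the pitch has already been tracked as one fundamental-frequency value per frame; it measures depth as half the peak-to-peak deviation from the average pitch and rate from the spacing of upward crossings through that average. This is one simple way to do it, not necessarily the researchers' way, and the function name, the frame rate and the synthetic contour (a 440 Hz note wobbling by 10 Hz six times a second) are invented for the example.

```python
import math

def vibrato_rate_and_depth(f0_track, frame_rate):
    """Return (rate_hz, depth_hz) for a pitch contour sampled at frame_rate frames/s."""
    centre = sum(f0_track) / len(f0_track)
    dev = [f - centre for f in f0_track]
    # Depth: half the peak-to-peak excursion around the average pitch.
    depth = (max(dev) - min(dev)) / 2.0
    # Frames where the contour crosses the average pitch going upwards.
    ups = [i for i in range(1, len(dev)) if dev[i - 1] < 0.0 <= dev[i]]
    if len(ups) < 2:
        return 0.0, depth  # too few cycles to measure a rate
    # Rate: completed cycles divided by the time they span.
    rate = (len(ups) - 1) * frame_rate / (ups[-1] - ups[0])
    return rate, depth

# Synthetic contour (assumption): 2 s of a 440 Hz note with 6 Hz, +/-10 Hz vibrato.
frame_rate = 100
contour = [440.0 + 10.0 * math.sin(2 * math.pi * 6.0 * i / frame_rate + 1.0)
           for i in range(2 * frame_rate)]
rate, depth = vibrato_rate_and_depth(contour, frame_rate)
in_typical_range = 4.0 <= rate <= 8.0  # the 4 to 8 Hz range quoted above
print(round(rate, 2), round(depth, 2), in_typical_range)
```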
The preliminary results from Faiz Maazouzi and Halima Bahi on the 1200 vocal music samples showed 97 percent accuracy for singing voice quality and almost 97 percent accuracy for singing voice type.
“Type-2 Fuzzy Gaussian mixture models for singing voice classification in commercial music production” in Int. J. Signal and Imaging Systems Engineering, 2013, 6, 111-118