A new system that detects pronunciation errors among non-native speakers more reliably could improve English language learning. The technology, discussed in the International Journal of Computing Science and Mathematics, uses speech recognition tools and statistical modelling. It could offer English learners feedback and track their progress, particularly in regions where there is limited access to face-to-face human instruction.
Wenna Dou of the University of Civil Engineering and Architecture in Beijing, China, has focused on Chinese learners of English, a group that often faces challenges in mastering the nuances of English pronunciation due to phonetic and prosodic differences between the two languages.
At the core of this system lies the use of Mel Frequency Cepstral Coefficients (MFCC), a technique commonly employed in speech analysis that approximates how the human ear processes sound. By converting speech into digital signals and emphasizing features such as pitch and frequency, the method captures intonation points, the key moments in spoken language where pronunciation is most susceptible to error.
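As a rough illustration of the MFCC idea, the Python sketch below frames a signal, takes a power spectrum, applies triangular filters spaced on the mel scale, and decorrelates the log energies with a discrete cosine transform. It is not Dou's implementation: the function names, frame length, hop size, filter count, and test tone are all illustrative assumptions, and a naive DFT is used only to keep the example free of external libraries.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=12):
    """Return one list of n_coeffs cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Small constant avoids log(0) for empty filters.
        log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                 for row in bank]
        # DCT-II of the log filterbank energies, skipping the 0th coefficient.
        coeffs = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                      for m, e in enumerate(log_e))
                  for c in range(1, n_coeffs + 1)]
        features.append(coeffs)
    return features

# Hypothetical input: a 440 Hz tone sampled at 8 kHz stands in for speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1024)]
features = mfcc(tone, sample_rate)
print(len(features), "frames of", len(features[0]), "coefficients")
```

In practice a library FFT would replace the naive transform, and real recordings would replace the synthetic tone.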
To assess these intonation points, Dou used a statistical framework known as the Hidden Markov Model (HMM). HMMs are particularly effective in analysing time-dependent data, such as speech, because they treat a sequence of observations as the output of hidden states that change over time according to transition probabilities. By using a segmentation process that breaks speech into smaller units, Dou has improved the system so that it can cope with longer sections of speech and maintain accuracy without being stymied by complexity.
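To make the HMM idea concrete, here is a minimal sketch of the standard forward algorithm for a discrete HMM, applied segment by segment so that a long utterance is scored in smaller units. The two-state model, its probabilities, the observation symbols, and the segment length are invented for illustration and are not taken from Dou's paper.

```python
import math

def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Scaled forward algorithm: returns log P(obs | model) for a discrete HMM."""
    n_states = len(start_p)
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [sum(alpha[r] * trans_p[r][s] for r in range(n_states))
                     * emit_p[s][obs[t]]
                     for s in range(n_states)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_lik += math.log(scale)
    return log_lik

def score_segments(obs, seg_len, model):
    """Per-symbol log-likelihood of each fixed-length segment of obs."""
    scores = []
    for i in range(0, len(obs), seg_len):
        segment = obs[i:i + seg_len]
        scores.append(forward_log_likelihood(segment, *model) / len(segment))
    return scores

# Hypothetical model: 2 hidden states, 2 observation symbols (0 and 1).
MODEL = (
    [0.6, 0.4],                 # start probabilities
    [[0.7, 0.3], [0.4, 0.6]],   # transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],   # emission probabilities
)

observations = [0, 0, 1, 0, 1, 1, 1, 0]
segment_scores = score_segments(observations, 4, MODEL)
print(segment_scores)
```

A segment whose score falls well below that of typical native speech under the same model would be a candidate for closer inspection; how that cut-off is chosen is outside the scope of this sketch.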
Dou has also introduced a “degree component signal detection method.” This enhancement refines the system’s ability to identify spectral features, the variations in sound frequency, that often indicate mispronunciation. These features are then compared to a database of standard English pronunciations. The resulting system can quickly flag pronunciation errors with more than 97% accuracy, according to Dou’s tests.
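The comparison step can be pictured with a small sketch. The code below does not implement the degree component signal detection method, whose details are not given here; it substitutes a plain Euclidean distance with a fixed threshold. The reference vectors, labels, threshold, and helper names are hypothetical.

```python
import math

# Hypothetical reference "database": one feature vector per sound label.
REFERENCE = {
    "th": [1.0, 0.2, -0.5],
    "r": [0.3, 0.9, 0.1],
}

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flag_errors(utterance, threshold=0.5):
    """Flag each (label, features) pair lying too far from its reference."""
    return [distance(feats, REFERENCE[label]) > threshold
            for label, feats in utterance]

def accuracy(predicted, actual):
    """Fraction of flags that agree with the ground-truth annotations."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Made-up learner data: the first sound is close to its reference, the others are not.
utterance = [
    ("th", [1.0, 0.25, -0.45]),
    ("r", [1.1, 0.1, 0.0]),
    ("th", [0.2, 0.2, 0.4]),
]
flags = flag_errors(utterance)
print(flags)
```

An accuracy figure such as the one reported would come from comparing flags like these against expert annotations over a full test set, which `accuracy` does in miniature.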
As English continues to serve as a global medium for education, business, and international collaboration, tools that promote clearer speech and mutual intelligibility are in increasing demand. Automated feedback mechanisms, especially those with real-time capability, offer learners immediate and objective insights into their spoken language skills and guide them towards improvement.
Dou, W. (2025) ‘A method for capturing English oral pronunciation errors based on speech recognition’, Int. J. Computing Science and Mathematics, Vol. 21, No. 1, pp.32–47.