Researchers in China describe a video-based counterpart to optical character recognition (OCR) for sign language in the International Journal of Systems, Control and Communications.
Kai Zhao, Daotong Wang, and Jianbo Su of Shanghai Jiao Tong University and Kejun Zhang and Yu Zhai of the Shanghai Lingzhi High-Tech Corporation discuss a system that can recognise Chinese sign language in a video stream and convert it into text in real time. Such a system could be used to generate subtitles automatically for viewers of the video stream who are not familiar with Chinese sign language. The system was built with a database of half a million video segments and uses a three-dimensional convolutional neural network to extract the relevant frames for conversion.
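To make the idea of a three-dimensional convolution concrete, the following is a minimal sketch in plain Python and not the researchers' implementation. It slides a small kernel across time as well as across image rows and columns, which is what lets such a network respond to movement rather than to a single still frame. The function name `conv3d_valid`, the toy clip, and the frame-difference kernel are all illustrative assumptions; a real network would learn many kernels and, as in most deep-learning libraries, the operation shown is strictly a cross-correlation.

```python
def conv3d_valid(clip, kernel):
    """Slide a 3D kernel over a clip indexed as [frame][row][col].

    No padding, stride 1. Hypothetical helper for illustration only.
    """
    frames, height, width = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(frames - kt + 1):
        plane = []
        for y in range(height - kh + 1):
            row = []
            for x in range(width - kw + 1):
                acc = 0.0
                # Weighted sum over a small block spanning time and space.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy greyscale clip: three 2x2 frames in which one pixel lights up at frame 1.
clip = [
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 1], [0, 0]],
]

# Assumed kernel of shape (2, 1, 1): next frame minus current frame.
temporal_diff = [[[-1.0]], [[1.0]]]

response = conv3d_valid(clip, temporal_diff)
```

Here `response[0]` is non-zero only at the pixel that changed between the first two frames, and `response[1]` is zero everywhere because nothing moved between the last two.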
This is, the team writes, “a complete real-time sign language recognition system” for Chinese sign language. It is composed of a human interaction interface, a motion detection module, a hand and head detection module, and a video acquisition mechanism. The researchers have demonstrated 92.6% recognition accuracy on a dataset covering a 1,000-word vocabulary. The system would be useful not only for adding captions to video of a signer but also in public places such as hospitals, banks, and train stations, where, for instance, a person signing could communicate with a member of staff who does not sign.
The team adds that the accuracy of the system might be improved by incorporating skin detection to extract greater subtleties from the movements of the person signing. Likewise, detecting the signer's underlying skeleton would add to the sophistication of the recognition system and so improve accuracy.
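As a rough illustration of what a skin detection step could look like, the sketch below applies a commonly cited rule-of-thumb RGB threshold for skin under even daylight illumination. It is an assumption chosen for illustration and not the method the team proposes; the function names `is_skin_rgb` and `skin_mask` and the sample pixel values are hypothetical, and fixed thresholds of this kind are known to be sensitive to lighting and to work unevenly across skin tones.

```python
def is_skin_rgb(r, g, b):
    """Heuristic per-pixel skin test on 8-bit RGB values (illustrative only)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15  # enough colour spread
        and abs(r - g) > 15                   # red clearly separated from green
        and r > g and r > b                   # red is the dominant channel
    )


def skin_mask(frame):
    """Return a boolean mask for a frame given as rows of (r, g, b) tuples."""
    return [[is_skin_rgb(*pixel) for pixel in row] for row in frame]


# Hypothetical 2x2 frame: two skin-like pixels, one dark grey, one pure blue.
frame = [
    [(220, 170, 140), (30, 30, 30)],
    [(0, 0, 255), (200, 120, 90)],
]

mask = skin_mask(frame)
```

A mask like this could be used to narrow the search for hands and face before recognition, although a production system would more likely rely on a learned detector.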
Zhao, K., Zhang, K., Zhai, Y., Wang, D. and Su, J. (2021) ‘Real-time sign language recognition based on video stream’, Int. J. Systems, Control and Communications, Vol. 12, No. 2, pp.158–174.