Facing up to the facts – Science Spot

Technology increasingly relies on facial recognition, whether to unlock one’s smartphone or to monitor public spaces. However, faces move, and cameras rarely catch us perfectly face-on. People tilt their heads, glance sideways, or are caught in the periphery of a busy scene. Such off-angle faces remain a challenge for facial recognition systems, which tend to need our full attention, as it were.

Research in the International Journal of Information and Communication Technology discusses a new approach developed by a team in China: the Guided Deformable Attention (GDA) network. Bin Deng and Guanghui Deng of Hunan University of Technology in Zhuzhou, Hunan, China, say this approach steps up to address the problem of rotated faces. The system could improve security systems and could also find applications in gaming and the entertainment industry more generally.

Standard facial recognition systems use convolutional neural networks (CNNs) to detect features such as our eyes and nose based on their expected positions in a straightforward, front-facing portrait. They are quite rigid in how they work: they rely on fixed kernels, small filters that sample the image on a rigid grid, to detect those features and confirm an identity based on their precise positions, sizes, and shapes in the acquired image of the person’s face. CNNs have been improved in recent years with deformable convolutions, which let each kernel shift its sampling points to follow a feature, but even these do not work well in complicated real-world environments such as crowds or other busy scenes.
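To make the difference between fixed and deformable kernels concrete, here is a minimal Python sketch of a single 3x3 kernel evaluated at one pixel. It is a generic illustration of deformable convolution, not the GDA authors' code: the function names, the toy image and the hand-supplied offsets are all hypothetical, whereas in a real network the offsets are predicted by an extra learned layer.

```python
import math

# Illustrative sketch only; not the GDA implementation.

def bilinear(img, y, x):
    """Sample a 2-D list-of-lists image at fractional (y, x); zero outside bounds."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    total = 0.0
    # Blend the four surrounding pixels, weighted by proximity.
    for yy, wy in ((y0, 1 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total

# The nine taps of a standard 3x3 kernel, relative to its centre (row, col).
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def conv_at(img, weights, cy, cx, offsets=None):
    """Response of one 3x3 kernel centred at (cy, cx).

    With offsets=None every tap sits on the fixed grid (standard convolution).
    With a list of nine (oy, ox) offsets, each tap is shifted before sampling
    (deformable convolution). Here the offsets are supplied by hand; in a real
    deformable layer they would be learned.
    """
    if offsets is None:
        offsets = [(0.0, 0.0)] * len(GRID)
    return sum(
        w * bilinear(img, cy + gy + oy, cx + gx + ox)
        for w, (gy, gx), (oy, ox) in zip(weights, GRID, offsets)
    )
```

As a usage example, take a 5x5 image whose only feature is a vertical bar in column 3, and a kernel that looks for a bar in its centre column. Centred at (2, 2), the fixed kernel misses the bar entirely and returns 0.0; shifting every tap one column to the right with offsets of (0.0, 1.0) lets it find the bar and return 3.0, and a fractional offset of 0.5 returns 1.5 thanks to the bilinear interpolation.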

The new GDA network approach could solve the problem by introducing a guiding mechanism that helps the system remain focused on the face itself, regardless of orientation or background noise. The key innovation here is the system’s ability to maintain its focus on the essential structure of a face even when there are distractions in the scene. The system, the researchers explain, knows what a face looks like and can remain locked on it. This is not dissimilar to the ability of many modern digital cameras to track a moving object, such as an animal, and lock focus on to the animal’s eye for the best photograph.

The GDA first identifies the location of the face within an image using an affine matrix, a mathematical transformation that allows the system to rotate or scale the image to get a better estimate of where the face might be. The second step refines this detection using deformable convolutions, guided so that the system remains locked on the face and does not turn its digital attention to competing objects or noise in the acquired image.
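The affine step can be pictured with a small Python sketch of a 2x3 affine matrix that rotates, scales and shifts points. The article does not say how GDA obtains its matrix, so here it is simply built from a hypothetical angle, scale and shift, and applied to hypothetical landmark positions of an upright face template; the inverse matrix maps a tilted face back into that upright frame.

```python
import math

# Illustrative only: how GDA actually estimates its affine matrix is not
# described in the article. Coordinates are x to the right, y up.

# Hypothetical landmark positions for an upright, frontal face template.
TEMPLATE = {"left_eye": (-1.0, 0.0), "right_eye": (1.0, 0.0), "nose": (0.0, -1.0)}

def affine_matrix(angle_deg, scale, tx, ty):
    """2x3 matrix: rotate counter-clockwise by angle_deg, scale uniformly, then translate."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, tx],
            [s,  c, ty]]

def apply(m, pt):
    """Map an (x, y) point through the 2x3 affine matrix m."""
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

def invert(m):
    """Inverse of a 2x3 affine matrix (assumes its 2x2 part is non-singular)."""
    a, b, tx = m[0]
    c, d, ty = m[1]
    det = a * d - b * c
    # Inverse of the linear part...
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    # ...and the translation that undoes the original shift.
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)]]
```

For instance, with affine_matrix(90, 2.0, 10.0, 5.0) the template's right eye at (1, 0) lands at approximately (10, 7), and passing that point through the inverted matrix recovers (1, 0). Keeping the transform as an explicit matrix makes the forward mapping (upright template to tilted face) and its inverse (tilted face back to upright) equally cheap to compute.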

Thus, in security surveillance, where faces in a crowd rarely present themselves perfectly face-on, the system can home in on a chosen face and accurately detect it in the crowd for subsequent identification. The approach is not limited to security and law enforcement. It could also be used in virtual reality and augmented reality, where users’ faces are often seen from different angles yet accurate face detection is important for creating an immersive, real-time experience.

Deng, B. and Deng, G. (2024) ‘Rotation-invariant face detection with guided deformable attention’, Int. J. Information and Communication Technology, Vol. 25, No. 8, pp.32–48.