A new approach to searching video content has been developed by a team in South Korea. The system, described in the International Journal of Computational Vision and Robotics, extracts the spoken words from a video recording, converts them to text, and makes that text searchable. Importantly, the system does not rely on embedded keywords or on curated tags or hashtags being associated with the video content.
The approach naturally depends on the dialogue or spoken commentary corresponding to the scenes that users might wish to search. It is, of course, superfluous if the video already has subtitles baked in. Nevertheless, it will be a boon for users wishing to search the millions of hours of video available in databases, on streaming services, and elsewhere on the internet, and it could also be used to help catalogue videos.
Kitae Hwang, In Hwan Jung, and Jae Moon Lee of the School of Computer Engineering at Hansung University in Seoul have developed an Android app for use with compatible smartphones. It is worth noting, however, that there is at least one other app with the same name, so should this app be made available in the Google Play Store, it is likely to require a change of name.
The new app works by extracting audio from videos using the FFmpeg tool and converting it into text in ten-second increments. This, the team explains, creates a searchable timeline for the video. Speech recognition technology then generates a transcription of those audio segments, which are indexed against the video timeline. For a 20-minute video, the process completes in just two to three minutes and runs in the background while the video plays. Users can then search for specific terms and find all mentions in the video.
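The timeline-indexing idea described above can be sketched in a few lines of code: each ten-second slice of transcribed audio is keyed by its start time, and a search returns every slice that mentions the query term. This is a minimal illustration under assumed details (segment length, data layout, case-insensitive matching); the paper's actual implementation, including the FFmpeg extraction and speech recognition stages, is not reproduced here.

```python
# Minimal sketch of a searchable video transcript index.
# Assumptions (not from the paper): segments arrive as an ordered list of
# transcribed strings, one per ten-second slice; matching is a simple
# case-insensitive substring test.

SEGMENT_SECONDS = 10  # ten-second increments, per the article


def build_index(segments):
    """Map each segment's start time (in seconds) to its transcribed text."""
    return {i * SEGMENT_SECONDS: text for i, text in enumerate(segments)}


def search(index, term):
    """Return (start_time_seconds, text) for every segment mentioning `term`."""
    term = term.lower()
    return [(start, text) for start, text in sorted(index.items())
            if term in text.lower()]


# Hypothetical transcript of a short lecture clip:
transcript = [
    "welcome to today's lecture on photosynthesis",
    "plants convert sunlight into chemical energy",
    "the chlorophyll in leaves absorbs sunlight",
]
index = build_index(transcript)
print(search(index, "sunlight"))  # matches the segments starting at 10 s and 20 s
```

A real implementation would populate the segment list from a speech-to-text engine as the audio is processed in the background, so the index grows while the video plays.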
The app will have applications in education, news analysis, and other information-dense video content where quick access to specific information is needed. For instance, students reviewing lecture recordings or journalists searching for specific statements in interviews could make use of it. There are many more scenarios where it would be useful to be able to search video in this manner.
Hwang, K., Jung, I.H. and Lee, J.M. (2024) ‘An implementation of searchable video player’, Int. J. Computational Vision and Robotics, Vol. 14, No. 3, pp.325–337.