Today I was driving from DC, and on the way I wondered why no one was doing speech recognition based on lip reading. I pulled over and emailed myself a reminder from my phone. Now, back home, I googled it and found that the topic is fresh and the results are impressive. The LipNet neural network is claimed to achieve a recognition accuracy of 93.4%. The videos show various drivers giving voice commands with the radio blasting. Another video about LipNet shows that even a set of short words, not just whole sentences, is recognized fairly well. If it really works that well, one can expect such a system to serve as an adjunct to audio speech recognition to improve accuracy. For those who have read this far, here is an interesting, detailed paper on the subject: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.453.7204&rep=rep1&type=pdf
Interestingly, in theory all such AI systems could be trained on the myriad videos already available. Feed them YouTube videos with faces and wait for the results.
