Sunday, May 4, 2014

Speaker Variability – one of the biggest challenges to Speech Recognition

Every speaker has a distinctive voice, shaped by their unique physical characteristics and personality. The same factors that make human speech unique, discussed below, also make it hard for Automatic Speech Recognition (ASR) to work effectively.

Realization
The realization of speech changes over time. Even if a speaker tries to sound exactly the same each time, there will always be small differences in the acoustic wave they produce.

Speaking style
Speaking is a way of expressing our personality, and we communicate our emotions through speech. We speak differently when we are happy, sad, frustrated, stressed, disappointed, or defensive. Our speaking style also varies with the situation and with the listener, for example whether we are speaking with our parents or with our friends.

The sex and age of the speaker
Men and women have different voices, and the main reason for this is that women generally have a shorter vocal tract than men. Likewise, the anatomy of the vocal tract changes over time with the health and age of the speaker.
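
To make the effect of vocal tract length concrete, a common textbook simplification treats the vocal tract as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. The sketch below assumes that idealized model, a speed of sound of 350 m/s, and illustrative tract lengths of 17.5 cm and 14.5 cm. None of these figures come from this post, and real vocal tracts are not uniform tubes.

```python
# Assumed speed of sound in the warm air of the vocal tract (illustrative value).
SPEED_OF_SOUND_M_S = 350.0


def tube_resonances(length_m, count=3):
    """Resonant frequencies (Hz) of an idealized uniform tube closed at one end.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_M_S / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


# Hypothetical tract lengths, chosen only to show the trend.
longer_tract = tube_resonances(0.175)
shorter_tract = tube_resonances(0.145)

for n, (low, high) in enumerate(zip(longer_tract, shorter_tract), start=1):
    print(f"resonance {n}: longer tract {low:.0f} Hz, shorter tract {high:.0f} Hz")
```

Under these assumptions every resonance of the shorter tube sits higher than the matching resonance of the longer one, so the same vowel reaches the recognizer with a shifted spectrum depending on who says it.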

Speed of speech
We speak at different speeds at different times. When we are stressed we tend to speak faster, and when we are tired we tend to slow down. Our speed also depends on whether we are talking about something familiar or something unfamiliar.

Regional and social dialects
Regional dialects involve features of pronunciation, vocabulary and grammar that differ according to the geographical area the speaker comes from. Social dialects are distinguished by features of pronunciation, vocabulary and grammar that vary with the social group of the speaker.


This long list of variations does not mean that we should give up on ASR. It may seem unlikely that we will ever achieve perfect ASR, but there is definitely potential for improvement. One thing to consider is whether humans should speak differently to computers. For instance, we could strive to be unambiguous and speak in a hypercorrect style so that the computer understands us perfectly. Although this could simplify ASR, it cannot address all of the variations discussed above. Our goal with ASR should therefore not be ’natural’ verbal communication with machines, but rather efficient user interfaces.


