Speaker Variability – one of the biggest challenges to Speech Recognition
Every speaker has a distinctive voice, shaped by their unique anatomy and personality.
The same factors that make human speech unique, discussed below, also make it
hard for Automatic Speech Recognition (ASR) to work effectively.
Realization
The realization of speech changes over time. Even if a speaker tries to say
something exactly the same way twice, there will always be small differences
in the acoustic wave produced.
Speaking style
Speaking is a way of expressing our personality,
and we communicate our emotions through speech. We speak differently when we are
happy, sad, frustrated, stressed, disappointed, or defensive. Our speaking style
also varies with the situation and the listener, for example whether we are
speaking with our parents or with our friends.
The sex and age of the speaker
Men and women have different voices, and the
main reason for this is that women in general have a shorter vocal tract than men.
Likewise, the anatomy of the vocal tract changes over time with the
health and age of the speaker.
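To make the vocal tract point concrete, here is a rough sketch in Python. It treats the vocal tract as a uniform tube closed at one end (the glottis) and open at the other (the lips), which is a common textbook simplification and not something the discussion above relies on. The function name tube_resonances, the speed of sound of 350 m/s, and the example tube lengths of 17.5 cm and 14.5 cm are illustrative assumptions, not measurements.

```python
# Assumed speed of sound in warm, humid air, in metres per second.
SPEED_OF_SOUND_M_S = 350.0


def tube_resonances(length_m, count=3):
    """Return the first `count` resonances (Hz) of a uniform tube
    closed at one end and open at the other.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [
        (2 * n - 1) * SPEED_OF_SOUND_M_S / (4.0 * length_m)
        for n in range(1, count + 1)
    ]


if __name__ == "__main__":
    # Hypothetical tube lengths chosen only to show the trend.
    longer = tube_resonances(0.175)   # roughly 500, 1500 and 2500 Hz
    shorter = tube_resonances(0.145)  # every resonance is higher
    for name, values in (("17.5 cm", longer), ("14.5 cm", shorter)):
        print(name, [round(v) for v in values])
```

Under these assumptions the shorter tube resonates at higher frequencies, so the same vowel has a different spectral shape depending on who says it. This is one of the mismatches an ASR system has to cope with.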
Speed of speech
We speak at different speeds at different times. If we are stressed, we
tend to speak faster, and if we are tired, we tend to slow down. We also
speak at different speeds depending on whether the topic is familiar or
unfamiliar.
Regional and social dialects
Regional dialects involve features of pronunciation, vocabulary and grammar that
differ according to the geographical area the speaker comes from. Social
dialects are distinguished by features of pronunciation, vocabulary and grammar
according to the social group of the speaker.
The long list of variations does not mean that we should give up on ASR. It may seem quite
unlikely that we will ever achieve perfect ASR, but there is definitely potential
for improvement. One thing to consider is whether humans should speak
differently to computers. For instance, we could strive to be unambiguous and
speak in a hypercorrect style so that the computer understands us perfectly. Although
this could simplify ASR, it cannot address all of the variations discussed above. Our
goal with ASR should therefore not be 'natural' verbal communication
with machines, but rather efficient user interfaces.