As we saw in our last post, one of the major challenges of Speech Recognition is Speaker Variability.
But there are others...
The fact that so many big challenges remain in this field is what drives our interest in it. Speech Recognition is, on the one hand, a technology with huge potential for growth thanks to its wide range of uses and, on the other hand, something about which there is still a lot to learn and investigate.
So, what are these other main challenges?
> Quality of input devices (typically microphones): a microphone whose sensitivity is too low or too high can cause problems for the software deciphering the message. A microphone that is too sensitive, for example, may capture unintended sounds (such as other people speaking in the same room) and thus compromise the whole Speech Recognition process. On a related note, the possibility of controlling our TV by voice makes sense to all of us today. "Increase volume", "next channel" and "shut down TV" are basic instructions that we can imagine ourselves giving to the TV in our living room. But let's think: will the TV respond only to pre-programmed "instructors"? Will it recognize our voice over other noise in the same room? Will everyone be able to give instructions? To make it work, will the ASR device have to be in the remote control rather than in the TV set? These are all questions whose answers are still the subject of a lot of research.
> Meaning and context of what is being said: although this topic has to do with the sound itself, it is related less to the variability of the voice and more to the intended meaning of whoever is speaking. Words like "there" and "their" or "whole" and "hole" are pronounced the same way, and "leave" and "live" sound very similar, yet they all have totally different meanings. No Speech Recognition technology is yet able to take this step and identify the speaker's meaning, and our research tells us that this is a big step that probably won't be taken soon.
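
To make the homophone problem concrete, here is a minimal Python sketch. It is only an illustration: the tiny pronunciation table is hand-written in an ARPAbet-style notation, and the names `PRONUNCIATIONS`, `words_by_sound` and `candidates` are our own, not taken from any real ASR system. It shows that once speech has been reduced to sounds, several words can share exactly the same sound sequence, so the sound alone cannot tell us which word was meant.

```python
from collections import defaultdict

# Hand-written, ARPAbet-style pronunciations.
# Illustrative assumption only: this is not a real ASR lexicon.
PRONUNCIATIONS = {
    "there": "DH EH R",
    "their": "DH EH R",
    "whole": "HH OW L",
    "hole": "HH OW L",
    "leave": "L IY V",
    "live": "L IH V",
}


def words_by_sound(lexicon):
    """Group the words of a lexicon by their sound sequence."""
    groups = defaultdict(list)
    for word, phones in lexicon.items():
        groups[phones].append(word)
    return dict(groups)


def candidates(phones, lexicon=PRONUNCIATIONS):
    """Return every word (sorted) that matches a given sound sequence."""
    return sorted(words_by_sound(lexicon).get(phones, []))


if __name__ == "__main__":
    for phones in ("DH EH R", "HH OW L", "L IY V", "L IH V"):
        print(phones, "->", candidates(phones))
```

In this toy table, "DH EH R" and "HH OW L" each map to two words, so something beyond the sound (the surrounding words, the topic, the speaker's intent) is needed to choose between them. "leave" and "live" differ by a single vowel here, which is easy to blur in real speech and with a poor microphone.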
To emphasize the idea again: some of the people involved in studies in this field question whether, given that there are so many problems associated with it, all this research should continue. The majority, though, see it as we do: the potential of Speech Recognition technology, and the early stage we are still at, make us believe that it will be the solution to many problems and that, therefore, research should and will continue.