Saturday, May 17, 2014

Publicizing our blog and what we learned along the way


Hola!

This is our last blog post, and we wanted to sign off by sharing some of the techniques we used to publicize our blog and how effective they were in general. We also wanted to share some of what we learned from this exercise, to help students like us promote their blogs more effectively for free.

To be honest, it was not an easy task to get people to our blog. We started off by using the most common social media platforms, such as Facebook, Twitter and Google+, to attract visitors. However, with a topic as specific as a recent technological innovation, we found little interest in our blog. With so much content on the Internet competing for people’s attention, it was hard to get noticed even by our friends and families. Luckily, things improved halfway through the term, when we sought feedback from our Technology Professor and from friends with previous blogging experience. One message came through consistently in all the feedback: our content lacked a unified message. While we focused mostly on diverse applications, we did not have a story connecting the posts or any structure that covered all aspects of the technology. So the message became clear to us: we needed to improve our content first and then focus on publicity.

We therefore took a step back and published a post that attempted to tie our previous posts together around why people should care about this technology. Overall, this exercise renewed our motivation, and we publicized the blog through direct communication channels such as email and word of mouth, in addition to Facebook. We particularly targeted the influencers among our friends and families: individuals who spent lots of time on social media and had a big following as a result. Throughout the period we constantly monitored the free statistics provided by Blogger. The direct marketing through email made it easier for people to view our blog on mobile devices such as the iPad and iPhone. Additionally, our international backgrounds helped us generate interest from across the world.



Lastly, we wanted to leave you with some of the techniques that we learnt along the way and that are particularly applicable to technology-related blogs.

> Having a rough idea of the broader story or picture you want to share with your blog is critical. Thinking about this right from the beginning can make your publicity campaign easier. Connecting the technology with a story can also attract a broader audience that is less familiar with technical jargon and terminology. We learnt this late, but it made a huge difference to our page visits!

> While everyone uses Twitter, Facebook, Google+ and LinkedIn to promote blogs because of their popularity, we believe that in the future we could get better attention on smaller, lesser-known networks such as Quora.com and Empire Avenue. This becomes even more relevant for a niche blog like ours, which seeks out focused audiences interested in emerging technologies.

> Although it’s well known that adding graphics, photos and illustrations improves site visits, we wanted to emphasize this one, as our posts with pictures definitely got more clicks than the ones without.

> Finally, an interesting way to publicize a well-written post is to share the link to that specific post instead of a link to the blog home page. Sharing the link to your best post not only maximizes the chances of the audience reading the entire post; the positive experience can also increase the likelihood of other posts on the blog being read. We could measure the effect of this easily, as the post we shared emerged as the one with the highest number of views and helped us increase the overall number of page views for our blog.

Gracias y Saludos,


N2 Team B

Speech Recognition Opportunities

Thanks for always coming back to our blog!

Speech Recognition is a subject we are very passionate about, and through our blog we want to spread the word about its importance and share news about it.
We have been discussing the challenges that Speech Recognition faces in many applications. However real those challenges may be, ongoing research is being conducted to solve them.

I was reading a very interesting research paper [i] the other day about some of the opportunities and challenges in Speech Recognition and wanted to share some interesting facts with you.

How can Automatic Speech Recognition (ASR) be improved?
Three main challenges have been identified: accuracy, throughput and latency.

1.   To improve accuracy, the application needs to account for noisy environments, in which current systems don’t perform well. This will increase the efficiency of the technology.
In many circumstances, speech recognition is not accurate enough. This is mainly due to disturbing noises or speaker variability.
A past approach to dealing with this issue is the so-called multi-stream approach, which incorporates multiple feature sets that help to improve performance for both small and large ASR tasks.
However, a more recent approach is to generate many feature streams with different spectro-temporal properties. The reason is that some streams might be more sensitive to speech that varies at a slower rate, and others to speech that varies at a higher rate.

2.    To improve throughput, the application should allow batch processing of speech recognition tasks so that they execute as efficiently as possible, which in turn increases its utility for multimedia search.
More recently, a data-parallel automatic speech recognition inference engine was implemented on a graphics processing unit (GPU), achieving higher speed. With substantially lower overhead costs, the solution promises better throughput.

3.    To improve latency, the next step would be to allow speech-based applications such as speech-to-speech translation to achieve real-time performance.
The main issue with latency is recognizing “who is speaking when”, a process called “speaker diarization”.
A current approach to online diarization consists of a training step and an online recognition step. Basically, the first 1000 seconds of the input are taken and offline speaker diarization is performed on them. Then speaker models and a speech/non-speech model are trained from the output of that system.
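
The two-step idea behind online diarization can be sketched in a few lines of Python. This is only a toy illustration, not the method from the paper: it assumes the offline diarization of the first 1000 seconds has already labeled each frame, it treats "non_speech" as just another label, and it uses a simple average feature vector per label as a stand-in for a real statistical speaker model. All names and numbers below are made up for the example.

```python
from collections import defaultdict
from math import dist

BOOTSTRAP_SECONDS = 1000  # length of the offline training segment mentioned above


def train_speaker_models(frames, labels):
    """Training step: build one toy model (a mean feature vector) per label.

    `frames` are feature vectors from the bootstrap segment and `labels` are
    the per-frame outputs of an offline diarization pass (assumed given).
    """
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(frames, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def recognize_online(frame, models):
    """Online step: assign a new frame to the closest trained model."""
    return min(models, key=lambda label: dist(frame, models[label]))


# Hypothetical two-dimensional features for the bootstrap segment.
bootstrap_frames = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2], [0.0, 0.0]]
bootstrap_labels = ["speaker_A", "speaker_A", "speaker_B", "speaker_B", "non_speech"]

models = train_speaker_models(bootstrap_frames, bootstrap_labels)

# Frames arriving after the bootstrap segment are labeled one at a time.
incoming = [[1.1, 0.9], [4.9, 5.1], [0.1, 0.0]]
who_spoke = [recognize_online(frame, models) for frame in incoming]
print(who_spoke)
```

The point of the split is that the expensive work happens once, offline, and each new frame afterwards needs only a cheap comparison, which is what keeps latency low.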

Further research is being conducted to address these challenges in ASR. I strongly believe that there will come a day when this technology will be flawless. Until then, keep visiting our blog for more news and updates on Speech Recognition!





[i] Makhijani R, Shrawankar U, Thakare V, “Opportunities and Challenges in Automatic Speech Recognition” 

Friday, May 16, 2014

Speech recognition challenges: Effects of daily use on the economy

The major challenges of speech recognition we've covered so far are the different dialects and speech variability among users, as well as the low quality of input devices for speech. But what if these weren't problems? Would speech recognition be ready for everyday use?

A few more issues arise when these problems are corrected. First is how to distinguish between the speaker and background noise. How can the program know to translate your speech instead of that of the guy behind you in line at Starbucks? Unless the device is tucked nicely next to your mouth, how does it know whom to translate?

The other issue, and maybe the more important one, is how many jobs would be lost if the software were perfected. Can you think of any jobs that rely on translating speech to text?

Pretty much any type of reporter could potentially be eliminated, such as a court reporter.

It could take over drive-thru ordering or, for that matter, any server job where the server is not important to the business (fast food). The only employees needed would be the cooks.

Customer service employees would be obsolete. Even though this is already the trend, these jobs could disappear completely in the future.


In essence, although speech recognition is an amazing technology that has many different uses, we cannot completely ignore the negative effects it may have on people or the economy as a whole. We should keep these in mind next time we use Siri!

Wednesday, May 14, 2014

Speech Recognition - Other challenges

As we saw in our last post, one of the major challenges of Speech Recognition is Speaker Variability.
But there are others...

And the fact that there are still so many big challenges in this area is what drives our interest in it. Speech Recognition is, on the one hand, a technology with huge potential for growth due to its wide range of uses and, on the other hand, something about which there is a lot to learn and investigate.

So, what are these other main challenges?

> Quality of input devices (typically microphones): a microphone whose sensitivity is too low or too high may cause problems for the software deciphering the message. A microphone that is too sensitive, for example, may capture unintended sounds (such as other people speaking in the same room) and thus compromise the whole Speech Recognition process. Related to this, the possibility of controlling our TV by voice instructions makes sense to all of us today. "Increase volume", "next channel" and "shut down TV" are basic instructions that we can imagine ourselves giving to the TV in our living room. But let's think: will the TV respond only to pre-programmed "instructors"? Will it recognize our voice over other noise in the same room? Will everyone be able to give instructions? To make it work, will the ASR device have to be in the remote control rather than on the TV set? These are all questions whose answers are still the subject of a lot of research.

> Meaning and context of what is being said: although this topic has to do with the sound itself, it is related less to the variability of the voice and more to the intended meaning of whoever is speaking. Words like "there" and "their", or "whole" and "hole", are pronounced the same way, and pairs like "leave" and "live" sound very similar, yet they have totally different meanings. No Speech Recognition technology is yet able to take this step and identify the speaker's meaning, and our research tells us that this is a big step that probably won’t be taken soon.
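
To make the homophone problem concrete, here is a small Python sketch of why sound alone cannot decide between such words, and how the previous word might help narrow the choice. It is a toy under loud assumptions: the word groups and the pair counts are invented for illustration, and this is not how any particular product works.

```python
# Groups of words that sound the same (assumed list, for illustration only).
HOMOPHONE_SETS = [{"there", "their"}, {"whole", "hole"}]

# Hypothetical counts of how often (previous word, word) pairs appear in text.
PAIR_COUNTS = {
    ("over", "there"): 40, ("over", "their"): 2,
    ("lost", "their"): 18, ("lost", "there"): 1,
    ("the", "whole"): 30, ("the", "hole"): 12,
    ("a", "hole"): 25, ("a", "whole"): 20,
}


def candidates(heard_word):
    """Return every spelling that matches the sound of `heard_word`."""
    for group in HOMOPHONE_SETS:
        if heard_word in group:
            return sorted(group)
    return [heard_word]


def pick_by_context(previous_word, heard_word):
    """Choose the spelling seen most often after `previous_word`."""
    return max(candidates(heard_word),
               key=lambda word: PAIR_COUNTS.get((previous_word, word), 0))


# The recognizer "hears" the same sound either way; only context differs.
first = pick_by_context("over", "their")
second = pick_by_context("lost", "there")
third = pick_by_context("a", "whole")
print(first, second, third)
```

When the previous word has never been seen with either spelling, every candidate scores zero and the choice is arbitrary. One word of context is a long way from understanding what the speaker actually means.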


To emphasize the idea again: some of the people involved in studies in this field question whether, given that there are so many problems associated with it, all the research should continue. The majority, though, see it as we do: the potential of Speech Recognition technology, and the early stage we are still at, make us believe that it will be the solution to many problems and that, therefore, research should and will continue.

Sunday, May 4, 2014

Speaker Variability – one of the biggest challenges to Speech Recognition

Every speaker has a distinctive voice, shaped by their unique body and personality. The same factors that make human speech unique, discussed below, make it a challenge for Automatic Speech Recognition (ASR) to work effectively.

Realization
The realization of speech changes over time. Even if the speaker tries to sound exactly the same, there will always be some small differences in the acoustic wave produced.

Speaking style
Speaking is a way of expressing our personality, and we communicate our emotions via speech. We speak differently when we are happy, sad, frustrated, stressed, disappointed, or defensive. Our speaking styles also vary in different situations, depending on whether we are speaking with our parents or with our friends.

The sex and age of the speaker
Men and women have different voices, and the main reason for this is that women generally have a shorter vocal tract than men. Likewise, the anatomy of the vocal tract changes over time, depending on the health and age of the speaker.

Speed of speech
We speak at different speeds at different times. If we are stressed, we tend to speak faster, and if we are tired, the speed tends to decrease. We also speak at different speeds depending on whether we are talking about something known or something unknown.

Regional and social dialects
Regional dialects involve features of pronunciation, vocabulary and grammar that differ according to the geographical area the speaker comes from. Social dialects are distinguished by features of pronunciation, vocabulary and grammar according to the social group of the speaker.


The long list of variations does not mean that we should give up on ASR. It may seem quite unlikely that we will ever achieve perfect ASR, but there is definitely potential for improvement. One thing we can consider is whether humans should speak differently to computers. For instance, we could strive to be unambiguous and speak in a hypercorrect style to get the computer to understand us perfectly. Although this could simplify ASR, it cannot address all the variations discussed above. Our goal with ASR should therefore not be to have 'natural' verbal communication with machines, but rather to seek efficient user interfaces.