Speech recognition is the process by which a speaker's words are automatically recognized as text, or as some predefined instruction or code, based on the information contained in the individual speech waveforms. A robust speech-recognition system combines accurate speech identification with the ability to filter out noise and adapt to other acoustic conditions, such as the speaker's speech rate and accent. Speech-recognition technology is now embedded in voice-activated routing systems at customer call centers, voice dialing on mobile phones, transcription (voice to text), device management (creating voice commands), web search, GPS navigation, vending machines, smart homes and many other everyday applications.
An automatic speech recognition (ASR) system can be:
speaker dependent or speaker independent; isolated word or continuous speech; limited vocabulary or unlimited vocabulary.
Products used include:
■ MATLAB®
■ Data Acquisition Toolbox™
■ Statistics Toolbox™
ASR System Overview:
The basic workflow is demonstrated on an isolated-word, speaker-dependent digit recognition system. It comprises three steps:
■ Speech acquisition
For training, speech is acquired from a microphone and brought into the development environment for offline analysis. For testing, speech is continuously streamed into the environment for online processing. Data Acquisition Toolbox™ is used to set up continuous acquisition of the speech signal and to extract frames of data for processing at the same time. Speech processing includes pre-emphasis (flattening the magnitude spectrum), frame blocking (splitting the signal into short frames, because speech is predictable only over short intervals), and windowing (removing the discontinuities at the beginning and end of each frame).
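The three processing steps named above can be sketched in a few lines. This is an illustrative Python sketch using only the standard library, not the MATLAB toolbox code used in the article. The pre-emphasis coefficient (0.97), the frame length and hop size, and the choice of a Hamming window are all assumptions; the article does not state which values or window it used.

```python
import math

def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used value.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def block_into_frames(signal, frame_len, hop):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming_window(frame):
    """Taper the frame edges toward zero to reduce discontinuities."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def preprocess(signal, frame_len=240, hop=80, alpha=0.97):
    """Pre-emphasis, then frame blocking, then windowing of each frame."""
    emphasized = pre_emphasize(signal, alpha)
    return [hamming_window(f) for f in block_into_frames(emphasized, frame_len, hop)]

# Usage: one second of a 440 Hz tone sampled at an assumed 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
frames = preprocess(tone)
```

With these assumed settings each frame covers 30 ms of audio and consecutive frames start 10 ms apart, so neighbouring frames overlap by two thirds.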
■ Speech analysis
Developing a speech-detection algorithm: The speech-detection algorithm is developed by processing the prerecorded speech frame by frame within a simple loop.
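A frame-by-frame detection loop might look like the following Python sketch. The article does not say which detection criterion it uses, so the short-time energy measure, the fixed threshold, and the minimum word length of three frames are assumptions chosen for illustration only.

```python
def frame_energy(frame):
    """Mean squared amplitude of one non-empty frame."""
    return sum(x * x for x in frame) / len(frame)

def detect_words(frames, energy_threshold, min_frames=3):
    """Return (start, end) frame-index pairs, end exclusive, one per detected word.

    A word is assumed to be a run of at least `min_frames` consecutive frames
    whose energy exceeds `energy_threshold`; shorter runs are treated as noise.
    """
    segments = []
    start = None  # index of the first frame of the run in progress, if any
    for i, frame in enumerate(frames):
        active = frame_energy(frame) > energy_threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a run that is still open when the recording ends.
    if start is not None and len(frames) - start >= min_frames:
        segments.append((start, len(frames)))
    return segments

# Usage with synthetic frames: two words separated by silence and a short click.
silence, loud = [0.0] * 4, [1.0] * 4
recording = [silence] * 2 + [loud] * 5 + [silence] * 3 + [loud] + [silence] * 2 + [loud] * 4
words = detect_words(recording, energy_threshold=0.1)
```

The same loop body can later be applied to frames as they arrive from a live source, since it only keeps the start index of the current run as state.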
Developing the acoustic model: A good acoustic model should be derived from speech characteristics that enable the system to distinguish between the different words in the dictionary.
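To make the idea of distinguishing words concrete, here is a minimal Python sketch of one possible statistical acoustic model. The article does not specify its features or model, so everything here is an assumption: each word is modelled by a single Gaussian with diagonal covariance fitted to training feature vectors, and a detected word is assigned to the model with the highest total log-likelihood. The two-dimensional feature vectors are invented toy data, not real speech features.

```python
import math

def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Estimate a per-dimension mean and variance from training feature vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / n, var_floor)
           for d in range(dim)]
    return mean, var

def log_likelihood(vector, model):
    """Log density of one feature vector under a diagonal Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * s) + (x - m) ** 2 / s)
               for x, m, s in zip(vector, mean, var))

def classify(feature_vectors, models):
    """Pick the word whose model gives the highest summed log-likelihood."""
    scores = {word: sum(log_likelihood(v, model) for v in feature_vectors)
              for word, model in models.items()}
    return max(scores, key=scores.get)

# Usage with toy training data (hypothetical feature values).
training = {
    "one": [[0.1, -0.1], [-0.1, 0.1], [0.0, 0.2]],
    "two": [[5.0, 5.1], [4.9, 5.2], [5.1, 4.8]],
}
models = {word: fit_diag_gaussian(vectors) for word, vectors in training.items()}
predicted = classify([[4.8, 5.1], [5.2, 4.9]], models)
```

A model like this separates words only if the chosen features cluster differently for each word, which is why the choice of speech characteristics matters more than the classifier itself.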
■ User interface development
After developing the isolated digit recognition system in an offline environment with prerecorded speech, we migrate the system to operate on streaming speech from a microphone input. We use MATLAB GUIDE tools to create an interface that displays the time-domain plot of each detected word as well as the classified digit (Figure 1).
Author - Sushant Shama (Research Associate at Silicon Mentor)