About this project
MOVI stands for My Own Voice Interface! It is the first standalone speech recognizer and voice synthesizer for Arduino with full English sentence capability:
- Up to 200 customizable English sentences.
- Speaker independent
- Standalone, cloudless and private
- Very easy to program
MOVI provides an alternative to buttons, remote controls, or cell phones by letting you use full-sentence voice commands for tasks such as turning devices on and off, entering alarm codes, and carrying on programmed conversations with projects.
MOVI is plug and play! Connect the shield to your Arduino Uno or compatible board, connect an optional speaker, and you’re ready to go.
By supporting us on Kickstarter, you will enable us to merge the multi-board prototype into a single shield and make it available to the Maker community.
MOVI IS LOADED WITH FEATURES AND HAS YOUR PRIVACY IN MIND
- MOVI can recognize up to 200 English sentences of your choice. You can be as specific as “Turn on the ceiling light in the bedroom” or as generic as “Light on.”
- MOVI is speaker independent. It responds the same way to the same sentence, no matter who says it.
- MOVI has a programmable call sign. The call sign catches MOVI’s attention and minimizes false triggering. The default call sign is “MOVI”, but you can change it to “Computer”, “Hello”, or anything you can think of! You can also turn the call sign off, and MOVI will then listen for sentences without waiting for it.
- MOVI is cloudless and does not require an Internet connection or an external PC. All processing is done on its 1 GHz ARM-based processor. Because nothing is sent over the Internet, speech recognition stays accessible and private.
- MOVI is a true Arduino shield. MOVI is stackable, and all connections come from the header pins.
- MOVI has a speech synthesizer. Beyond recognizing sentences, MOVI lets Makers interact naturally with their projects by building full dialogs with the built-in male or female voice synthesizer.
- MOVI has a simple and versatile programming interface. Even a novice programmer can easily set up MOVI to perform specific actions in response to sentences. Adding speech recognition to a project requires fewer than 10 lines of code.
The following video is a 5-minute tutorial on how to program the shield:
- MOVI has an internal 2GB dictionary. Users can program MOVI to recognize virtually any complete English sentence as well as build their own dialog systems to change vocabulary. No voice training required!
- MOVI is hands free. In a quiet environment, you can talk to MOVI up to 12 feet away. Alternatively, you can connect a headset microphone when using it in a noisy room.
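The programming model described above has three parts: a call sign, a list of trained sentences, and an action for each sentence. It can be illustrated with a small host-side Python sketch.

This is not the MOVI Arduino library. The following are all hypothetical, chosen only to show how call-sign gating reduces false triggering:
- the class `CallSignGate` and its methods;
- the convention that sentences are identified by a 1-based index in training order;
- the assumption that recognized text arrives as a plain string.

```python
from typing import Callable, Dict, List, Optional


class CallSignGate:
    """Hypothetical model of call-sign gating and sentence dispatch.

    Not the MOVI library API: a sentence is acted on only right after the
    call sign has been heard, unless the call sign is disabled (None).
    """

    def __init__(self, call_sign: Optional[str] = "MOVI") -> None:
        # None models "call sign turned off": every utterance is considered.
        self.call_sign = call_sign.lower() if call_sign else None
        self.sentences: List[str] = []
        self.actions: Dict[int, Callable[[], str]] = {}
        self.armed = False  # True right after the call sign was heard

    def add_sentence(self, text: str, action: Callable[[], str]) -> int:
        """Register a sentence and its action; return its 1-based index."""
        self.sentences.append(text.lower())
        index = len(self.sentences)
        self.actions[index] = action
        return index

    def hear(self, utterance: str) -> Optional[str]:
        """Feed one recognized utterance; return the action's result, if any."""
        text = utterance.lower().strip()
        if self.call_sign is not None and not self.armed:
            # Ignore everything until the call sign arrives.
            self.armed = text == self.call_sign
            return None
        self.armed = False  # one sentence per call sign
        if text in self.sentences:
            index = self.sentences.index(text) + 1
            return self.actions[index]()
        return None


# Usage: a sentence only triggers its action right after the call sign.
gate = CallSignGate("computer")
gate.add_sentence("light on", lambda: "LED HIGH")
gate.add_sentence("light off", lambda: "LED LOW")
print(gate.hear("light on"))   # None: call sign not heard yet
print(gate.hear("computer"))   # None: gate is now armed
print(gate.hear("light on"))   # LED HIGH
```

Requiring the call sign before every sentence is one simple policy. A real device might instead keep listening for a short time window after the call sign.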
WHY WE ARE DIFFERENT
There are other shields that do speech recognition, but:
- They are limited to single words.
- They are speaker dependent and require you to record your own voice.
- They don’t have a call sign to prevent false triggering.
- They only work in close proximity.
- Most do not have a speech synthesizer.
MOVI does away with all these limitations by offering a true state-of-the-art speech recognizer that can be trained to recognize up to 200 English sentences.
We started this project a year ago, when a colleague challenged us with the claim that state-of-the-art speech recognition could only be done in the cloud. Thanks to a combination of Moore’s law and over 15 years of expertise in audio processing, we managed to fit a very high-quality system onto a credit-card-sized board: our initial prototype. We quickly realized that by integrating it into an Arduino shield, we could make our work easily accessible to the Maker community.
SOME PROJECT EXAMPLES... (updated July 14, 2015)
FIRST ROBOTICS APPLICATION!
We teamed up with the folks at Origami Robotics to add voice control to Romibo! In less than 3 hours from start to finish, the team added the MOVI prototype to the Romibo hardware, programmed the voice commands and the corresponding robot moves, and changed the call sign to "ROMIBO". It worked like a charm! You can see the result in this video!
THE ALARM KEYPAD
This video shows how to replace a typical digit-based code entry keypad (e.g. for a home alarm) with speech. To ‘unlock the door’, MOVI prompts for a password. The call sign in this example is “computer”.
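Replacing a keypad with speech boils down to two steps: turn the recognized digit words into a code, then compare that code with the stored one.

The Python sketch below is hypothetical. It assumes:
- the recognizer returns the raw spoken words rather than a matched training sentence;
- the digit vocabulary in `DIGITS` (including "oh" for zero);
- the function names, which are illustrative only.

```python
import hmac
from typing import Optional

# Hypothetical digit vocabulary for spoken code entry.
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def words_to_code(words: str) -> Optional[str]:
    """Convert e.g. 'one two three four' into '1234'.

    Returns None if the utterance is empty or contains any non-digit word,
    so stray words never produce a partially valid code.
    """
    tokens = words.lower().split()
    if not tokens or any(t not in DIGITS for t in tokens):
        return None
    return "".join(DIGITS[t] for t in tokens)


def check_code(spoken: str, stored_code: str) -> bool:
    """Compare the spoken code with the stored one in constant time."""
    code = words_to_code(spoken)
    return code is not None and hmac.compare_digest(code, stored_code)
```

For example, `check_code("four two four two", "4242")` returns `True`. Using `hmac.compare_digest` avoids leaking how many leading digits matched through timing.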
HUNT THE WUMPUS
In this next video, we implemented a voice interface for “Hunt the Wumpus”. This old-school video game is part of an Arduino tutorial project with an LCD screen and buttons. In the game, the player maneuvers through caves to hunt down a monster called the Wumpus. We replaced the LCD screen and buttons with a speech-based interface using the call sign “hunter”.
The following videos demonstrate MOVI’s dialog functionality and show how complex sentences can be recognized. This is our take on the famous Eliza program.
MOVI is powered by an Allwinner A13 CPU running Debian Linux from an SD card. MOVI saves sentence data, call sign, and configuration so that settings are not lost when you turn off the power. It can be used either with its onboard electret microphone or with an external headset microphone. MOVI has a 32-ohm audio output jack for the speech synthesizer and the optional feedback beeps; the volume can be set in software. MOVI is powered through the Arduino RAW power pin and DOES NOT need an external power supply. However, the Arduino itself should be plugged into a power supply, as most USB ports cannot supply enough current. MOVI communicates with the Arduino via digital pins 10 and 11, but these pins can be changed with jumper cables and a software setting.
Training: MOVI’s Arduino API sends the training sentences in textual form over the serial connection to the shield. The shield phonetizes the sentences using a 2GB dictionary. The resulting phoneme sequences are used to create a temporal model that assigns higher probabilities to phoneme sequences that occurred in the training sentences than to those that did not.
Recognition: During recognition, the waveform from the microphone is segmented into speech and non-speech regions. The speech regions are passed to a classifier that has been trained on hundreds of adult speakers, which breaks the waveform down into possible phoneme sequences. Using the temporal model created in training, the more probable phoneme sequences are favored, so that only words that are part of the training sentences can be recognized. A second correction step maps the words to the most likely sentence in the training set. For password recognition, this second step can be omitted in the API.
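The training step can be illustrated with a toy phoneme bigram model. This is a hedged Python sketch, not MOVI's actual model:
- The tiny `LEXICON` of ARPAbet-style pronunciations stands in for the 2GB dictionary.
- The "temporal model" here is a bigram with add-one smoothing, chosen for illustration; the project does not say which model MOVI uses.

The sketch shows phoneme pairs seen in the training sentences receiving higher probability than unseen pairs.

```python
import math
from collections import Counter
from typing import Dict, List, Set

# Toy pronunciation dictionary (ARPAbet-style); stands in for MOVI's 2GB dictionary.
LEXICON: Dict[str, List[str]] = {
    "light": ["L", "AY", "T"],
    "on": ["AA", "N"],
    "off": ["AO", "F"],
}


def phonetize(sentence: str) -> List[str]:
    """Map a sentence to phonemes, with sentence-boundary markers."""
    phones = ["<s>"]
    for word in sentence.lower().split():
        phones.extend(LEXICON[word])  # raises KeyError for unknown words
    phones.append("</s>")
    return phones


class PhonemeBigramModel:
    """Add-one-smoothed phoneme bigram model (illustrative only)."""

    def __init__(self, sentences: List[str]) -> None:
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()  # counts of each phoneme as left context
        self.vocab: Set[str] = set()
        for sentence in sentences:
            phones = phonetize(sentence)
            self.vocab.update(phones)
            for prev, cur in zip(phones, phones[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def prob(self, prev: str, cur: str) -> float:
        """P(cur | prev); smoothing keeps unseen pairs above zero."""
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + len(self.vocab))

    def log_score(self, phones: List[str]) -> float:
        """Log-probability of a phoneme sequence under the model."""
        return sum(math.log(self.prob(a, b)) for a, b in zip(phones, phones[1:]))


model = PhonemeBigramModel(["light on", "light off"])
print(model.prob("T", "AA") > model.prob("T", "N"))  # True: T->AA was trained
```

Log-probability scores are only directly comparable between sequences of equal length. A real recognizer would combine them with acoustic scores from the classifier.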
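The second correction step can be approximated by picking the training sentence with the smallest word-level edit (Levenshtein) distance to the recognized words. This is a swapped-in technique for illustration; the project does not describe how MOVI actually scores sentences.

In the Python sketch below, the `exact` flag is a hypothetical stand-in for the API option that skips this step for passwords.

```python
from typing import List, Sequence


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete wa
                           cur[j - 1] + 1,              # insert wb
                           prev[j - 1] + (wa != wb)))   # substitute or match
        prev = cur
    return prev[-1]


def correct_to_sentence(words: str, training: List[str], exact: bool = False) -> str:
    """Map recognized words to the closest training sentence.

    With exact=True (hypothetical password mode) the raw words are returned
    unchanged, so a spoken code is never "corrected" into another one.
    """
    if exact:
        return words
    hyp = words.lower().split()
    # min() returns the first sentence among equally close candidates.
    return min(training, key=lambda s: word_edit_distance(hyp, s.lower().split()))


training = ["turn on the ceiling light in the bedroom", "light on", "light off"]
print(correct_to_sentence("turn on ceiling light in bedroom", training))
```

Skipping the correction step matters for passwords. Otherwise, any wrong code close enough to a trained one would be snapped to it.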
For more information on how a state-of-the-art speech recognizer works, check out “Multimedia Computing” (Cambridge University Press), Chapter 18, or some of the scientific publications co-authored by one of the makers!
Risks and challenges
There are always risks with hardware projects.
Integration: By building a fully functional prototype ahead of time, we have reduced most of the risk associated with the hardware. The MOVI prototype is fully functional, and all features have been implemented. Converting the prototype to a single shield is a fairly straightforward effort. We are using a vendor that we have previously worked with and that has done similar work for us. As both Gerald and Bertrand have strong hardware and software backgrounds, we will verify every step of the design. We will initially build 4 pre-production boards on which we will conduct our final testing and verification. Only then will we commit to the initial production.
Manufacturing and Quality: Bertrand's 25 years of experience in the hardware industry will ensure that our production meets the highest quality standards. We are using a very reliable vendor that has several years of experience building Arduino, Raspberry Pi, and other single-board computers in volume. We have successfully worked with them on other projects in the past.