From the very first time I saw the first Iron Man movie, I was inspired by the potential of speech recognition software. We humans don’t communicate with each other using keyboards. We communicate using our voices. Why, then, do we depend on keyboards and mice to interact with our computers? When I saw “Jarvis” in Iron Man, I saw the very real potential that speech recognition software has to revolutionize the way we interact with technology. If you speak clearly and quickly, you can talk to your computer far faster than you could ever type. Because of this, if you care about productivity at all, the future of speech recognition technology should matter to you. The ridiculous part of all this is that we have had the technology for speech recognition for nearly three decades. However, the companies that decide which technologies get produced have chosen the keyboard and mouse, not speech recognition, as the primary means of input.
Recently, I watched a TED video of David Pogue talking about technology and how people who use it on a day-to-day basis tend to find faster and more efficient ways of working than their “average person” counterparts. Pogue demonstrated that he was using speech recognition technology on his Mac to write nearly all the articles for his New York Times column. He was also using this same technology to write nearly all of his e-mails, and to greatly increase his productivity on his Mac. What he showed was that a well-calibrated speech recognition system can recognize nearly anything that comes out of your mouth, in real time. This is absolutely awe-inspiring to see in practice. When combined with text expansion as seen in TextExpander, the technology really starts to take off. For example, he would say, “Thank You,” and his Mac would write an entire letter explaining how grateful he was to a reader for taking the time to write to him, and how much he valued their response. This enabled him to go through hundreds of e-mails a day in a small fraction of the time it would have taken him to type them. Additionally, he was using the dictation technology to control the Mac itself.
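To make the “Thank You” trick concrete, here is a minimal sketch of how text expansion on top of dictation could work. This is not how Dragon Dictate or TextExpander is actually implemented. The `SNIPPETS` table, the letter text, and the `normalize` and `expand` functions are all hypothetical names I made up for illustration.

```python
# Hypothetical snippet table: spoken trigger phrase -> full text to insert.
# The letter text is invented for this example.
SNIPPETS = {
    "thank you": (
        "Thank you for taking the time to write to me. "
        "I really value your response."
    ),
}


def normalize(phrase: str) -> str:
    """Lowercase a phrase and drop surrounding whitespace and punctuation."""
    return phrase.strip().strip(".,!?").strip().lower()


def expand(transcribed: str) -> str:
    """Return the snippet for a recognized trigger, or the text unchanged."""
    return SNIPPETS.get(normalize(transcribed), transcribed)


if __name__ == "__main__":
    print(expand("Thank You."))   # the trigger is replaced by the full letter
    print(expand("Hello there"))  # ordinary dictation passes through as-is
```

The idea is simply that the recognizer’s output is checked against a short list of triggers before it is typed, so a two-word phrase can stand in for a whole reply.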
Before we get started, let me first get the following out of the way: Apple does have a speech recognition technology now included with Mac OS X, but it is incredibly limited and inaccurate in practice. Of course, you already knew that. That’s why you probably used it for about the first day after you bought your Mac, and then ignored it from that point forward. This article focuses on technology that is far, far beyond the scope of what Apple’s offerings can achieve.
After being impressed by everything David Pogue was doing, I decided to do some research myself. It turns out that Nuance Software is the company responsible for nearly all speech recognition advancements on the Mac platform. They are the company behind MacSpeech and Dragon Dictate, as well as a plethora of older speech and dictation applications for Windows.
If you want to get started quickly with speech recognition technology, I highly recommend that you download the Dragon Dictate app for your iPhone or iPod right now. It works incredibly well, and will introduce you to the concepts involved here. I believe the entire reason that Nuance Software released this free app for the iOS platform is to get you hooked on this technology. It works. Dragon Dictate is really that good.
If you want to try the software on your Mac, an instant download sells for $179.99 on Nuance.com. They also have specific offerings for various industries, including medical, legal, technical, and others. These software packages allow you to use very specific and detailed vocabulary for these industries. This makes Dragon Dictate an incredibly useful tool for people who have to do a tremendous amount of typing in their day-to-day work. This obviously includes secretaries, writers, teachers, and many other people. Nuance even makes a package specifically for graphic designers, with commands dedicated to controlling the Adobe Creative Suite applications.
I write for several blogs, which means that I spend a great deal of time typing. Once I started playing with the Dragon Dictate app on the iPad, I realized how much time I could save writing articles if I simply dictated them instead of typing them. On my first few projects, I saved about three hours in a single day. Time is money, as they say, so the idea really caught on.
Now for the details. Dragon Dictate comes with a language pack that you will need to install when you first set up the app. This package consumes about 1.5 to 2 gigabytes of space on your hard drive. Once it is installed, you will not need the disk again. The next thing the app will ask you to do is calibrate it. This involves reading a story aloud into your microphone. The app then works out how to translate your voice into text based on the way you actually sound, usually before you even finish the story. It can detect how your voice works, the speed at which you speak, and any particular vocabulary words you use often. It keeps learning as you use it, becoming more accurate every day. If you want to add more vocabulary words to the app, you can do so with a built-in vocabulary tool.
I found that after about ten minutes of training, the app was able to flawlessly transcribe nearly everything I threw at it. I’m not talking about speaking slowly and enunciating every single word. I’m talking about speaking as fast as possible, as though I were in an enthusiastic conversation with some friends. Because of this, you can effectively talk to your computer and depend on it to type everything that you say in real time. This is incredible. The app is even smart enough to learn what your voice sounds like and tune out the voices of other people in the room. It can also ignore ambient sound, such as wind and vehicle noise. Don’t have a headset? No problem. The app worked incredibly well in my testing using only the Mac’s internal microphone. Of course, if you have a headset, I’d recommend using it for even better transcription. That said, I see no reason to believe that a headset is necessary to use this app productively.
When the app is in use, it places an icon in your menu bar that indicates whether it is listening, and allows you to toggle the microphone’s listening state on and off. You can also use a simple keyboard shortcut to do this. The transcription system works in all text fields of the operating system. This is regardless of which application you are using, so it works absolutely everywhere.
Dragon Dictate is not limited to merely transcribing your words to text. It can also be used to control your entire Mac. For example, you can tell it to open an application, open a new window, and do a search in a text field. You can even tell it to type keystrokes for you, such as the “Enter” key. Additionally, it can be used to enter menu commands and respond to dialog boxes. You can also do things with the dock, such as hiding applications, minimizing windows, maximizing windows, and closing them. You can quit applications, hide background applications, and perform other common operating system actions. It even supports things like turning dock hiding on and off, in addition to all Exposé functionality. It really is a modern-day “Jarvis.”
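For the curious, the split between “type what I said” and “do what I said” can be sketched in a few lines. This is only an illustration under my own assumptions, not Dragon Dictate’s actual design or API. Every name here (`actions`, `FIXED_COMMANDS`, `PREFIX_COMMANDS`, `handle`) is hypothetical, and the actions are recorded in a list instead of touching the real system.

```python
from typing import Callable, Dict, List

# Records what would happen, instead of controlling a real Mac.
actions: List[str] = []


def open_app(name: str) -> None:
    actions.append(f"open:{name}")


def quit_app(name: str) -> None:
    actions.append(f"quit:{name}")


def press_enter() -> None:
    actions.append("key:enter")


# Hypothetical command tables: whole phrases, and verbs that take an argument.
FIXED_COMMANDS: Dict[str, Callable[[], None]] = {"press enter": press_enter}
PREFIX_COMMANDS: Dict[str, Callable[[str], None]] = {
    "open": open_app,
    "quit": quit_app,
}


def handle(utterance: str) -> str:
    """Run the utterance as a command if it matches, else treat it as dictation."""
    text = utterance.strip().lower()
    if text in FIXED_COMMANDS:
        FIXED_COMMANDS[text]()
        return "command"
    verb, _, argument = text.partition(" ")
    if verb in PREFIX_COMMANDS and argument:
        PREFIX_COMMANDS[verb](argument)
        return "command"
    actions.append(f"type:{utterance}")  # anything else is typed as text
    return "dictation"


if __name__ == "__main__":
    for spoken in ["Open Safari", "Press Enter", "Hello world"]:
        print(spoken, "->", handle(spoken))
```

A toy like this would mistake a sentence that merely starts with “open” for a command, which is presumably one reason a real product needs something smarter than a lookup table.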
I’m not going to pretend that Dragon Dictate isn’t an expensive application. It is. However, time is money, and the more time you can save the better. Ultimately, if you value your time you should value this application. I find it worth every penny, and could not possibly recommend it more.
Before I close, there’s one last interesting bit of news that I would like to mention here. Rumor has it that Apple is actually working right now with Nuance Software to include these technologies in future versions of iOS and Mac OS X. Quite frankly, it’s about time.
By the way, I dictated this entire article with Dragon Dictate on my MacBook Air.
Photo Credit: Nuance Software