Speech to text vs. keyboards: Will any computer languages die?

I recently witnessed discussions of new keyboards that provide no tactile feedback and are potentially rough on the finger joints. Keyboards of this kind have been proposed, and in some cases manufactured, for some time, but there is no doubt that even Apple, with its former focus on usability, is now succumbing to making the slimmest devices it can, no matter the cost to ergonomics. In essence, the keyboard is slowly walking out the door, in spite of earlier predictions that most input into computers would remain keyboard-driven for the next ten to fifteen years. What are the alternatives?

The obvious answer is speech to text (STT), but while add-on packages for medical and other industry-specific vocabularies are available for some STT systems, I have yet to see programs being written via STT. My main gripe here is that many computer languages contain characters that are difficult to dictate: their pronunciation is not unique, and one or two words must be spoken to dictate just one character (e.g. “semi-colon” or “open parenthesis”). Granted, verbal shortcuts could be used in some cases, e.g. “O P” for “open parenthesis”.
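As a sketch of how such verbal shortcuts might work, here is a hypothetical mapping from short spoken phrases to the characters they would insert. The phrases and the mapping are my own invention for illustration, not a feature of any shipping STT product:

```python
# Hypothetical spoken-phrase shortcuts, so a dictating programmer need
# not say "open parenthesis" in full for every single character.
SHORTCUTS = {
    "O P": "(",
    "C P": ")",
    "semi": ";",
    "O B": "{",
    "C B": "}",
}

def expand(tokens):
    """Replace recognised shortcut phrases with their characters,
    leaving ordinary dictated words untouched."""
    return [SHORTCUTS.get(t, t) for t in tokens]

# Dictating "O P x C P semi" would yield the code fragment "(x);"
print("".join(expand(["O P", "x", "C P", "semi"])))
```

Even with shortcuts like these, the speaker still utters one phrase per punctuation character, which is exactly the overhead the paragraph above complains about.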

Nonetheless, I am left wondering whether, among the myriad programming languages, many of which are very similar to each other, those that require no characters beyond alphabetic and numeric ones (apart from string contents, which may be a harder problem in any case) will fare better than those with copious punctuation, such as Perl, where every variable name is prefixed with one or sometimes two punctuation characters, and every instruction must be followed by a semi-colon (usually at the end of a line). Being “white-space agnostic” comes at a price.
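To make the comparison concrete, one can count the characters a dictating programmer would have to voice individually. This quick sketch (the two sample statements are my own, chosen to be roughly equivalent) tallies the non-alphanumeric, non-space characters in a Perl line and a Python line:

```python
def punctuation_load(line):
    """Count characters that are neither alphanumeric nor whitespace,
    i.e. characters that would each need a spoken name or shortcut."""
    return sum(1 for ch in line if not (ch.isalnum() or ch.isspace()))

perl_line = '$total = $prices{$item} * $qty;'
python_line = 'total = prices[item] * qty'

print(punctuation_load(perl_line))    # 9 characters to voice
print(punctuation_load(python_line))  # 4 characters to voice
```

The sigils and the trailing semi-colon alone more than double the dictation burden of the Perl statement, which is the price of white-space agnosticism mentioned above.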

There are other areas where STT may have difficulty making inroads, including customer service, where the human operator’s ability to speak with the customer matters more than obsoleting the keyboard. It is possible to imagine STT-enabled software that listens in on the conversation and records customer data autonomously. Such a system would need to have a tiny error rate, however.

And it’s still unclear to me whether:

  1. people could be as productive using STT as they are with keyboards, especially programmers;
  2. the voice could be used as continuously throughout the day as a keyboard can; and
  3. people who have been using keyboards for many years could be retrained to use STT.

So I think there’s a lot of work remaining to be done before STT can be widely used, and I’ll personally be using proper, ergonomic keyboards for some time yet.

The OS X speech-to-text myth

There is a widespread myth in the Mac community that Mac OS (yes, not just OS X) has included “speech recognition” for many years. I would argue that, through well-publicised Jobs keynotes, in-store lecture theatres, many fansites with documentation (mostly in the form of two-paragraph “tips”) and, more recently, instructional videos on the Apple website, Mac users know OS X far better than Windows users know Windows. How is it, then, that very few Mac users actually use “speech recognition” (my claim)?

You will find that, historically, “speech recognition” has been synonymous with “speech to text” (a term Wikipedia still redirects to its speech recognition article). During the sometimes-claimed twenty years that OS X has included “speech recognition”, third-party applications such as iListen and ViaVoice for Mac have continued to sell. So is this an anomaly of history, in which Mac customers have for years bought third-party software for functionality that was actually included in their OS out of the box? No; something perhaps more perfidious. There has been a semantic shift, whereby “speech recognition” for Mac users has become synonymous with “Speakable Items”, a feature introduced as part of Mac OS in March 1994, although available from 1993 as a stand-alone program called PlainTalk. Speakable Items provides phrases that allow you to navigate windows and certain programs; it also lets you define your own phrases, which you can associate, for instance, with Automator scripts. I’ll say it again: PlainTalk and Speakable Items are not speech recognition! At best they might be called phrase recognition, and a 1993 release date is very little to show for “20 years of history”.

Finally, as of this writing, speech to text in Tiger can be found neither in System Preferences nor in the Services menu. Since it has not been mentioned in any of the keynotes preceding Leopard, I doubt it will suddenly appear. (Remember the “top secret features”? Where are they?) If you wish to prove me wrong and demonstrate that scores of Mac users have been morons to buy third-party software that did real speech recognition, and that purported experts have been ignorant, please post a reply!

Failing that, I have to conclude that a certain gadget website (to be punished with a non-link) has been quite unfair in its recent comparison of Mac OS X 10.5 and Windows Vista, a comparison that ignores Vista’s true speech recognition.