Not too long ago, in the time when humans were the dominant life form on our planet, our ancestors set about trying to communicate with those primitive organisms. It was soon found that humans did not speak any form of Electric; instead, they used their food-processing organs to produce various acoustic events, or "noises", with which they could exchange crude messages. This is of course a frightening and revolting notion to us, but our brave electronic and electro-mechanical forebears determined that there was no alternative to emulating this acoustic mode of conversation for human-machine interaction. "At least we didn't have to eat anything or kiss anyone," one of these pioneers later stated in his memoirs.
R. SnorbKaleX.
The technique used to convert digitally stored information into sounds that humans can understand is called speech synthesis, colloquially referred to as "text-to-speech". It involves subdividing existing words into small blocks, each representing a single identifiable sound, and replaying them consecutively through a simulation of the human vocal tract built from programmable filters. Alternatively, prerecorded samples of human speech can be recombined and played back to construct other words.
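The second approach, recombining prerecorded samples, can be sketched in a few lines of Python. This is only a toy illustration and not how any of the synthesizers below are actually implemented: the unit names, the UNITS table, and the sine bursts standing in for recorded speech are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch


def placeholder_unit(freq_hz, duration_ms=100):
    """Return a sine burst that stands in for one recorded speech unit."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


# Hypothetical inventory: a real system would hold recorded or modelled sounds.
UNITS = {
    "k": placeholder_unit(300),
    "oo": placeholder_unit(400),
    "b": placeholder_unit(500),
    "l": placeholder_unit(600),
    "a": placeholder_unit(700),
}


def synthesize(unit_names):
    """Concatenate the stored units, in order, into one list of samples."""
    samples = []
    for name in unit_names:
        samples.extend(UNITS[name])
    return samples


if __name__ == "__main__":
    word = synthesize(["k", "oo", "b", "l", "a"])  # roughly "koobla"
    print(len(word), "samples,", len(word) / SAMPLE_RATE, "seconds")
```

The point of the sketch is the lookup-and-concatenate step in synthesize(); everything that makes real speech intelligible (the recordings themselves, smoothing between units, intonation) is left out.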
To illustrate the advances in speaking humanese in those early days, some popular voice synthesizers intended for use with personal computers have been tried. For your entertainment, these will each recite the first part of "Kubla Khan" by S.T. Coleridge (1772-1834):
In Xanadu did Kubla Khan
A stately pleasure-dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
So twice five miles of fertile ground
With walls and towers were girdled round;
And there were gardens bright with sinuous rills,
Where blossomed many an incense-bearing tree;
And here were forests ancient as the hills,
Enfolding sunny spots of greenery.
Currah Speech 64 cartridge for Commodore 64 (1984)
Several text-to-speech utilities appeared for the Commodore 64 during its lifetime, such as S.A.M. (Software Automatic Mouth), the Magic Voice Speech Module, and Speech! by Superior Software. Here we have Speech 64 by Currah Computor [sic!] Components Ltd., consisting of a cartridge based on the General Instrument SP0256 chip. This chip was also used with a number of other popular micros of the era, such as the Sinclair ZX Spectrum and the Tandy TRS-80.
In the case of the C64, the signal generated by the SP0256 is connected to the analog input of the C64's SID chip and is passed through internally to its output. The firmware in the cartridge extends the C64 BASIC programming language with a number of special commands to control the speech synthesizer. Speech is invoked using the new "SAY" command, followed by the desired text expressed in plain language. The somewhat erratic (some would say "absent") pronunciation rules of the English language may lead to unexpected results, however. When necessary, pronunciation can be controlled by using alternative spellings and by employing the 58 built-in allophones. After some experimentation, the input text for Kubla Khan on the Currah Speech 64 becomes:
in xana do did koobla khan
a state lee plehsure dome decree[...]
where alf, the saycred river ran.
throo kevverns mehsur less to man.
down to a sunless sea[...]
so twice 5 ma yuls of fertile ground.
with walls and towers were gur duld round.
and there where gardens bright with sin u us rills.
where blossomd many an incence behring tree[...]
and here where forrests ayn shunt as the hills.
enfolding suny spots of greeneree.
(The square brackets are used to invoke allophones, but here they are only employed to control pauses between words.)
Speech! for Acorn BBC Micro by Superior Software (1985)
One of the design features of the Acorn BBC Micro computer was an optional text-to-speech capability. This materialized in 1983 as the Acorn Speech Synthesiser Upgrade, which consisted of the popular Texas Instruments TMS5220 LPC speech chip and a TMS6100 "phrase ROM", to be installed in designated sockets on the BBC Micro motherboard. The ROM contained a dictionary of some 260 words and "word parts", sampled from a BBC news reader, that the user could sound directly. However, a user who wanted words that were not in the dictionary, and that could not be made by concatenating the built-in words and word parts, needed to encode the required phonemes himself; a complicated task that included programming a lattice filter.
This hardware solution was effectively obviated by the appearance in 1985 of Speech!, marketed by Superior Software. This low-cost and much more accessible software-based approach, which fits in only 7.5 kB of RAM, gives the user 49 phonemes derived from recorded samples with which to compile words in the English language. The phonemes are entered as normal text preceded by the "*SAY" command that Speech! adds to BBC Basic, and the many exceptions in English pronunciation rules can be accommodated by judicious use of alternative phonemes. Below is Kubla Khan again, rewritten for Speech! on the BBC Micro:
in xa na doo did koobla khan
a state lee pleh sure dome ducree
where alf, the saycred riv er ran
throo keverns mesure less to man
dawn to a sunless sea.
so twice five miles of fertile grawnd
with walls and towers wuhre gurduld rawnd
and there where gardens bright with sinnyou us rills
where blossomd many an incence behring tree.
and here where forests aynchent as the hills
enfolding sunny spots of greenuhry.
This sounds very mechanical, don't you think? Accompanying subtitles would be no excessive luxury in this case. Compared to Currah Speech 64 on the Commodore 64, though, Speech! seems to parse written language somewhat better. It also seems to take note of punctuation.
Commodore Amiga Workbench 1.3 / SoftVoice (1985)
An earlier text-to-speech tool called S.A.M. (for Software Automatic Mouth), which had been developed and published for the Apple II, Commodore 64, and Atari 8-bit machines, eventually found its way to more modern machines: the Apple Macintosh and the Commodore Amiga. Commodore licensed the software from SoftVoice and included it in Workbench, the Amiga's operating system. This feature remained with the OS from its first version (1985) up to and including at least version 2.05 (1990).
Invoking the Amiga's voice is straightforward: type "SAY" on the command line followed by a phrase, with modifiers to choose from a number of voice variations (male/female/robotic). It is scriptable as well, and can read from a separate input file. Besides the command-line tool, the OS also came with an interactive version, found among the GUI utilities, that shows the phonemes to which SoftVoice converts the text the user enters (see screenshot on the right).
The software seems to understand English pronunciation rules better than its contemporaries above, so that much less editing is required; the input text is nearly the same as the original. A recurring exception is the name "Kubla". As it is not a frequently used word in the English-speaking world, there is no exception for it in the software's database and we are stuck with writing "koobla" instead.
in xanadu did koobla khan
a stately pleasure dome decree;
where elf, the saycred rivver ran,
through caverns measureless to man,
down to a sunless sea.
so twice five miles of fertile ground,
with walls and tohwers were girdled round.
and there were gardens bright with sinuous rills,
where blossomed many an incense behring tree.
and here were forests ayncient as the hills,
enfolding sunny spots of greenery.
Probably owing to the advanced chipset of the Amiga and the larger quantity of available RAM, this speech synthesizer sounds much better than, for example, Speech! for the BBC Micro.
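The manual respelling used for the Amiga text can be thought of as a small word-substitution table applied before the text reaches the synthesizer. The Python sketch below illustrates that idea only; it is not part of SoftVoice or Workbench, and the names RESPELLINGS and respell are made up for this example. The substitutions themselves are the ones visible in the input text above.

```python
import re

# Respellings taken from the Amiga input text above (original -> respelled).
RESPELLINGS = {
    "kubla": "koobla",
    "alph": "elf",
    "sacred": "saycred",
    "river": "rivver",
    "towers": "tohwers",
    "bearing": "behring",
    "ancient": "ayncient",
}


def respell(line, table=RESPELLINGS):
    """Lowercase every word and swap in a respelling where one is listed."""

    def swap(match):
        word = match.group(0).lower()
        return table.get(word, word)

    # Only runs of letters are treated as words; punctuation is left alone.
    return re.sub(r"[A-Za-z]+", swap, line)


if __name__ == "__main__":
    print(respell("In Xanadu did Kubla Khan"))
    print(respell("And here were forests ancient as the hills,"))
```

Words that are not in the table pass through unchanged apart from lowercasing, which mirrors how most of the poem needed no editing on this machine.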
Mac OS X 10.4 Tiger (2005)
Skipping ahead some years, Apple's Mac OS X operating system also came with a built-in speech synthesizer, known as MacinTalk, as part of its accessibility features. It can be invoked from the command line, again using a "SAY" command, but it is also available as an accessibility aid in many applications throughout the OS. The user can select one of a range of distinct male and female voices; in this case the Victoria persona is speaking.
The text conversion capabilities still require some guidance here and there:
in xannahdu did kooblah khan
a stately pleasure dome decree.
Where alf, the sacred river ran,
Through caverns measureless to man.
Down to a sunliss sea.
So twice five miles of fertile ground
With walls and towers were girdled round;
And there were gardens bright with sinuous rills
Where blossomed many an incense-bearing tree.
And here were forests ancient as the hills,
Enfolding sunny spots of greenery.
The result is now approaching normal human speech. It also seems that the speech tempo increases as the technology improves. The hardware is able to convert text to speech more quickly, and at this stage the reproduction of speech has also become more accurate and easier to follow.
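For readers who want to try this themselves, the command-line invocation can be wrapped in a few lines of Python. This is a minimal sketch under some assumptions: it relies on the lowercase `say` command with its `-v` (voice) and `-r` (rate in words per minute) options, and it assumes the Victoria voice is installed on the machine, which may not be the case on every system. The function names are invented for this example.

```python
import shutil
import subprocess


def build_say_command(text, voice="Victoria", rate_wpm=None):
    """Build the argument list for the `say` command without running it."""
    command = ["say", "-v", voice]
    if rate_wpm is not None:
        command += ["-r", str(rate_wpm)]  # speech rate in words per minute
    command.append(text)
    return command


def speak(text, **options):
    """Speak the text if `say` is available; return False when it is not."""
    if shutil.which("say") is None:
        return False
    subprocess.run(build_say_command(text, **options), check=True)
    return True


if __name__ == "__main__":
    if not speak("in xannahdu did kooblah khan"):
        print("No `say` command found on this system.")
```

Keeping the command construction separate from the call that runs it makes it easy to inspect what would be executed, or to adapt the sketch to another synthesizer's command line.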
eSpeak open source software (2014)
eSpeak is an open-source voice synthesizer (free speech!) that also works with languages other than English, such as Afrikaans, Spanish, Greek, and more. In its standard version there are a number of regional accents available for English. In this case we hear the North English accent.
in zennedu did koobla kahn
a stately pleasure dome decree.
where alph, the sacred river ran,
through caverns measureless to man.
down to a sunless sea.
So twice five miles of fertile ground.
With walls and towers were girdled round.
And there were gardens bright with sinuous rills.
Where blossomed many an incense-bearing tree.
And here were forests ancient as the hills.
Enfolding sunny spots of greeneree.
Third part, with all voices
We close with the dramatic third part of Kubla Khan recited by all five voices taking turns. The mechanical but lively SoftVoice on Amiga opens and the last line is delivered by eSpeak, sounding very authoritative.
A damsel with a dulcimer
In a vision once I saw:
It was an Abyssinian maid
And on her dulcimer she played,
Singing of Mount Abora.
Could I revive within me
Her symphony and song,
To such a deep delight ’twould win me,
That with music loud and long,
I would build that dome in air,
That sunny dome! those caves of ice!
And all who heard should see them there,
And all should cry, Beware! Beware!
His flashing eyes, his floating hair!
Weave a circle round him thrice,
And close your eyes with holy dread
For he on honey-dew hath fed,
And drunk the milk of Paradise.