#285: Intermediate Text-to-Speech With Pyttsx3
Last week we took our first steps and used Google Translate to turn text into speech. While that works, it is limited, and sending confidential data to Google can be a data protection problem. Luckily for us, there are solutions that run on our local machines.
Installing pyttsx3
The package pyttsx3 is an offline text-to-speech conversion library for Python. It uses well-established engines for the output part, which allows us to use it on Windows, Linux and macOS. On Windows we can benefit from the built-in text-to-speech (TTS) feature, while on Linux we may need to install an engine like espeak-ng.
We can install pyttsx3 from PyPI with pip, using the command pip install pyttsx3.
Turn text into speech
To turn text into speech, we import pyttsx3, initialise an engine and set the voice we want to use. If we want to play the generated audio through the speaker, we can use this code:
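A minimal sketch of these steps could look like this. The helper name speak, its voice_index parameter and the sample sentence are placeholders chosen for illustration; index 0 simply means the first voice the engine reports, which may not be the one we want.

```python
def speak(engine, text, voice_index=0):
    """Speak `text` through the default audio device with the chosen voice."""
    voices = engine.getProperty("voices")
    # Voices are selected by their id; index 0 is only a placeholder choice.
    engine.setProperty("voice", voices[voice_index].id)
    engine.say(text)       # queue the text
    engine.runAndWait()    # block until everything in the queue has been spoken


def main():
    import pyttsx3  # imported here so speak() works with any engine-like object

    engine = pyttsx3.init()
    speak(engine, "Hello from pyttsx3")  # sample sentence, an assumption
```

Calling main() speaks the sentence through the default output device. The engine is passed in as a parameter so that the same helper can be reused with different settings.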
If we want to save the generated audio as a file instead, we can replace the line that calls the say() method with these lines:
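As a sketch, saving could look like this. The helper name save_speech and the file name speech.wav are assumptions for illustration; the audio format that is actually written may depend on the engine in use.

```python
def save_speech(engine, text, filename):
    """Write the spoken version of `text` to `filename` instead of playing it."""
    engine.save_to_file(text, filename)  # queue the export
    engine.runAndWait()                  # process the queue and write the file


def main():
    import pyttsx3  # imported here so save_speech() works with any engine-like object

    engine = pyttsx3.init()
    save_speech(engine, "Hello from pyttsx3", "speech.wav")  # hypothetical file name
```

Calling main() should leave the audio file in the current working directory. As with say(), nothing happens until runAndWait() processes the queue.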
Supported languages
We can use this code to get a list of supported voices and languages:
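One way to produce such a list is sketched here. The helper name describe_voices is an assumption, and printing each voice's id is a guess based on the "index - identifier" layout of the sample output in this article; the voice objects also carry name and languages attributes that could be printed instead.

```python
def describe_voices(engine):
    """Return one 'index - id' line per voice the engine reports."""
    voices = engine.getProperty("voices")
    return [f"{index} - {voice.id}" for index, voice in enumerate(voices)]


def main():
    import pyttsx3  # imported here so describe_voices() works with any engine-like object

    engine = pyttsx3.init()
    for line in describe_voices(engine):
        print(line)
```

Calling main() prints the list for the engine installed on the current machine. The index in front of each entry is the position in the voices list, which is what we need later to select a voice.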
On Windows, we only get the few languages we explicitly installed under Accessibility > Narrator. We must make sure that the voice we pick matches the language of the text, otherwise we end up with a mess. The list of voices can be as short as this:
0 - HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-GB_HAZEL_11.0
1 - HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens\TTS_MS_EN-US_ZIRA_11.0
On my Ubuntu 24.04 Linux system, the list of supported languages is much larger thanks to espeak-ng:
0 - Afrikaans
1 - Amharic
2 - Aragonese
3 - Arabic
4 - Assamese
5 - Azerbaijani
6 - Bashkir
7 - Belarusian
8 - Bulgarian
9 - Bengali
10 - Bishnupriya Manipuri
11 - Bosnian
12 - Catalan
13 - Cherokee
14 - Chinese (Mandarin, latin as English)
15 - Chinese (Mandarin, latin as Pinyin)
16 - Czech
17 - Chuvash
18 - Welsh
19 - Danish
20 - German
21 - Greek
22 - English (Caribbean)
23 - English (Great Britain)
24 - English (Scotland)
25 - English (Lancaster)
26 - English (West Midlands)
27 - English (Received Pronunciation)
28 - English (America)
29 - English (America, New York City)
30 - Esperanto
31 - Spanish (Spain)
32 - Spanish (Latin America)
33 - Estonian
34 - Basque
35 - Persian
36 - Persian (Pinglish)
37 - Finnish
38 - French (Belgium)
39 - French (Switzerland)
40 - French (France)
41 - Gaelic (Irish)
42 - Gaelic (Scottish)
43 - Guarani
44 - Greek (Ancient)
45 - Gujarati
46 - Hakka Chinese
47 - Hawaiian
48 - Hebrew
49 - Hindi
50 - Croatian
51 - Haitian Creole
52 - Hungarian
53 - Armenian (East Armenia)
54 - Armenian (West Armenia)
55 - Interlingua
56 - Indonesian
57 - Ido
58 - Icelandic
59 - Italian
60 - Japanese
61 - Lojban
62 - Georgian
63 - Kazakh
64 - Greenlandic
65 - Kannada
66 - Korean
67 - Konkani
68 - Kurdish
69 - Kyrgyz
70 - Latin
71 - Luxembourgish
72 - Lingua Franca Nova
73 - Lithuanian
74 - Latgalian
75 - Latvian
76 - Māori
77 - Macedonian
78 - Malayalam
79 - Marathi
80 - Malay
81 - Maltese
82 - Myanmar (Burmese)
83 - Norwegian Bokmål
84 - Nahuatl (Classical)
85 - Nepali
86 - Dutch
87 - Nogai
88 - Oromo
89 - Oriya
90 - Punjabi
91 - Papiamento
92 - Klingon
93 - Polish
94 - Portuguese (Portugal)
95 - Portuguese (Brazil)
96 - Pyash
97 - Lang_Belta
98 - Quechua
99 - K'iche'
100 - Quenya
101 - Romanian
102 - Russian
103 - Russian (Latvia)
104 - Sindhi
105 - Shan (Tai Yai)
106 - Sinhala
107 - Sindarin
108 - Slovak
109 - Slovenian
110 - Lule Saami
111 - Albanian
112 - Serbian
113 - Swedish
114 - Swahili
115 - Tamil
116 - Telugu
117 - Thai
118 - Turkmen
119 - Setswana
120 - Turkish
121 - Tatar
122 - Uyghur
123 - Ukrainian
124 - Urdu
125 - Uzbek
126 - Vietnamese (Northern)
127 - Vietnamese (Central)
128 - Vietnamese (Southern)
129 - Chinese (Cantonese)
130 - Chinese (Cantonese, latin as Jyutping)
Customisations
With pyttsx3 we get three main dials to influence the output:
- Voice/Language
- Speed
- Volume
To change the voice, we can run the code from above to list the voice and language combinations the engine offers, pick one, and then set it through its index position in the voices list:
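A sketch of that selection follows. The helper names find_voice_index and set_voice are hypothetical, as is the search term "German"; the indexes differ from machine to machine, so looking a voice up by a part of its id or name can be more robust than hard-coding a number.

```python
def find_voice_index(voices, needle):
    """Return the index of the first voice whose id or name contains `needle`, else None."""
    needle = needle.lower()
    for index, voice in enumerate(voices):
        label = f"{voice.id} {getattr(voice, 'name', '') or ''}".lower()
        if needle in label:
            return index
    return None


def set_voice(engine, index):
    """Activate the voice at position `index` of the engine's voices list."""
    voices = engine.getProperty("voices")
    engine.setProperty("voice", voices[index].id)
    return voices[index].id


def main():
    import pyttsx3  # imported here so the helpers work with any engine-like object

    engine = pyttsx3.init()
    index = find_voice_index(engine.getProperty("voices"), "German")  # example search term
    if index is not None:
        set_voice(engine, index)
        engine.say("Guten Tag")
        engine.runAndWait()
```

Calling main() speaks a German greeting if a matching voice is installed and does nothing otherwise.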
To change the speed, we can read the rate property and set it to a value other than the default of 200 that we get on Windows:
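As a sketch, reading and adjusting the rate could look like this. The helper name change_rate and the offset of -50 are assumptions for illustration; the rate is an integer measured in words per minute, and the lower bound of 1 is only a guard added here.

```python
def change_rate(engine, delta):
    """Shift the speaking rate (words per minute) by `delta` and return the new value."""
    current = engine.getProperty("rate")
    new_rate = max(1, current + delta)  # guard against zero or negative rates
    engine.setProperty("rate", new_rate)
    return new_rate


def main():
    import pyttsx3  # imported here so change_rate() works with any engine-like object

    engine = pyttsx3.init()
    print("New rate:", change_rate(engine, -50))  # slow down by 50 words per minute
    engine.say("This sentence is spoken a little slower.")
    engine.runAndWait()
```

Working with an offset rather than a fixed number keeps the change relative to whatever starting value the engine reports on the current system.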
The final dial we have is the volume, which we can set to a value between 0 and 1:
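A minimal sketch for the volume follows. The helper name set_volume and the level 0.5 are assumptions; the clamping is added here so that values outside the range from 0 to 1 cannot reach the engine.

```python
def set_volume(engine, level):
    """Set the volume, clamped to the range 0.0 to 1.0, and return the value used."""
    level = min(1.0, max(0.0, float(level)))
    engine.setProperty("volume", level)
    return level


def main():
    import pyttsx3  # imported here so set_volume() works with any engine-like object

    engine = pyttsx3.init()
    set_volume(engine, 0.5)  # half volume, an example value
    engine.say("This sentence is spoken at half volume.")
    engine.runAndWait()
```

Calling main() plays the sentence at the reduced volume.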
Next
With pyttsx3 we get a solution for TTS that runs on our local machine. There is no external service involved, and our data does not leave our machine. Depending on the engine we use, we get a wide range of voices and languages to choose from. However, the customisations are rather limited.
Next week we explore Coqui, a library that also runs locally but offers us more advanced features.