How to Convert Text to Speech in Python

Manav Narula, Mar 11, 2025

Method 1: Using gTTS (Google Text-to-Speech)
Method 2: Using pyttsx3
Method 3: Using Microsoft Speech API (SAPI)
Conclusion
FAQ

Text to speech (TTS) lets a Python application read written text aloud. It is useful whether you’re developing an educational tool, building a virtual assistant, or simply adding spoken feedback to improve the user experience. Python’s ecosystem of libraries makes this task straightforward and accessible.

In this tutorial, we’ll walk through three methods to convert text to speech using Python: gTTS, pyttsx3, and the Microsoft Speech API. By the end, you’ll know how to implement TTS in your own projects and which method suits which situation. So, let’s get started!

Method 1: Using gTTS (Google Text-to-Speech)

One of the most popular libraries for converting text to speech in Python is gTTS, which stands for Google Text-to-Speech. This library interfaces with the text-to-speech API of Google Translate, making it easy to generate speech from text. To get started, you’ll need to install the gTTS library, which can be done using pip.

First, install the library:

pip install gTTS

Once installed, you can use the following code snippet to convert text to speech:

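from gtts import gTTS
import os

# Text to synthesize and its language code
text = "Hello, welcome to the world of Python programming!"
language = 'en'

# Request the audio (needs an internet connection) and save it as an MP3
speech = gTTS(text=text, lang=language, slow=False)
speech.save("welcome.mp3")

# Open the file with the default player; "start" is a Windows shell command
os.system("start welcome.mp3")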

This code imports the necessary libraries, defines the text you want to convert, and specifies the language. It then creates a speech object and saves it as an MP3 file. Finally, it opens the audio file in the default media player. Note that start is a Windows shell command, so the last line works only on Windows.

Output:

Audio file "welcome.mp3" will play
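The playback step can be made portable with a small helper. The following is a minimal sketch, not part of gTTS: the function names open_command and play are hypothetical, and it assumes the open command on macOS and xdg-open on Linux desktops are available.

```python
import platform
import subprocess


def open_command(path, system=None):
    """Return the command that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # "start" is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        return ["open", path]       # macOS
    return ["xdg-open", path]       # most Linux desktops


def play(path):
    """Open the audio file in the default player (non-zero exit codes are ignored)."""
    subprocess.run(open_command(path), check=False)
```

Calling play("welcome.mp3") after speech.save("welcome.mp3") would then replace the os.system line on any of the three platforms.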

Using gTTS is straightforward: a few lines of code turn text into an MP3 file that plays on any device with MP3 support. gTTS also supports many languages, which makes it versatile for global applications. However, because it sends each request to Google Translate’s text-to-speech service, it needs a working internet connection.
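To illustrate the multi-language support, here is a minimal sketch that saves one MP3 per language code. The helper save_greetings and its tts_factory parameter are hypothetical names introduced for this example; only gTTS(text=..., lang=...) and its save method come from the library. The factory parameter exists so the loop can be tried without network access.

```python
from pathlib import Path


def save_greetings(greetings, out_dir=".", tts_factory=None):
    """Save one MP3 per language code and return the written paths."""
    if tts_factory is None:
        # Imported lazily so the helper can be exercised without gTTS installed.
        from gtts import gTTS
        tts_factory = gTTS

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for lang, text in greetings.items():
        path = out_dir / f"greeting_{lang}.mp3"
        # Each call contacts the online service when the real gTTS class is used.
        tts_factory(text=text, lang=lang).save(str(path))
        paths.append(path)
    return paths


# Typical use (needs gTTS and an internet connection):
# save_greetings({"en": "Hello!", "fr": "Bonjour !", "es": "Hola!"})
```

Each key is a language code passed straight to the lang argument, so an unsupported code would surface as an error from gTTS itself.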

Method 2: Using pyttsx3

Another option for TTS in Python is the pyttsx3 library. Unlike gTTS, pyttsx3 works offline and runs on Windows, macOS, and Linux. It drives the speech engine available on your system (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux), making it a reliable choice for applications that require offline functionality. To get started, install pyttsx3 with pip:

pip install pyttsx3

Here’s a simple example demonstrating how to use pyttsx3 for text-to-speech conversion:

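import pyttsx3

# Create an engine backed by the platform's speech driver
engine = pyttsx3.init()
text = "Hello, this is an offline text to speech conversion using pyttsx3."

# Queue the text, then block until everything queued has been spoken
engine.say(text)
engine.runAndWait()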

In this code, we initialize the TTS engine, define our text, and use the say method to queue the text for speaking. The runAndWait method processes the speech commands and waits for them to finish.

Output:

Audio will be played through the system's speakers

The pyttsx3 library is particularly useful for applications that need to function without an internet connection. It lets you choose among the voices installed on the system and adjust the speech rate and volume, which helps when you want a more customized user experience. Because it relies on the system’s own TTS engine, audio quality depends on the engine and voices installed on your machine.
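The rate, volume, and voice settings mentioned above are exposed through the engine’s getProperty and setProperty methods. The sketch below wraps them in a hypothetical helper, configure_engine; the default values are arbitrary choices for illustration, not recommendations from the library.

```python
def configure_engine(engine, rate=150, volume=0.8, voice_index=0):
    """Apply rate (words per minute), volume (0.0-1.0), and a voice to an engine."""
    engine.setProperty("rate", rate)
    # Clamp volume to the 0.0-1.0 range that pyttsx3 expects.
    engine.setProperty("volume", max(0.0, min(1.0, volume)))
    voices = engine.getProperty("voices")
    if voices:
        # Fall back to the first voice if the index is out of range.
        index = voice_index if 0 <= voice_index < len(voices) else 0
        engine.setProperty("voice", voices[index].id)
    return engine


# Typical use (needs pyttsx3 and a speech engine installed):
# import pyttsx3
# engine = configure_engine(pyttsx3.init(), rate=170)
# engine.say("Hello")
# engine.runAndWait()
```

Which voices appear in the list depends on what is installed on the machine, so the index that selects a given voice can differ between systems.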

Method 3: Using Microsoft Speech API (SAPI)

For Windows users, the Microsoft Speech API (SAPI) is another powerful option for converting text to speech. This method provides high-quality voices and is integrated into the Windows operating system. To use SAPI, you can utilize the comtypes library to access the API. First, ensure you have the library installed:

pip install comtypes

Here’s how you can implement TTS using SAPI:

import comtypes.client  # Windows only

def speak(text):
    # Create the SAPI voice COM object and read the text aloud
    speaker = comtypes.client.CreateObject("SAPI.SpVoice")
    speaker.Speak(text)

text = "This is a demonstration of the Microsoft Speech API for text to speech."
speak(text)

In this example, we create a function called speak that initializes the SAPI voice object and uses the Speak method to read the text aloud.

Output:

Audio will be played through the system's speakers

The Microsoft Speech API is a robust option for developers working on Windows platforms. It uses the voices installed with Windows, and the SpVoice object exposes Rate (-10 to 10) and Volume (0 to 100) properties for adjusting speaking speed and loudness. However, this method is specific to Windows, so it is not suitable for cross-platform applications.
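As a minimal sketch of adjusting those parameters, the variant of speak below sets Rate and Volume before speaking. The clamp helper and the optional voice parameter are hypothetical additions for this example; passing a voice object lets the function be tried without a Windows machine.

```python
def clamp(value, low, high):
    """Limit `value` to the inclusive range [low, high]."""
    return max(low, min(high, value))


def speak(text, rate=0, volume=100, voice=None):
    """Speak `text` with SAPI; `voice` may be supplied instead of creating one."""
    if voice is None:
        # Imported lazily because comtypes is only usable on Windows.
        import comtypes.client
        voice = comtypes.client.CreateObject("SAPI.SpVoice")
    voice.Rate = clamp(rate, -10, 10)      # SAPI rate: -10 (slow) to 10 (fast)
    voice.Volume = clamp(volume, 0, 100)   # SAPI volume: 0 to 100
    voice.Speak(text)
    return voice


# Typical use on Windows:
# speak("Speaking a little faster and quieter.", rate=3, volume=70)
```

Out-of-range values are clamped rather than rejected, which is a design choice of this sketch and not SAPI behavior.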

Conclusion

In this tutorial, we explored three effective methods for converting text to speech in Python: gTTS, pyttsx3, and the Microsoft Speech API. Each method has its unique advantages, whether it’s the ease of use with gTTS, the offline capabilities of pyttsx3, or the high-quality output from SAPI. Depending on your project requirements, you can choose the best option that fits your needs. With these tools at your disposal, you can enhance your applications and create engaging user experiences. Happy coding!

FAQ

What is text-to-speech in Python?
Text-to-speech in Python refers to the conversion of written text into spoken words using various libraries and APIs.
Do I need an internet connection to use gTTS?
Yes, gTTS requires an internet connection because it sends requests to Google Translate’s text-to-speech API.
Can I use pyttsx3 on Linux?
Yes, pyttsx3 runs on Windows, macOS, and Linux. On Linux it uses the eSpeak engine, which must be installed.
How can I change the voice in pyttsx3?
You can change the voice in pyttsx3 by listing the available voices with engine.getProperty('voices') and passing the id of the one you want to engine.setProperty('voice', ...).
Is the Microsoft Speech API available on non-Windows platforms?
No, the Microsoft Speech API is specific to Windows and not available on other operating systems.


Author: Manav Narula

Manav is an IT professional with extensive experience as a core developer on many live projects. He is an avid learner who enjoys learning new things and sharing his findings whenever possible.
