#290: Record Audio With PyAudio

Last week we used sounddevice to record audio with Python. In this post we do the same thing with the PyAudio library. Depending on your environment and sound devices, you may need more flexibility and operating-system-specific support than sounddevice can offer.

Installation

We can install PyAudio with this command:

uv pip install PyAudio

Show the devices

As with sounddevice, we need to know how many channels our sound device supports so that we do not risk an exception when we try to record. Here we need a bit more code to list all the input and output devices.

Before you run this script, make sure that your microphone is connected. The device indexes can change when the operating system finds new audio devices, and we need the index whenever we want to use anything other than the default device.

import pyaudio

# Initialize PyAudio
p = pyaudio.PyAudio()

# List all devices
for i in range(p.get_device_count()):
    info = p.get_device_info_by_index(i)
    print(f"Device {i}: {info['name']} [{info['maxInputChannels']} in, {info['maxOutputChannels']} out]")

# Optionally, get default input device info
default_input = p.get_default_input_device_info()
print()
print("Default input device:")
print(f"  Name: {default_input['name']}")
print(f"  Max input channels: {default_input['maxInputChannels']}")

p.terminate()

Device 0: Microsoft Sound Mapper - Input [2 in, 0 out]
Device 1: Headset (EarPods) [1 in, 0 out]
Device 2: Echo Cancelling Speakerphone (D [2 in, 0 out]
Device 3: Microphone (Logitech BRIO) [2 in, 0 out]
Device 4: Microphone (Realtek(R) Audio) [4 in, 0 out]
Device 5: Microsoft Sound Mapper - Output [0 in, 2 out]
Device 6: Headset (EarPods) [0 in, 2 out]
Device 7: Echo Cancelling Speakerphone (D [0 in, 2 out]
Device 8: Speakers/Headphones (Realtek(R) [0 in, 2 out]
Device 9: Primary Sound Capture Driver [2 in, 0 out]
Device 10: Headset (EarPods) [1 in, 0 out]
Device 11: Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515) [2 in, 0 out]
Device 12: Microphone (Logitech BRIO) [2 in, 0 out]
Device 13: Microphone (Realtek(R) Audio) [4 in, 0 out]
Device 14: Primary Sound Driver [0 in, 2 out]
Device 15: Headset (EarPods) [0 in, 2 out]
Device 16: Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515) [0 in, 2 out]
Device 17: Speakers/Headphones (Realtek(R) Audio) [0 in, 2 out]
Device 18: Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515) [0 in, 2 out]
Device 19: Speakers/Headphones (Realtek(R) Audio) [0 in, 2 out]
Device 20: Headset (EarPods) [0 in, 2 out]
Device 21: Headset (EarPods) [1 in, 0 out]
Device 22: Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515) [2 in, 0 out]
Device 23: Microphone (Logitech BRIO) [2 in, 0 out]
Device 24: Microphone (Realtek(R) Audio) [2 in, 0 out]
Device 25: Speakers 1 (Realtek HD Audio output with SST) [0 in, 2 out]
Device 26: Speakers 2 (Realtek HD Audio output with SST) [0 in, 2 out]
Device 27: PC Speaker (Realtek HD Audio output with SST) [2 in, 0 out]
Device 28: Microphone 1 (Realtek HD Audio Mic input with SST) [2 in, 0 out]
Device 29: Microphone 2 (Realtek HD Audio Mic input with SST) [4 in, 0 out]
Device 30: Microphone 3 (Realtek HD Audio Mic input with SST) [4 in, 0 out]
Device 31: Stereo Mix (Realtek HD Audio Stereo input) [2 in, 0 out]
Device 32: Output (EarPods) [0 in, 2 out]
Device 33: Headset (EarPods) [1 in, 0 out]
Device 34: Microphone (Logitech BRIO) [2 in, 0 out]
Device 35: Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515) [2 in, 0 out]
Device 36: Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515) [0 in, 2 out]

Default input device:
  Name: Headset (EarPods)
  Max input channels: 1
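
Because the indexes can shift between runs, it can be more robust to look a device up by its name instead of hard-coding a number. The following is only a minimal sketch: `find_input_device` and `list_devices` are hypothetical helper names, not part of PyAudio, and the sample entries are copied from the listing above. The same microphone shows up several times in that listing (most likely once per Windows host API), so the helper simply returns the first match that has enough input channels.

```python
def find_input_device(devices, name_part, min_channels=1):
    """Return the index of the first input device whose name contains name_part.

    `devices` is a list of dicts shaped like the results of
    PyAudio's get_device_info_by_index(). Returns None if nothing matches.
    """
    wanted = name_part.lower()
    for info in devices:
        if wanted in info["name"].lower() and info["maxInputChannels"] >= min_channels:
            return info["index"]
    return None


def list_devices():
    """Collect the device info dicts from PyAudio (needs PyAudio installed)."""
    import pyaudio  # imported here so find_input_device also works without it

    p = pyaudio.PyAudio()
    try:
        return [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()


# Example with three entries copied from the listing above
sample = [
    {"index": 0, "name": "Microsoft Sound Mapper - Input", "maxInputChannels": 2},
    {"index": 3, "name": "Microphone (Logitech BRIO)", "maxInputChannels": 2},
    {"index": 8, "name": "Speakers/Headphones (Realtek(R)", "maxInputChannels": 0},
]
print(find_input_device(sample, "brio"))      # 3
print(find_input_device(sample, "speakers"))  # None (output-only device)
```

On a real system we would call find_input_device(list_devices(), "brio") and pass the result as input_device_index when we open the stream.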

Record to a file

With PyAudio we can pass the index of a specific device when we open the recording stream. To use the default device, we set the input_device_index parameter to None:

import pyaudio
import wave

# Audio recording parameters
FORMAT = pyaudio.paInt16  # 16-bit resolution
CHANNELS = 1              # 2: Stereo, 1: Mono
RATE = 44100              # 44.1kHz sampling rate
CHUNK = 1024              # Record in chunks of 1024 samples
RECORD_SECONDS = 5        # Duration of recording
WAVE_OUTPUT_FILENAME = "output_pyaudio.wav"

# Initialize PyAudio
audio = pyaudio.PyAudio()

# Open audio stream
stream = audio.open(format=FORMAT,
                    channels=CHANNELS,
                    input_device_index=None,  # device ID or None for default
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)

print("Recording...")

frames = []

# Record data in chunks
for _ in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)

print("Finished recording.")

# Stop and close stream
stream.stop_stream()
stream.close()
audio.terminate()

# Save recorded data to WAV file
with wave.open(WAVE_OUTPUT_FILENAME, 'wb') as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(pyaudio.get_sample_size(FORMAT))  # module-level helper, independent of the terminated instance
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))

print(f"Saved recording as {WAVE_OUTPUT_FILENAME}")

The script reads audio chunks for about 5 seconds and then uses Python's wave module to write the collected data to a valid *.wav file.
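
The loop in the script only reads whole chunks, so the recording ends up slightly shorter than RECORD_SECONDS. The sketch below reruns that arithmetic without a microphone. It assumes 2 bytes per sample for the 16-bit format and writes silence into an in-memory WAV file (the `buffer` and `SAMPLE_WIDTH` names are only used for this illustration) to show what the wave module stores in the header.

```python
import io
import wave

RATE = 44100
CHUNK = 1024
RECORD_SECONDS = 5
CHANNELS = 1
SAMPLE_WIDTH = 2  # assumption: bytes per sample for 16-bit audio (paInt16)

# Same expression as in the recording loop
num_chunks = int(RATE / CHUNK * RECORD_SECONDS)
total_frames = num_chunks * CHUNK
duration = total_frames / RATE
payload_bytes = total_frames * CHANNELS * SAMPLE_WIDTH

print(num_chunks)          # 215
print(total_frames)        # 220160
print(round(duration, 3))  # 4.992
print(payload_bytes)       # 440320

# Write silence instead of microphone data into an in-memory WAV file
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wf:
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(SAMPLE_WIDTH)
    wf.setframerate(RATE)
    wf.writeframes(b"\x00" * payload_bytes)

# Read the header back to confirm the frame count
buffer.seek(0)
with wave.open(buffer, "rb") as wf:
    frames_in_file = wf.getnframes()
    rate_in_file = wf.getframerate()

print(frames_in_file == total_frames)  # True
```

If we need the full 5 seconds, we could round the chunk count up instead of down and accept a few extra milliseconds of audio.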

Record for as long as we need to

To record until we hit the Enter key, PyAudio offers a syntax that looks a bit easier to understand, but only at first glance. This script does not use callbacks; it depends on the same threading logic as before, only spread a bit more widely across the script:

import pyaudio
import wave
import threading

# Audio recording parameters
FORMAT = pyaudio.paInt16
CHANNELS = 2              # must not exceed maxInputChannels of the chosen device
RATE = 44100
CHUNK = 1024
WAVE_OUTPUT_FILENAME = "output_pyaudio.wav"

frames = []
recording = True

def record_audio():
    global frames, recording

    # Initialize PyAudio
    audio = pyaudio.PyAudio()

    # Open stream
    stream = audio.open(format=FORMAT,
                        input_device_index=0,  # index from the device list above; adjust for your system
                        channels=CHANNELS,
                        rate=RATE,
                        input=True,
                        frames_per_buffer=CHUNK)

    print("Recording... Press Enter to stop.")

    # Record loop
    while recording:
        data = stream.read(CHUNK)
        frames.append(data)

    print("Recording stopped.")

    # Stop and close stream
    stream.stop_stream()
    stream.close()
    audio.terminate()

    # Save to file
    with wave.open(WAVE_OUTPUT_FILENAME, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(pyaudio.get_sample_size(FORMAT))  # module-level helper, independent of the terminated instance
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))

    print(f"Saved recording as {WAVE_OUTPUT_FILENAME}")

# Thread to handle recording
recording_thread = threading.Thread(target=record_audio)
recording_thread.start()

# Wait for Enter key
input()  # Press Enter to stop
recording = False
recording_thread.join()
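
The script above signals the worker thread through a global boolean. A common alternative is threading.Event, which makes the stop signal explicit. The sketch below shows only that pattern and is not a drop-in replacement: `record_until_stopped` and `fake_read_chunk` are hypothetical names, and the fake reader stands in for stream.read(CHUNK) so that the example runs without a microphone.

```python
import threading
import time


def record_until_stopped(read_chunk, stop_event, frames):
    """Append chunks from read_chunk() to frames until stop_event is set."""
    while not stop_event.is_set():
        frames.append(read_chunk())


def fake_read_chunk():
    """Stand-in for stream.read(CHUNK): waits briefly, then returns silence."""
    time.sleep(0.01)
    return b"\x00\x00" * 1024  # 1024 silent 16-bit mono samples


frames = []
stop_event = threading.Event()
worker = threading.Thread(
    target=record_until_stopped, args=(fake_read_chunk, stop_event, frames)
)
worker.start()

time.sleep(0.1)   # in the real script this is where input() waits for Enter
stop_event.set()  # ask the worker to finish after its current chunk
worker.join()

print(f"Collected {len(frames)} chunks")
```

Passing the reader in as a function also means the loop can be tried out without any audio hardware; with PyAudio we would pass something like lambda: stream.read(CHUNK) instead.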

PyAudio or sounddevice?

Both libraries offer good support for recording audio. Which one you choose depends mainly on your specific combination of hardware and operating system. If that is not a deciding factor, I suggest you start with sounddevice and only switch to PyAudio if you need more flexibility.

Next

With PyAudio and sounddevice we can now record audio and save it to a *.wav file. Next week we take this recording and try to extract the text from it.