#289: Record Audio With Sounddevice

To turn speech into text, we first need to record audio. In Python we have two main options for that: sounddevice and PyAudio. This week we look at the more modern sounddevice; PyAudio will be the topic of next week’s post.

Installation

We can install sounddevice together with NumPy, which the scripts below use to hold the audio data:

uv pip install sounddevice numpy

Show the devices

Before we start recording, we should first check which audio devices we can access and how many input channels each one offers. If we try to record two channels (stereo) from a device that offers only one (mono), sounddevice raises an exception.

Before you run this script, make sure the microphone you want to use is connected.

import sounddevice as sd

print(sd.query_devices())

   0 Microsoft Sound Mapper - Input, MME (2 in, 0 out)
>  1 Headset (EarPods), MME (1 in, 0 out)
   2 Echo Cancelling Speakerphone (D, MME (2 in, 0 out)
   3 Microphone (Logitech BRIO), MME (2 in, 0 out)
   4 Microphone (Realtek(R) Audio), MME (4 in, 0 out)
   5 Microsoft Sound Mapper - Output, MME (0 in, 2 out)
<  6 Headset (EarPods), MME (0 in, 2 out)
   7 Echo Cancelling Speakerphone (D, MME (0 in, 2 out)
   8 Speakers/Headphones (Realtek(R), MME (0 in, 2 out)
   9 Primary Sound Capture Driver, Windows DirectSound (2 in, 0 out)
  10 Headset (EarPods), Windows DirectSound (1 in, 0 out)
  11 Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515), Windows DirectSound (2 in, 0 out)
  12 Microphone (Logitech BRIO), Windows DirectSound (2 in, 0 out)
  13 Microphone (Realtek(R) Audio), Windows DirectSound (4 in, 0 out)
  14 Primary Sound Driver, Windows DirectSound (0 in, 2 out)
  15 Headset (EarPods), Windows DirectSound (0 in, 2 out)
  16 Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515), Windows DirectSound (0 in, 2 out)
  17 Speakers/Headphones (Realtek(R) Audio), Windows DirectSound (0 in, 2 out)
  18 Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515), Windows WASAPI (0 in, 2 out)
  19 Speakers/Headphones (Realtek(R) Audio), Windows WASAPI (0 in, 2 out)
  20 Headset (EarPods), Windows WASAPI (0 in, 2 out)
  21 Headset (EarPods), Windows WASAPI (1 in, 0 out)
  22 Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515), Windows WASAPI (2 in, 0 out)
  23 Microphone (Logitech BRIO), Windows WASAPI (2 in, 0 out)
  24 Microphone (Realtek(R) Audio), Windows WASAPI (2 in, 0 out)
  25 Speakers 1 (Realtek HD Audio output with SST), Windows WDM-KS (0 in, 2 out)
  26 Speakers 2 (Realtek HD Audio output with SST), Windows WDM-KS (0 in, 2 out)
  27 PC Speaker (Realtek HD Audio output with SST), Windows WDM-KS (2 in, 0 out)
  28 Microphone 1 (Realtek HD Audio Mic input with SST), Windows WDM-KS (2 in, 0 out)
  29 Microphone 2 (Realtek HD Audio Mic input with SST), Windows WDM-KS (4 in, 0 out)
  30 Microphone 3 (Realtek HD Audio Mic input with SST), Windows WDM-KS (4 in, 0 out)
  31 Stereo Mix (Realtek HD Audio Stereo input), Windows WDM-KS (2 in, 0 out)
  32 Output (EarPods), Windows WDM-KS (0 in, 2 out)
  33 Headset (EarPods), Windows WDM-KS (1 in, 0 out)
  34 Microphone (Logitech BRIO), Windows WDM-KS (2 in, 0 out)
  35 Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515), Windows WDM-KS (2 in, 0 out)
  36 Echo Cancelling Speakerphone (DELL PROFESSIONAL SOUND BAR AE515), Windows WDM-KS (0 in, 2 out)

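The > marks the default input device and the < marks the default output device. On Windows, the same physical device usually appears once per host API (MME, Windows DirectSound, Windows WASAPI and Windows WDM-KS), each with its own index and sometimes a different channel count: the Realtek microphone offers 4 input channels through MME but only 2 through WASAPI.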
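Because the list is long and repeats devices per host API, it can help to filter it in code. The following is a minimal sketch with a hypothetical helper, input_devices_matching(); it assumes the device and host API entries are dicts like the ones sd.query_devices() and sd.query_hostapis() return (keys "name", "hostapi" and "max_input_channels"), and it does not import sounddevice itself.

```python
def input_devices_matching(devices, hostapis, name_part, min_channels=1):
    """Hypothetical helper: list input devices whose name contains name_part.

    Returns (index, device name, host API name, input channels) tuples.
    """
    matches = []
    for index, dev in enumerate(devices):
        # Skip output-only devices and devices with too few input channels
        if dev["max_input_channels"] < min_channels:
            continue
        # Case-insensitive substring match on the device name
        if name_part.lower() not in dev["name"].lower():
            continue
        api_name = hostapis[dev["hostapi"]]["name"]
        matches.append((index, dev["name"], api_name, dev["max_input_channels"]))
    return matches
```

With sounddevice installed, a call such as `input_devices_matching(sd.query_devices(), sd.query_hostapis(), "BRIO")` would list every Logitech BRIO input entry together with its host API and channel count.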

Change the default device

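If we are not happy with the current default device, we can assign the device we want to use to sd.default.device. A string selects the device whose name contains all space-separated parts of the string in the right order (case-insensitive). Each device string ends with the name of its host API, so we can pick a specific entry from the list above: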

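# select the device by name; the string may include the host API name
sd.default.device = 'Echo Cancelling Speakerphone (D, MME'
# or by index as an (input, output) pair from the list above: sd.default.device = 2, 7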
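Switching to another device can change how many input channels are available (the EarPods headset, for instance, offers only one). A minimal sketch with a hypothetical helper, pick_channels(), that never asks for more channels than the device has; it assumes a dict like the one sd.query_devices(kind="input") returns for the current default input device.

```python
def pick_channels(device_info, requested=2):
    """Hypothetical helper: return a channel count the device can deliver.

    device_info is assumed to have "name" and "max_input_channels" keys,
    as in the dicts returned by sd.query_devices().
    """
    available = device_info["max_input_channels"]
    if available < 1:
        raise ValueError(f"{device_info['name']!r} offers no input channels")
    # Clamp the request so opening the stream does not fail on the channel count
    return min(requested, available)
```

After setting sd.default.device, something like `channels = pick_channels(sd.query_devices(kind="input"))` would give a value that is safe to use in the recording scripts below.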

Record to a file

Once we have found the device we want to use, we set the matching number of channels in this script, record, and save the audio file to disk:

import os
import wave
import numpy as np

# Optional, Windows only: uncomment to enable ASIO support. The variable
# must be set before sounddevice is imported; its value does not matter.
# os.environ["SD_ENABLE_ASIO"] = "1"

import sounddevice as sd

# Parameters
duration = 5  # seconds
sample_rate = 44100  # 44.1 kHz
channels = 1  # 1 = Mono, 2 = Stereo
file_name = "output_sounddevice.wav"
frames = int(duration * sample_rate)
dtype = 'int16'

print("Recording...")

# Record audio (sd.rec() returns immediately and records in the background)
audio_data = sd.rec(frames,
                    samplerate=sample_rate,
                    channels=channels,
                    dtype=dtype)
sd.wait()  # Wait until recording is finished

# Save to file
with wave.open(file_name, 'wb') as wf:
    wf.setnchannels(channels)
    wf.setsampwidth(np.dtype(dtype).itemsize)
    wf.setframerate(sample_rate)
    wf.writeframes(audio_data.tobytes())

print(f"Recording saved as {file_name}")

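The script records 5 seconds from the default input device into a NumPy array. sd.rec() starts the recording and returns immediately, while sd.wait() blocks until it is finished. At 44.1 kHz, 5 seconds are 220,500 frames, which with 16-bit mono samples amounts to 441,000 bytes of audio data. We then write the array to a *.wav file with Python’s built-in wave module.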
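If we leave out dtype, sd.rec() records float32 samples, while the wave module writes plain integer PCM. The following is a sketch of a hypothetical save_wav() helper that reads the channel count and sample width from the array itself and scales float samples in the range -1.0 to 1.0 to 16-bit integers first. It assumes a little-endian machine, which matches the byte order of WAV files.

```python
import wave

import numpy as np


def save_wav(file_name, audio, sample_rate):
    """Hypothetical helper: write a NumPy recording to a 16-bit PCM WAV file."""
    audio = np.asarray(audio)
    if audio.ndim == 1:
        audio = audio[:, np.newaxis]  # treat a 1-D array as mono
    if np.issubdtype(audio.dtype, np.floating):
        # Float samples are expected in [-1.0, 1.0]; scale them to int16
        audio = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    if audio.dtype != np.int16:
        raise TypeError(f"unsupported dtype {audio.dtype}, expected int16 or float")
    with wave.open(file_name, "wb") as wf:
        wf.setnchannels(audio.shape[1])        # one column per channel
        wf.setsampwidth(audio.dtype.itemsize)  # 2 bytes for int16
        wf.setframerate(sample_rate)
        # tobytes() on a (frames, channels) array yields interleaved samples
        wf.writeframes(audio.tobytes())
```

For example, after `audio = sd.rec(frames, samplerate=sample_rate, channels=channels)` and `sd.wait()`, calling `save_wav(file_name, audio, sample_rate)` would write a 16-bit file even though the recording itself is float32.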

Record for as long as we need to

A fixed recording duration is usually not what we want. We can modify the script and use a second thread so that recording continues until we press the Enter key:

import sounddevice as sd
import numpy as np
import wave
import threading
import sys

# Parameters
sample_rate = 44100  # 44.1 kHz
channels = 1  # 1 = Mono, 2 = Stereo
file_name = "output_sounddevice.wav"
dtype = 'int16'

# Buffer to hold recorded data
recorded_frames = []
recording = True

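def callback(indata, frames, time, status):
    # Called by sounddevice from its audio thread for every new block of samples
    if status:
        print(status, file=sys.stderr)  # e.g. input overflow warnings
    recorded_frames.append(indata.copy())  # indata is only valid during the callback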

def wait_for_enter():
    global recording
    input("Recording... Press Enter to stop.\n")
    recording = False

# Start enter-listening thread
stop_thread = threading.Thread(target=wait_for_enter)
stop_thread.start()

# Start recording stream
with sd.InputStream(samplerate=sample_rate, 
                    channels=channels, 
                    dtype=dtype, 
                    callback=callback):
    while recording:
        sd.sleep(100)

# Combine and save to file
audio_data = np.concatenate(recorded_frames)

with wave.open(file_name, 'wb') as wf:
    wf.setnchannels(channels)
    wf.setsampwidth(np.dtype(dtype).itemsize)
    wf.setframerate(sample_rate)
    wf.writeframes(audio_data.tobytes())

print(f"Saved recording as {file_name}")

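There is a lot more going on here than in the fixed-length script. sounddevice calls callback() from its own audio thread whenever a new block of samples arrives; the function copies that block (indata is only valid while the callback runs) and appends it to recorded_frames. Meanwhile, wait_for_enter() runs in a separate thread, blocks on input(), and sets recording to False once we press Enter. The main thread checks that flag every 100 ms with sd.sleep(), then leaves the with block, which stops and closes the stream. Finally, np.concatenate() joins all blocks into one array that we write to the WAV file.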
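The global flag and the shared list work, but a threading.Event and a queue.Queue make the hand-off between the threads more explicit. Below is a sketch of that variant with a hypothetical BlockRecorder class. The audio stream itself is left out, so the buffering logic can be tried without a microphone; the callback keeps the signature sounddevice expects for an InputStream callback.

```python
import queue
import sys
import threading

import numpy as np


class BlockRecorder:
    """Hypothetical helper that buffers blocks from a sounddevice input callback."""

    def __init__(self, channels, dtype="int16"):
        self.channels = channels
        self.dtype = dtype
        self.blocks = queue.Queue()    # thread-safe hand-off from the audio thread
        self.stop = threading.Event()  # replaces the global `recording` flag

    def callback(self, indata, frames, time, status):
        if status:
            print(status, file=sys.stderr)
        # indata is only valid while the callback runs, so queue a copy
        self.blocks.put(indata.copy())

    def collect(self):
        """Drain all queued blocks into one (frames, channels) array."""
        chunks = []
        while not self.blocks.empty():
            chunks.append(self.blocks.get_nowait())
        if not chunks:
            # Enter was pressed before any block arrived: empty recording
            return np.zeros((0, self.channels), dtype=self.dtype)
        return np.concatenate(chunks)
```

To record with it, we would pass `rec.callback` as `callback=` to `sd.InputStream(...)`, call `rec.stop.wait()` inside the with block instead of the `sd.sleep()` loop, let the Enter-listening thread call `rec.stop.set()`, and hand `rec.collect()` to the same wave code as before. The empty-array case also avoids the error np.concatenate() raises on an empty list.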