How PyAudio Drops Frames During Recording

Here is a snippet that records audio with the PyAudio library (source):

import pyaudio
import wave

chunk = 1024  # Record in chunks of 1024 samples
sample_format = pyaudio.paInt16  # 16 bits per sample
channels = 2
fs = 44100  # Record at 44100 samples per second
seconds = 3
filename = "output.wav"

p = pyaudio.PyAudio()  # Create an interface to PortAudio

print('Recording')

stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)

frames = []  # Initialize array to store frames

# Store data in chunks for 3 seconds
for i in range(0, int(fs / chunk * seconds)):
    data = stream.read(chunk)
    frames.append(data)

# Stop and close the stream 
stream.stop_stream()
stream.close()
# Terminate the PortAudio interface
p.terminate()

print('Finished recording')

# Save the recorded data as a WAV file
wf = wave.open(filename, 'wb')
wf.setnchannels(channels)
wf.setsampwidth(p.get_sample_size(sample_format))
wf.setframerate(fs)
wf.writeframes(b''.join(frames))
wf.close()

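This uses the traditional blocking mode: whenever fewer than chunk samples have been collected, the call to stream.read() blocks. That raises a question, though. If the program does not pick up the collected samples promptly once stream.read() is able to return, are those samples kept?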
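One way to look at this question from the program's side is to ask the stream how many frames are already waiting before each read. The sketch below is only an illustration: `read_and_report` is a hypothetical helper, and it assumes a PyAudio input stream from a release that offers `get_read_available()` and the `exception_on_overflow` argument of `read()`.

```python
def read_and_report(stream, chunk):
    """Read one chunk and report how many frames were already queued.

    `stream` is assumed to be an open PyAudio input stream (or any object
    with the same two methods). This helper is hypothetical, not part of
    PyAudio itself.
    """
    # Number of frames that can be read right now without blocking.
    backlog = stream.get_read_available()
    # exception_on_overflow=False asks PyAudio not to raise an error if
    # input was dropped before this read.
    data = stream.read(chunk, exception_on_overflow=False)
    return data, backlog
```

If the reported backlog stops growing past some ceiling however long the program waits between reads, that ceiling hints at how much audio the lower layers are willing to hold.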

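A similar situation is network socket communication. Suppose a program reads only A bytes from a socket at a time, while the NIC buffer holds B bytes (the buffer is normally large enough, that is, A < B). While new data keeps arriving, whatever the program has not read yet stays in the NIC buffer; data is lost only once that buffer is full. From the program's point of view, then, no matter how long it waits between two consecutive reads, the data it gets is contiguous as long as the buffer has not overflowed, because the buffer holds the data in the meantime.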
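The socket-style behaviour described above can be sketched with a toy model. `BoundedFifo` is a made-up class for illustration, not how any real network stack is implemented: it keeps unread bytes in arrival order and only loses data when a write does not fit.

```python
from collections import deque


class BoundedFifo:
    """Toy receive buffer: holds at most `capacity` bytes, drops the excess."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()
        self.dropped = 0

    def push(self, data):
        """Producer side: store what fits, count the rest as lost."""
        free = self.capacity - len(self.buf)
        self.buf.extend(data[:free])
        self.dropped += max(0, len(data) - free)

    def read(self, n):
        """Consumer side: take up to `n` of the oldest bytes."""
        count = min(n, len(self.buf))
        return bytes(self.buf.popleft() for _ in range(count))


fifo = BoundedFifo(capacity=8)   # B = 8
fifo.push(b"abcd")
fifo.push(b"efgh")               # the buffer is now exactly full
first = fifo.read(4)             # A = 4
fifo.push(b"ijkl")               # fits in the space just freed
second = fifo.read(4)
third = fifo.read(4)
fifo.push(b"0123456789")         # 10 bytes into 8 free slots: 2 are lost
print(first + second + third, fifo.dropped)
```

However late the reads happen, the three reads return one contiguous run of bytes; loss only shows up in the final push, which exceeds the capacity.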

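To test whether PyAudio works the same way when recording, I added a delay between reads so that the program never picks up the returned data in time. Each read covers 1024 ÷ 44100 ≈ 0.0232 seconds of audio, so I added a delay of 0.1 seconds. The new code is as follows: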

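import time
import wave

import numpy as np
import pyaudio

chunk = 1024
sample_format = pyaudio.paInt16
channels = 1
fs = 44100
delay = 0.1  # Seconds to sleep after each read; one chunk lasts only ~0.0232 s
read = 0  # Number of samples read so far
audio_frames = []  # Recorded chunks as int16 arrays

p = pyaudio.PyAudio()
stream = p.open(format=sample_format,
                channels=channels,
                rate=fs,
                frames_per_buffer=chunk,
                input=True)

# Read 10 seconds' worth of samples, pausing after every read
while read < 10 * fs:
    data = stream.read(chunk)
    read += chunk
    time.sleep(delay)  # Deliberately fall behind the audio device
    # np.fromstring is deprecated for binary data; np.frombuffer replaces it
    audio_frames.append(np.frombuffer(data, dtype=np.int16))

stream.stop_stream()
stream.close()
sample_width = p.get_sample_size(sample_format)  # Query before terminating
p.terminate()

wf = wave.open('delay_test.wav', 'wb')
wf.setnchannels(channels)
wf.setsampwidth(sample_width)
wf.setframerate(fs)
wf.writeframes(b''.join(frame.tobytes() for frame in audio_frames))
wf.close()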
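To see how far this loop falls behind, the timing can be worked out directly. This is plain arithmetic on the values used above (a 1024-frame chunk, 44100 frames per second, a 0.1 second sleep); the variable names are only for illustration.

```python
chunk = 1024      # frames per read, as in the code above
fs = 44100        # frames per second
delay = 0.1       # seconds of sleep after each read

chunk_duration = chunk / fs                 # audio covered by one read, in seconds
frames_during_delay = round(fs * delay)     # frames arriving while the loop sleeps
chunks_during_delay = frames_during_delay / chunk

print(f"one chunk lasts {chunk_duration:.4f} s")
print(f"{frames_during_delay} frames arrive during each {delay} s sleep")
print(f"that is about {chunks_during_delay:.1f} chunks per sleep")
```

Each sleep lets roughly four chunks' worth of audio arrive while the loop picks up only one chunk per iteration, so the reader cannot keep up unless something underneath buffers the difference.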

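Playing back the resulting audio, I could hear obvious choppiness, which means frames were dropped. This suggests that PyAudio's buffering during recording differs from that of network communication: if the returned data is not read in time, the oldest data is overwritten by newer data, and the underlying mechanism keeps only chunk samples for the program.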
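The conclusion above can be pictured with a small simulation. This is a toy model of the observed behaviour, not PortAudio's actual buffering: it assumes a buffer that keeps only the newest `chunk` samples, and `record_with_small_buffer` is a hypothetical function. Samples are consecutive integers so that any gap is easy to spot.

```python
from collections import deque


def record_with_small_buffer(total_samples, chunk, samples_per_sleep):
    """Simulate a slow reader in front of a buffer holding `chunk` samples."""
    ring = deque(maxlen=chunk)   # keeps only the newest `chunk` samples
    recorded = []
    produced = 0
    while produced < total_samples:
        # While the reader sleeps, the device keeps producing samples.
        for _ in range(samples_per_sleep):
            ring.append(produced)
            produced += 1
        # The reader wakes up and takes whatever the buffer still holds.
        recorded.extend(ring)
        ring.clear()
    return recorded


recorded = record_with_small_buffer(total_samples=40, chunk=4, samples_per_sleep=10)
print(recorded)
```

The recorded list jumps from sample 9 straight to sample 16, and again after every later read, which is the kind of discontinuity that comes across as choppy playback.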

https://tomzhu.site/2020/06/25/PyAudio中的录音丢帧机制/

Author: Tom Zhu

Published: June 25, 2020
