[WebRTC Architecture Analysis] Sample Rate Conversion

This article was written and published with Zhihu On VSCode.

Preface

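The WebRTC source contains many implementations of sample rate conversion; version m68 uses one based on the sinc function. Audio resampling is hard to follow without some signal-processing theory. This article does not cover resampling theory in depth; it only analyzes the logic of WebRTC's resampling implementation.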

Sampling Theorem

The theoretical basis of sample rate conversion is Shannon's sampling theorem [1].

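Assume the input signal $x(t)$ is absolutely integrable, and $x(n) \triangleq x(nT_s)$, $n \in \mathbb{Z}$, are its samples, where $T_s$ is the sampling period and the sampling frequency is $F_s = 1/T_s$. Assume further that $x(t)$ is band-limited with passband $[-F_c, F_c]$, where $F_c < F_s/2$. Its Fourier transform is:

$$X(j\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt \tag{1}$$

where $X(j\omega) = 0$ for $|\omega| \ge 2\pi F_c$. Here $F_s$ is the sampling frequency and $F_c$ is the cutoff frequency.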

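We can then reconstruct the input signal $x(t)$ from the sample sequence $x(n)$ as follows:

$$x(t) = \sum_{n=-\infty}^{\infty} x(n)\, h(t - nT_s) \tag{2}$$

where

$$h(t) = \mathrm{sinc}(F_s t) = \frac{\sin(\pi F_s t)}{\pi F_s t}$$

$h(t)$ is a sinc function, the impulse response of the ideal low-pass filter. Equation (2) shows that reconstructing $x(t)$ from the samples $x(n)$ amounts to passing the input sample sequence through an ideal low-pass filter, which fits a continuous curve between the samples $x(n)$. For this reason equation (2) is also called band-limited interpolation.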
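To make equation (2) concrete, here is a minimal C++ sketch that evaluates the band-limited interpolation sum over a finite sample buffer. It assumes $h(t) = \mathrm{sinc}(F_s t)$ with the normalized sinc, and `Sinc` and `BandlimitedInterpolate` are hypothetical helpers, not WebRTC APIs. Because the infinite sum is truncated to the samples at hand, the result is only approximate between sample instants; at $t = nT_s$ it returns $x(n)$, since every other sinc term is zero there.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized sinc: sinc(u) = sin(pi*u) / (pi*u), with sinc(0) = 1.
double Sinc(double u) {
  constexpr double kPi = 3.14159265358979323846;
  if (u == 0.0) return 1.0;
  return std::sin(kPi * u) / (kPi * u);
}

// Equation (2) truncated to the available samples (hypothetical helper):
//   x(t) ~= sum_n x(n) * sinc(Fs * (t - n * Ts)),  Ts = 1 / Fs.
// The finite sum makes this an approximation away from t = n * Ts.
double BandlimitedInterpolate(const std::vector<double>& samples,
                              double sample_rate_hz, double t_seconds) {
  assert(sample_rate_hz > 0.0);
  double acc = 0.0;
  for (std::size_t n = 0; n < samples.size(); ++n) {
    // sinc(Fs * (t - n / Fs)) == sinc(Fs * t - n)
    acc += samples[n] *
           Sinc(sample_rate_hz * t_seconds - static_cast<double>(n));
  }
  return acc;
}
```

For example, with a 16 kHz buffer, `BandlimitedInterpolate(samples, 16000.0, k / 16000.0)` gives back `samples[k]` up to rounding, and values of `t` between sample instants give the interpolated curve.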

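If we resample $x(t)$ at a new sampling frequency $F_s'$ with $F_s' < F_s$, the operation is down sampling, and the cutoff frequency must then satisfy $F_c < F_s'/2$. In other words, the cutoff frequency of the sinc filter must be below half the new sampling frequency.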
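The cutoff rule can be written as a pair of small helpers. This is a sketch under the assumption that the cutoff is expressed as a fraction of the input Nyquist frequency $F_s/2$; `NormalizedCutoff` and `CutoffHz` are hypothetical names. The `sinc_scale_factor` computed in the InitializeKernel() code later in this article plays a comparable role.

```cpp
#include <algorithm>
#include <cassert>

// Normalized cutoff of the resampling low-pass filter, as a fraction of the
// *input* Nyquist frequency Fs / 2 (hypothetical helper).
//   Upsampling   (Fs' >= Fs): the input band already fits, so 1.0.
//   Downsampling (Fs' <  Fs): the cutoff must drop to Fs' / 2, i.e. Fs' / Fs.
// Practical designs usually go slightly lower still, because a windowed sinc
// has a finite transition band; that margin is not modeled here.
double NormalizedCutoff(double input_rate_hz, double output_rate_hz) {
  assert(input_rate_hz > 0.0 && output_rate_hz > 0.0);
  return std::min(1.0, output_rate_hz / input_rate_hz);
}

// The same cutoff in Hz: min(Fs, Fs') / 2.
double CutoffHz(double input_rate_hz, double output_rate_hz) {
  assert(input_rate_hz > 0.0 && output_rate_hz > 0.0);
  return std::min(input_rate_hz, output_rate_hz) / 2.0;
}
```

For example, converting 48 kHz to 16 kHz yields a normalized cutoff of 1/3 (8 kHz), while 16 kHz to 48 kHz keeps 1.0 (8 kHz, the input Nyquist frequency).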

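Because the impulse response h(t) of the ideal low-pass filter is infinitely long, it cannot be implemented on a computer. In practice, the window method of FIR filter design is used: h(t) is truncated with a window function, which yields a finite number of coefficients.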
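A minimal sketch of the window method, assuming a symmetric Blackman window (the same $a_0 = 0.42$, $a_1 = 0.5$, $a_2 = 0.08$ coefficients WebRTC uses, although WebRTC indexes its window slightly differently) and a cutoff `fc` given as a fraction of Nyquist. `DesignWindowedSincLowpass` is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Window-method FIR design (hypothetical helper): truncate the ideal
// low-pass impulse response
//   h(n) = fc * sinc(fc * (n - center)),  normalized sinc, fc in (0, 1]
// to num_taps coefficients and taper it with a Blackman window.
std::vector<double> DesignWindowedSincLowpass(int num_taps, double fc) {
  assert(num_taps > 1 && fc > 0.0 && fc <= 1.0);
  constexpr double kPi = 3.14159265358979323846;
  const double center = (num_taps - 1) / 2.0;
  std::vector<double> h(num_taps);
  double sum = 0.0;
  for (int n = 0; n < num_taps; ++n) {
    const double u = fc * (n - center);
    const double sinc = (u == 0.0) ? 1.0 : std::sin(kPi * u) / (kPi * u);
    // Blackman window (alpha = 0.16), symmetric over n = 0 .. num_taps - 1.
    const double w = 0.42 - 0.5 * std::cos(2.0 * kPi * n / (num_taps - 1)) +
                     0.08 * std::cos(4.0 * kPi * n / (num_taps - 1));
    h[n] = fc * sinc * w;
    sum += h[n];
  }
  // Normalize so the DC gain (sum of all taps) is exactly 1.
  for (double& c : h) c /= sum;
  return h;
}
```

Normalizing the DC gain compensates for the small gain error that truncation and windowing introduce.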

Next, let's look at the implementation of WebRTC's SincResampler.

WebRTC SincResampler Implementation

The class diagram of the sinc resampler in WebRTC is as follows:

[Figure: sinc_resample class diagram]

As the class diagram shows, the structure is fairly simple.

PushResampler wraps PushSincResampler, mainly to add stereo support.
PushSincResampler wraps SincResampler: PushSincResampler supplies the audio samples, and SincResampler fetches them through the SincResamplerCallback interface.
SincResampler provides the actual resampling algorithm.
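The division of labour between these classes is essentially a pull model: the resampler asks for input when it needs it, and the push-style wrapper answers from the buffer the caller just handed in. The sketch below only mirrors that shape; `ReadCallback`, `PullResampler` and `PushAdapter` are hypothetical names, and the "resampling" is a pass-through so that the data flow stays visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counterpart of SincResamplerCallback.
class ReadCallback {
 public:
  virtual ~ReadCallback() = default;
  // Fill `dest` with exactly `frames` input samples.
  virtual void Run(std::size_t frames, float* dest) = 0;
};

// Pull side (hypothetical): requests input through the callback whenever its
// buffer runs dry. Output == input here; a real resampler filters instead.
class PullResampler {
 public:
  PullResampler(std::size_t request_frames, ReadCallback* cb)
      : request_frames_(request_frames), cb_(cb),
        buffer_(request_frames), pos_(request_frames) {}

  void Resample(std::size_t frames, float* dest) {
    for (std::size_t i = 0; i < frames; ++i) {
      if (pos_ == buffer_.size()) {  // input exhausted: pull more
        cb_->Run(request_frames_, buffer_.data());
        pos_ = 0;
      }
      dest[i] = buffer_[pos_++];
    }
  }

 private:
  std::size_t request_frames_;
  ReadCallback* cb_;
  std::vector<float> buffer_;
  std::size_t pos_;
};

// Push side (hypothetical): the caller pushes one chunk, and the adapter
// serves it back to the resampler through the callback.
class PushAdapter : public ReadCallback {
 public:
  explicit PushAdapter(std::size_t chunk)
      : chunk_(chunk), resampler_(chunk, this) {}

  void Push(const float* src, float* dst, std::size_t frames) {
    assert(frames == chunk_);  // the pass-through sketch needs equal sizes
    src_ = src;
    resampler_.Resample(frames, dst);
    src_ = nullptr;
  }

  void Run(std::size_t frames, float* dest) override {
    assert(src_ != nullptr);
    for (std::size_t i = 0; i < frames; ++i) dest[i] = src_[i];
  }

 private:
  std::size_t chunk_;
  PullResampler resampler_;
  const float* src_ = nullptr;
};
```

The point of the callback is that the pull side decides how much input it needs per output block, which is what lets a real resampler keep its own internal buffering.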

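Scenarios in WebRTC that require audio resampling:

After capture, when the audio is processed.
During audio encoding.
After decoding, when the audio is processed.
Before the audio is rendered.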

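The main steps of resampling are:

Initialize the kernel, which essentially means computing the low-pass filter coefficients h(k). From the properties of $\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)$: at $t = 0$ its gain is largest, and at $t = k$ ($k = \pm 1, \pm 2, \dots$) it equals 0. These instants $t = k$ are called zero crossings (cross-zero points). To design an N-tap low-pass filter, the interval between adjacent zero crossings is divided into, say, M equal parts, which gives N*M coefficients (the WebRTC code below actually stores N*(M+1) = 32 × 33, because both sub-sample offsets 0.0 and 1.0 are included). Kernel initialization runs only once, when the resampler is initialized.
Interpolate on the original samples according to the ratio of the new and old sampling rates. For example, if ratio = $F_s'/F_s = 3$, two new samples must be inserted between x(n) and x(n+1); in the textbook formulation the inserted samples are zeros, and the low-pass filter in the next step fills in their values.
Convolve the resampled data with h(k).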
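Steps 2 and 3 in their classic textbook form (zero insertion followed by low-pass convolution) can be sketched as below for an integer factor L. This is an illustration under that assumption, not WebRTC's algorithm: SincResampler never materializes the zero-stuffed signal and evaluates the kernel at fractional positions instead. `UpsampleByInteger` is a hypothetical name, and the Blackman-windowed sinc is chosen to match the window used elsewhere in this article.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook integer-factor upsampling (hypothetical helper, not WebRTC code):
//   1) insert L - 1 zeros after every input sample;
//   2) build a low-pass h(k) with cutoff 1/L of Nyquist, gain L folded in;
//   3) convolve the zero-stuffed signal with h(k).
std::vector<double> UpsampleByInteger(const std::vector<double>& x, int L,
                                      int half_taps) {
  assert(L >= 1 && half_taps >= 1);
  constexpr double kPi = 3.14159265358979323846;

  // Step 1: zero insertion.
  std::vector<double> stuffed(x.size() * L, 0.0);
  for (std::size_t n = 0; n < x.size(); ++n) stuffed[n * L] = x[n];

  // Step 2: Blackman-windowed sinc with 2 * half_taps + 1 taps.
  // Ideal response is (1/L) * sinc(k/L); multiplying by L restores the
  // amplitude lost to zero insertion, leaving sinc(k/L) * window.
  const int num_taps = 2 * half_taps + 1;
  std::vector<double> h(num_taps);
  for (int k = 0; k < num_taps; ++k) {
    const double u = static_cast<double>(k - half_taps) / L;
    const double sinc = (u == 0.0) ? 1.0 : std::sin(kPi * u) / (kPi * u);
    const double w = 0.42 - 0.5 * std::cos(2.0 * kPi * k / (num_taps - 1)) +
                     0.08 * std::cos(4.0 * kPi * k / (num_taps - 1));
    h[k] = sinc * w;
  }

  // Step 3: convolution, centred so that y[m] lines up with stuffed[m]
  // (h is symmetric, so no explicit time reversal is needed).
  std::vector<double> y(stuffed.size(), 0.0);
  for (std::size_t m = 0; m < y.size(); ++m) {
    for (int k = 0; k < num_taps; ++k) {
      const long j = static_cast<long>(m) + (k - half_taps);
      if (j >= 0 && j < static_cast<long>(stuffed.size()))
        y[m] += stuffed[j] * h[k];
    }
  }
  return y;
}
```

Because the sinc is zero at every nonzero multiple of L taps, the outputs at positions nL reproduce x(n), and only the inserted positions are actually interpolated.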

The kernel initialization logic in SincResampler:

void SincResampler::InitializeKernel() {
  // Blackman window parameters.
  static const double kAlpha = 0.16;
  static const double kA0 = 0.5 * (1.0 - kAlpha);  // 0.42
  static const double kA1 = 0.5;
  static const double kA2 = 0.5 * kAlpha;  // 0.08

  // (kKernelOffsetCount + 1) kernels of kKernelSize taps each:
  // 33 * 32 = 1056 low-pass filter coefficients in total.
  // Generates a set of windowed sinc() kernels.
  // We generate a range of sub-sample offsets from 0.0 to 1.0.
  const double sinc_scale_factor = SincScaleFactor(io_sample_rate_ratio_);
  for (size_t offset_idx = 0; offset_idx <= kKernelOffsetCount; ++offset_idx) {
    // Split the interval between adjacent zero crossings of h(t) = sinc(t)
    // into kKernelOffsetCount (32) equal sub-sample offsets.
    const float subsample_offset =
        static_cast<float>(offset_idx) / kKernelOffsetCount;

    for (size_t i = 0; i < kKernelSize; ++i) {
      const size_t idx = i + offset_idx * kKernelSize;
      // Argument of sinc(), scaled by pi, over -kKernelSize/2 .. kKernelSize/2.
      const float pre_sinc = static_cast<float>(
          M_PI * (static_cast<int>(i) - static_cast<int>(kKernelSize / 2) -
                  subsample_offset));
      kernel_pre_sinc_storage_[idx] = pre_sinc;

      // Compute Blackman window, matching the offset of the sinc().
      const float x = (i - subsample_offset) / kKernelSize;
      const float window = static_cast<float>(kA0 - kA1 * cos(2.0 * M_PI * x) +
                                              kA2 * cos(4.0 * M_PI * x));
      kernel_window_storage_[idx] = window;

      // Compute the sinc with offset, then window the sinc() function and store
      // at the correct offset.
      // sinc_scale_factor is the normalized cutoff frequency of the low-pass
      // filter.
      kernel_storage_[idx] = static_cast<float>(
          window * ((pre_sinc == 0)
                        ? sinc_scale_factor
                        : (sin(sinc_scale_factor * pre_sinc) / pre_sinc)));
    }
  }
}
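To inspect the kernel table without building WebRTC, the loop above can be lifted into a standalone function. This sketch assumes kKernelSize = 32 and kKernelOffsetCount = 32 (matching the 33 * 32 = 1056 count in the comments) and takes sinc_scale_factor as a parameter instead of deriving it from a sample-rate ratio; `BuildKernelTable` is a hypothetical name. Two properties are easy to check: the offset-0 kernel peaks at tap kKernelSize / 2 with value sinc_scale_factor (the window equals 1 there), and the offset-1.0 kernel is the offset-0 kernel shifted by one tap.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed constants, matching the counts quoted in the comments above.
constexpr std::size_t kKernelSize = 32;
constexpr std::size_t kKernelOffsetCount = 32;

// Standalone re-computation of the windowed-sinc kernel table
// (hypothetical helper, mirroring the InitializeKernel() loop).
std::vector<float> BuildKernelTable(double sinc_scale_factor) {
  constexpr double kPi = 3.14159265358979323846;
  const double kA0 = 0.42, kA1 = 0.5, kA2 = 0.08;  // Blackman, alpha = 0.16
  std::vector<float> table((kKernelOffsetCount + 1) * kKernelSize);
  for (std::size_t offset_idx = 0; offset_idx <= kKernelOffsetCount;
       ++offset_idx) {
    const float subsample_offset =
        static_cast<float>(offset_idx) / kKernelOffsetCount;
    for (std::size_t i = 0; i < kKernelSize; ++i) {
      const float pre_sinc = static_cast<float>(
          kPi * (static_cast<int>(i) - static_cast<int>(kKernelSize / 2) -
                 subsample_offset));
      const float x = (i - subsample_offset) / kKernelSize;
      const float window = static_cast<float>(
          kA0 - kA1 * std::cos(2.0 * kPi * x) + kA2 * std::cos(4.0 * kPi * x));
      table[i + offset_idx * kKernelSize] = static_cast<float>(
          window * ((pre_sinc == 0)
                        ? sinc_scale_factor
                        : std::sin(sinc_scale_factor * pre_sinc) / pre_sinc));
    }
  }
  return table;
}
```

The one-tap shift between the first and last kernels is why the table spans offsets 0.0 through 1.0 inclusive: the interpolation between neighbouring kernels in Resample() then never has to wrap around.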

The resampling logic in SincResampler follows. SincResampler reads input samples from PushSincResampler through the SincResamplerCallback interface.

void SincResampler::Resample(size_t frames, float* destination) {
  size_t remaining_frames = frames;

  // The input buffer is primed only once per resampler.
  // Step (1) -- Prime the input buffer at the start of the input stream.
  if (!buffer_primed_ && remaining_frames) {
    read_cb_->Run(request_frames_, r0_);
    buffer_primed_ = true;
  }

  // Step (2) -- Resample!  const what we can outside of the loop for speed.  It
  // actually has an impact on ARM performance.  See inner loop comment below.
  // io_sample_rate_ratio_ is the input sample rate divided by the output
  // sample rate.
  const double current_io_ratio = io_sample_rate_ratio_;
  const float* const kernel_ptr = kernel_storage_.get();
  while (remaining_frames) {
    // |i| may be negative if the last Resample() call ended on an iteration
    // that put |virtual_source_idx_| over the limit.
    //
    // Note: The loop construct here can severely impact performance on ARM
    // or when built with clang.  See https://codereview.chromium.org/18566009/
    /* Example: with request_frames_ = 160 (16 kHz input) and
       current_io_ratio = 0.333 (16 kHz -> 48 kHz), the first load has
       block_size_ = 144 and virtual_source_idx_ = 0. */
    // i: number of output samples that can still be produced from this block.
    for (int i = static_cast<int>(
             ceil((block_size_ - virtual_source_idx_) / current_io_ratio));
         i > 0; --i) {
      RTC_DCHECK_LT(virtual_source_idx_, block_size_);

      // |virtual_source_idx_| lies in between two kernel offsets so figure out
      // what they are.
      const int source_idx = static_cast<int>(virtual_source_idx_);
      // Fractional (sub-sample) part of the read position: this is where the
      // interpolation / decimation happens.
      const double subsample_remainder = virtual_source_idx_ - source_idx;

      const double virtual_offset_idx =
          subsample_remainder * kKernelOffsetCount;
      const int offset_idx = static_cast<int>(virtual_offset_idx);

      // We'll compute "convolutions" for the two kernels which straddle
      // |virtual_source_idx_|.
      // Why convolve with two kernels?
      const float* const k1 = kernel_ptr + offset_idx * kKernelSize;
      const float* const k2 = k1 + kKernelSize;

      // Ensure |k1|, |k2| are 16-byte aligned for SIMD usage.  Should always be
      // true so long as kKernelSize is a multiple of 16.
      RTC_DCHECK_EQ(0, reinterpret_cast<uintptr_t>(k1) % 16);
      RTC_DCHECK_EQ(0, reinterpret_cast<uintptr_t>(k2) % 16);

      // Initialize input pointer based on quantized |virtual_source_idx_|.
      // Samples are stored in capture order (earliest first), so
      // input_ptr[0] = x(n), input_ptr[1] = x(n+1), input_ptr[2] = x(n+2).
      const float* const input_ptr = r1_ + source_idx;

      // Figure out how much to weight each kernel's "convolution".
      const double kernel_interpolation_factor =
          virtual_offset_idx - offset_idx;
      *destination++ =
          CONVOLVE_FUNC(input_ptr, k1, k2, kernel_interpolation_factor);

      // Advance the virtual index.
      // Stepping the input read position by the I/O ratio is what performs
      // the interpolation.
      virtual_source_idx_ += current_io_ratio;

      if (!--remaining_frames)
        return;
    }

    // Wrap back around to the start.
    virtual_source_idx_ -= block_size_;

    // Step (3) -- Copy r3_, r4_ to r1_, r2_.
    // This wraps the last input frames back to the start of the buffer.
    memcpy(r1_, r3_, sizeof(*input_buffer_.get()) * kKernelSize);

    // Step (4) -- Reinitialize regions if necessary.
    if (r0_ == r2_)
      UpdateRegions(true);

    // Step (5) -- Refresh the buffer with more input.
    read_cb_->Run(request_frames_, r0_);
  }
}
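The index bookkeeping at the top of the inner loop can be traced in isolation. The sketch below copies that arithmetic into a hypothetical `TraceReadPositions` function (assuming kKernelOffsetCount = 32) and ignores buffer refills, so `virtual_source_idx` simply keeps growing.

```cpp
#include <cstddef>
#include <vector>

constexpr int kKernelOffsetCount = 32;  // assumed, as in WebRTC's comments

// Where one output sample reads from, and which kernels it blends.
struct ReadPosition {
  int source_idx;                      // integer input index
  int offset_idx;                      // first of the two kernels used
  double kernel_interpolation_factor;  // weight of the second kernel
};

// Hypothetical helper reproducing the per-output arithmetic of Resample().
std::vector<ReadPosition> TraceReadPositions(double io_ratio,
                                             std::size_t outputs) {
  std::vector<ReadPosition> out;
  double virtual_source_idx = 0.0;
  for (std::size_t n = 0; n < outputs; ++n) {
    const int source_idx = static_cast<int>(virtual_source_idx);
    const double subsample_remainder = virtual_source_idx - source_idx;
    const double virtual_offset_idx = subsample_remainder * kKernelOffsetCount;
    const int offset_idx = static_cast<int>(virtual_offset_idx);
    out.push_back({source_idx, offset_idx, virtual_offset_idx - offset_idx});
    virtual_source_idx += io_ratio;  // step through the input by in/out ratio
  }
  return out;
}
```

With a ratio of 1/3 (roughly the 16 kHz to 48 kHz case in the code comment), the integer input index advances once every three outputs, and in the first few outputs offset_idx takes the values 0, 10 and 21.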

Afterword

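This article mainly introduces the implementation logic of WebRTC's SincResampler module. Since the code is already well commented, I added only a few comments of my own. Sample rate conversion theory is not discussed much, because it depends on a lot of background knowledge that cannot be explained in a few words.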

For each output sample, WebRTC's SincResampler performs two convolutions with two kernels, and the purpose of this design is not obvious to me.
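The English comments already in Resample() suggest an answer: the table holds only kKernelOffsetCount + 1 discrete sub-sample offsets, while the true fractional position usually lies between two of them ("the two kernels which straddle |virtual_source_idx_|"), so the two convolution results are blended using kernel_interpolation_factor. Here is a minimal sketch of such a convolve step under that reading; `ConvolveTwoKernels` is a hypothetical stand-in for CONVOLVE_FUNC, not its actual source.

```cpp
#include <cassert>
#include <cstddef>

// One output sample from two neighbouring kernels (hypothetical helper):
// k1 has sub-sample offset offset_idx / 32, k2 has (offset_idx + 1) / 32.
// Both dot products are computed, then linearly blended by the fraction of
// the way the true position lies from k1's offset towards k2's offset.
float ConvolveTwoKernels(const float* input_ptr, const float* k1,
                         const float* k2, std::size_t kernel_size,
                         double kernel_interpolation_factor) {
  assert(kernel_interpolation_factor >= 0.0 &&
         kernel_interpolation_factor <= 1.0);
  float sum1 = 0.0f;
  float sum2 = 0.0f;
  for (std::size_t n = 0; n < kernel_size; ++n) {
    sum1 += input_ptr[n] * k1[n];
    sum2 += input_ptr[n] * k2[n];
  }
  // Linear interpolation between the two "convolutions".
  return static_cast<float>((1.0 - kernel_interpolation_factor) * sum1 +
                            kernel_interpolation_factor * sum2);
}
```

Blending two precomputed kernels approximates a kernel at the exact fractional offset, so the table can stay small (33 kernels) instead of storing one kernel for every possible phase.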

References

[1] https://ccrma.stanford.edu/~jos/resample/Theory_Ideal_Bandlimited_Interpolation.html