View source on GitHub
Returns a matrix to warp linear scale spectrograms to the mel scale.
    tf.signal.linear_to_mel_weight_matrix(
        num_mel_bins=20,
        num_spectrogram_bins=129,
        sample_rate=8000,
        lower_edge_hertz=125.0,
        upper_edge_hertz=3800.0,
        dtype=tf.dtypes.float32,
        name=None
    )
Returns a weight matrix that can be used to re-weight a Tensor containing
num_spectrogram_bins linearly sampled frequency bins spanning
[0, sample_rate / 2] into num_mel_bins frequency bands spanning
[lower_edge_hertz, upper_edge_hertz] on the mel scale.
This function follows the Hidden Markov Model Toolkit (HTK) convention, defining the mel scale in terms of a frequency in hertz according to the following formula:
$$\textrm{mel}(f) = 2595 \cdot \log_{10}\left(1 + \frac{f}{700}\right)$$
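As a rough illustration of the formula above, the conversion and its inverse can be sketched in plain Python. The helper names `hertz_to_mel` and `mel_to_hertz` are hypothetical and are not part of the `tf.signal` API.

```python
import math

def hertz_to_mel(frequency_hertz):
    """Convert a frequency in hertz to mels using the HTK formula."""
    return 2595.0 * math.log10(1.0 + frequency_hertz / 700.0)

def mel_to_hertz(mel):
    """Invert hertz_to_mel by solving the HTK formula for the frequency."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Mel values of the default band edges used in the signature above.
lower_mel = hertz_to_mel(125.0)
upper_mel = hertz_to_mel(3800.0)
```

Because the mapping is monotonic, bands spaced evenly between `lower_mel` and `upper_mel` become progressively wider in hertz.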
In the returned matrix, all the triangles (filterbanks) have a peak value of 1.0.
For example, the returned matrix A can be used to right-multiply a
spectrogram S of shape [frames, num_spectrogram_bins] of linear
scale spectrum values (e.g. STFT magnitudes) to generate a "mel spectrogram"
M of shape [frames, num_mel_bins].
    # S has shape [frames, num_spectrogram_bins]
    # M has shape [frames, num_mel_bins]
    M = tf.matmul(S, A)
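A minimal end-to-end sketch of this multiplication, assuming eager execution, an 8 kHz signal, and an STFT with `frame_length=256` (so the spectrogram has 256 // 2 + 1 = 129 bins). The 440 Hz test tone and the framing parameters are illustrative choices, not requirements of the API.

```python
import math
import tensorflow as tf

sample_rate = 8000

# One second of a 440 Hz tone as a stand-in input signal (illustrative only).
t = tf.range(sample_rate, dtype=tf.float32) / sample_rate
signal = tf.sin(2.0 * math.pi * 440.0 * t)

# With frame_length=256 the default fft_length is 256, giving 129 bins.
stft = tf.signal.stft(signal, frame_length=256, frame_step=128)
S = tf.abs(stft)  # STFT magnitudes, shape [frames, 129]

A = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=20,
    num_spectrogram_bins=S.shape[-1],
    sample_rate=sample_rate,
    lower_edge_hertz=125.0,
    upper_edge_hertz=3800.0,
)

M = tf.matmul(S, A)  # mel spectrogram, shape [frames, 20]
```

Reading `num_spectrogram_bins` from the last dimension of `S` keeps the weight matrix consistent with whatever FFT size produced the spectrogram.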
The matrix can be used with tf.tensordot to convert an arbitrary rank
Tensor of linear-scale spectral bins into the mel scale.
    # S has shape [..., num_spectrogram_bins].
    # M has shape [..., num_mel_bins].
    M = tf.tensordot(S, A, 1)
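For instance, a batched multi-channel input can be converted in one call. This is a small sketch assuming eager execution; the batch layout (clips, channels, frames, bins) and the random data are hypothetical placeholders.

```python
import tensorflow as tf

A = tf.signal.linear_to_mel_weight_matrix(
    num_mel_bins=20,
    num_spectrogram_bins=129,
    sample_rate=8000,
    lower_edge_hertz=125.0,
    upper_edge_hertz=3800.0,
)

# Hypothetical batch: 4 clips, 2 channels, 50 frames, 129 linear bins each.
S = tf.random.uniform([4, 2, 50, 129])

# axes=1 contracts the last axis of S with the first axis of A,
# leaving all leading dimensions of S untouched.
M = tf.tensordot(S, A, 1)  # shape [4, 2, 50, 20]
```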
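| Args | |
|---|---|
| num_mel_bins | Python int. How many bands in the resulting mel spectrum. |
| num_spectrogram_bins | An integer Tensor. How many bins there are in the source spectrogram data, which is understood to be fft_size // 2 + 1, i.e. the spectrogram only contains the nonredundant FFT bins. |
| sample_rate | An integer or float Tensor. Samples per second of the input signal used to create the spectrogram. Used to determine the frequency corresponding to each spectrogram bin, which dictates how the bins are mapped into the mel scale. |
| lower_edge_hertz | Python float. Lower bound on the frequencies to be included in the mel spectrum. This corresponds to the lower edge of the lowest triangular band. |
| upper_edge_hertz | Python float. The desired top edge of the highest frequency band. |
| dtype | The DType of the result matrix. Must be a floating point type. |
| name | An optional name for the operation. |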
| Returns |
|---|
| A Tensor of shape [num_spectrogram_bins, num_mel_bins]. |
| Raises | |
|---|---|
| ValueError | If num_mel_bins, num_spectrogram_bins, or sample_rate is not positive, lower_edge_hertz is negative, the frequency edges are incorrectly ordered, or upper_edge_hertz is larger than the Nyquist frequency (sample_rate / 2). |
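These argument checks can be observed by passing plain Python numbers. The sketch below assumes eager execution; `raises_value_error` is a hypothetical helper, and the specific values are illustrative.

```python
import tensorflow as tf

def raises_value_error(**overrides):
    """Return True if the call rejects the given arguments with ValueError."""
    kwargs = dict(
        num_mel_bins=20,
        num_spectrogram_bins=129,
        sample_rate=8000,
        lower_edge_hertz=125.0,
        upper_edge_hertz=3800.0,
    )
    kwargs.update(overrides)
    try:
        tf.signal.linear_to_mel_weight_matrix(**kwargs)
    except ValueError:
        return True
    return False

# The Nyquist frequency for an 8000 Hz sample rate is 4000 Hz.
above_nyquist = raises_value_error(upper_edge_hertz=4500.0)
negative_lower = raises_value_error(lower_edge_hertz=-1.0)
swapped_edges = raises_value_error(lower_edge_hertz=3800.0, upper_edge_hertz=125.0)
valid_defaults = raises_value_error()
```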