tf.raw_ops.Mfcc

Transforms a spectrogram into a form that's useful for speech recognition.

Mel Frequency Cepstral Coefficients (MFCCs) are a representation of audio data that has proven effective as an input feature for machine learning. They are created by taking the spectrum of a spectrogram (a 'cepstrum') and discarding some of the higher frequencies that are less significant to the human ear. They have a long history in speech recognition, and https://en.wikipedia.org/wiki/Mel-frequency_cepstrum is a good resource to learn more.

Args

spectrogram: A Tensor of type float32. Typically produced by the Spectrogram op, with magnitude_squared set to true.
sample_rate: A Tensor of type int32. How many samples per second the source audio used.
upper_frequency_limit: An optional float. Defaults to 4000. The highest frequency to use when calculating the cepstrum.
lower_frequency_limit: An optional float. Defaults to 20. The lowest frequency to use when calculating the cepstrum.
filterbank_channel_count: An optional int. Defaults to 40. Resolution of the Mel bank used internally.
dct_coefficient_count: An optional int. Defaults to 13. How many output channels to produce per time slice.
name: A name for the operation (optional).

Returns

A Tensor of type float32.
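
The following is a minimal usage sketch, not an official example. It assumes that the "Spectrogram op" mentioned above corresponds to tf.raw_ops.AudioSpectrogram, and it uses a synthetic 440 Hz sine wave at a 16 kHz sample rate as stand-in audio. The window size and stride are illustrative choices, not recommended values.

```python
import math

import tensorflow as tf

# Assumed, illustrative audio settings (not prescribed by the op).
SAMPLE_RATE = 16000   # samples per second
WINDOW_SIZE = 1024    # samples per spectrogram window
STRIDE = 512          # samples between successive windows

# One second of a 440 Hz sine wave, shaped [length, channels] as
# AudioSpectrogram expects.
t = tf.range(SAMPLE_RATE, dtype=tf.float32) / SAMPLE_RATE
audio = tf.sin(2.0 * math.pi * 440.0 * t)[:, tf.newaxis]

# Raw ops accept keyword arguments only. magnitude_squared=True matches
# the recommendation in the spectrogram argument description.
spectrogram = tf.raw_ops.AudioSpectrogram(
    input=audio,
    window_size=WINDOW_SIZE,
    stride=STRIDE,
    magnitude_squared=True,
)

# sample_rate is passed as an int32 tensor; the remaining arguments are
# spelled out here with their documented defaults.
mfcc = tf.raw_ops.Mfcc(
    spectrogram=spectrogram,
    sample_rate=tf.constant(SAMPLE_RATE, dtype=tf.int32),
    upper_frequency_limit=4000,
    lower_frequency_limit=20,
    filterbank_channel_count=40,
    dct_coefficient_count=13,
)

# The last dimension holds dct_coefficient_count values per time slice.
print(mfcc.shape, mfcc.dtype)
```

Changing dct_coefficient_count changes the size of the last output dimension, while the number of time slices is determined by the spectrogram passed in.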