New recipe for RescueSpeech dataset #2017
Conversation
|
Hi @sangeet2020,
|
|
@mravanelli, the changes have been pushed. Only the best-performing recipe is kept: joint training of SepFormer speech enhancement combined with Whisper ASR. All other recipes have been removed. Thanks. |
|
Hi @sangeet2020, thank you for making the modifications. Here is my second round of comments:
However, this is not the case with your code. Users need to specify the following parameters:
|
|
Hi @mravanelli, Thanks for your suggestions.
HF links added, Dropbox links added, and
the README now has HF and Dropbox links to the RescueSpeech fine-tuned Whisper and SepFormer models. ✔
In progress, to be updated soon ❌
Added ✔
Added ✔
Not really; to run these experiments, users would only need to download the
Fixed ✔ |
|
Thank you, @sangeet2020. I have made some minor modifications to improve the README file. Additionally, I have a few more comments:
|
|
Hi @sangeet2020,
|
|
@sangeet2020, please let me know once my comments are addressed. I think we are very close to merging this PR. |
|
Hi @sangeet2020, I have tested the training after merging the dev branch, and everything is working. I have also run inference with the following Hugging Face models, and all of them ran without any issues:
Please note that the browser API is currently unavailable because our Whisper interface is only present in the dev branch. It will be included in the main branch once we release the new version.
The only remaining item before merging this PR is the creation of the following interface:
To proceed, we need to include a speech enhancement model followed by an ASR model. Please upload both models to this repository and provide an example showcasing their usage. Here's a sample code snippet that demonstrates how to use these models:
from speechbrain.pretrained import SepformerSeparation as Separator
from speechbrain.pretrained import WhisperASR
# Load the enhancement (SepFormer) and ASR (Whisper) models from the Hugging Face repo
enh_model = Separator.from_hparams(source="speechbrain/rescuespeech", savedir='pretrained_models/rescuespeech_sepformer')
asr_model = WhisperASR.from_hparams(source="speechbrain/rescuespeech", savedir="pretrained_models/rescuespeech_whisper")
# For a custom file, change the path accordingly
est_sources = enh_model.separate_file(path='speechbrain/rescuespeech_sepformer/example_rescuespeech16k.wav')
# Pass the first estimated source (the enhanced signal) to the ASR model
asr_model(est_sources[:, :, 0])
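The chaining in the snippet above (enhance first, keep one estimated source, then transcribe) can be sketched without SpeechBrain installed. This is only an illustration under stated assumptions: nested lists stand in for tensors, the enhancement output is assumed to be laid out as `[batch, time, n_sources]` as the `est_sources[:, :, 0]` indexing suggests, and `select_source`, `enhance_then_transcribe`, `fake_enhance`, and `fake_transcribe` are hypothetical names, not SpeechBrain APIs.

```python
from typing import Callable, List

# Nested lists stand in for tensors (assumption: the real models use torch tensors).
Batch = List[List[float]]          # [batch, time]
Sources = List[List[List[float]]]  # [batch, time, n_sources]


def select_source(est_sources: Sources, index: int = 0) -> Batch:
    """Nested-list equivalent of est_sources[:, :, index]."""
    return [[frame[index] for frame in utterance] for utterance in est_sources]


def enhance_then_transcribe(
    mixture: Batch,
    enhance: Callable[[Batch], Sources],
    transcribe: Callable[[Batch], List[str]],
) -> List[str]:
    """Run enhancement, keep the first estimated source, then run ASR on it."""
    est_sources = enhance(mixture)
    enhanced = select_source(est_sources, 0)
    return transcribe(enhanced)


# Hypothetical stand-ins for the two models, used only to exercise the pipeline.
def fake_enhance(mixture: Batch) -> Sources:
    # Halve each sample and wrap it in a length-1 "sources" axis.
    return [[[sample * 0.5] for sample in utterance] for utterance in mixture]


def fake_transcribe(wavs: Batch) -> List[str]:
    # Report the number of samples instead of producing real text.
    return [f"{len(utterance)} samples" for utterance in wavs]


result = enhance_then_transcribe([[0.2, 0.4, 0.6]], fake_enhance, fake_transcribe)
print(result)
```

Passing the two stages as callables keeps the source-selection step explicit, which is the part most likely to change if a model returns more than one source.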
|
Hi Mirco,
HF repo for the noise-robust Whisper ASR model on RescueSpeech: https://huggingface.co/speechbrain/noisy-whisper-resucespeech Thank you. |
|
LGTM! Thank you @sangeet2020! |
Description
This pull request introduces a new training recipe and pre-trained models for a new "RescueSpeech" dataset in the SpeechBrain toolkit. The RescueSpeech dataset is a collection of audio recordings from emergency response scenarios, aimed at facilitating the development of speech and audio processing models for rescue operations.
The provided training recipe includes the necessary scripts, configurations, and data preparation steps for training models on the RescueSpeech dataset. Additionally, we have included pre-trained models that can be used for inference or as a starting point for further research.
This contribution aims to expand the toolkit's capabilities and enable the SpeechBrain community to explore speech and audio processing for rescue-related applications.
Changes Made
Dataset Details
See dataset.md
Testing
We have thoroughly tested the training recipe and pre-trained models using the test set from the RescueSpeech dataset. The results indicate the effectiveness and utility of the proposed approach.
Documentation
We have updated the documentation to include the following sections:
Checklist
Please check if your PR fulfills the following requirements:
Thank you for considering this pull request. We look forward to your feedback and the opportunity to contribute to the SpeechBrain toolkit with the RescueSpeech dataset and associated resources.
TODOs
@mravanelli , please feel free to suggest changes.
Note: when merged, we would like to include your PR title in our contributions list; see one of our past releases: https://github.com/speechbrain/speechbrain/releases/tag/v0.5.14