
New recipe for RescueSpeech dataset #2017

Merged
mravanelli merged 44 commits into speechbrain:develop from sangeet2020:RescueSpeech
Jul 13, 2023

Conversation

@sangeet2020
Contributor

@sangeet2020 sangeet2020 commented Jun 6, 2023

Description

This pull request introduces a new training recipe and pre-trained models for a new "RescueSpeech" dataset in the SpeechBrain toolkit. The RescueSpeech dataset is a collection of audio recordings from emergency response scenarios, aimed at facilitating the development of speech and audio processing models for rescue operations.

The provided training recipe includes the necessary scripts, configurations, and data preparation steps for training models on the RescueSpeech dataset. Additionally, we have included pre-trained models that can be used for inference or as a starting point for further research.

This contribution aims to expand the toolkit's capabilities and enable the SpeechBrain community to explore speech and audio processing for rescue-related applications.

Changes Made

  • Added new training recipes for the RescueSpeech dataset.
  • Included data preprocessing scripts and configuration files.
  • Added pre-trained models for inference and transfer learning.
  • Updated the documentation to include details about the RescueSpeech dataset.

Dataset Details

See dataset.md

Testing

We have thoroughly tested the training recipe and pre-trained models using the test set from the RescueSpeech dataset. The results indicate the effectiveness and utility of the proposed approach.

Documentation

We have updated the documentation to include the following sections:

  • Introduction to the RescueSpeech dataset.
  • Instructions for obtaining and preparing the dataset.
  • Detailed information about the new training recipe.
  • Guidelines for using the provided pre-trained models.
  • Examples and demonstrations showcasing the usage of the RescueSpeech dataset in the toolkit.

Checklist

Please check if your PR fulfills the following requirements:

  • The code follows the project's coding conventions and style guidelines.
  • Tests have been added to verify the changes (if applicable).
  • Documentation has been updated to reflect the changes.
  • The commits follow the project's commit message guidelines.

Thank you for considering this pull request. We look forward to your feedback and the opportunity to contribute to the SpeechBrain toolkit with the RescueSpeech dataset and associated resources.

TODOs

  • Add pre-trained model links.
  • Add HF link.
  • Update tests/recipes.csv with this recipe.

@mravanelli , please feel free to suggest changes.

Note: when merged, we would like to include your PR title in our contributions list; check out one of our past version releases:
https://github.com/speechbrain/speechbrain/releases/tag/v0.5.14


@sangeet2020 sangeet2020 changed the title from "New recipe for Rescue speech dataset" to "New recipe for RescueSpeech dataset" on Jun 6, 2023
@mravanelli
Collaborator

Hi @sangeet2020,
thank you for this PR! Here is my main comment from the first code inspection:

  • It looks like you shared all the code for all the various experiments in the paper (e.g., we have different models, different training modalities, ASR, Enhancement, etc). This is good from one side, but having too many models makes maintainability an issue. My suggestion is to only push the code for the best model (i.e., the combination of ASR and Enhancement that gives the best WER).

@sangeet2020
Contributor Author

@mravanelli, the changes have been made. Only the best result is kept: joint training with SepFormer speech enhancement combined with Whisper ASR. All other recipes have been removed.

thanks.

@mravanelli
Collaborator

Hi @sangeet2020,

Thank you for making the modifications. Here is my second round of comments:

  1. The code should be easily runnable by typing the following command:
python train.py hparams/robust_asr_16k.yaml --data_folder=<data_folder_path>

However, this is not the case with your code. Users need to specify the following parameters: whisper_folder, pretrained_whisper_model, and pretrained_enhance_path. I suggest putting the required model on the SpeechBrain HF repo. I can give you access to it. This way, we can automatically download the models from HF, just like in many other recipes. Here's an example:

pretrained_enhance_path: speechbrain/RescueSpeech_Sepformer
  2. In the README, please add only the best model to the table, specifically the fine-tuned Whisper and SepFormer models. Optionally, you can include the HF and Dropbox links for the fine-tuned SepFormer and Whisper models. I can give you access to the new SpeechBrain Dropbox repo. Please ensure that these links are working correctly.

  3. Regarding the HF repo of the final model, you should upload both the final Whisper and SepFormer models. You will need two YAML files for this. In the runnable example, show how to enhance the signal first and then how to run speech recognition on the enhanced signal. You can refer to examples of a SepFormer HF repo here and a Whisper HF repo here.

  4. Please add recipe tests. See https://github.com/speechbrain/speechbrain/tree/develop/tests/recipes (follow up with me if you have problems setting up the tests).

  5. Minor suggestion: add missing docstrings in rescuespeech_prepare.py for the functions unicode_normalisation and strip_accents.

  6. The dataset is not a single file in Zenodo. Do we need to download the entire dataset? It would be easier if there was a single file available.

  7. Fix trailing whitespace in recipes/RescueSpeech/dataset.md.
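
Regarding the docstring suggestion, the missing docstrings could follow the sketch below. The function bodies are assumptions based on typical text-cleaning helpers in SpeechBrain data-preparation scripts (NFC normalization and combining-mark removal); the actual implementations in rescuespeech_prepare.py may differ:

```python
import unicodedata


def unicode_normalisation(text):
    """Return the Unicode NFC-normalized form of a string.

    Arguments
    ---------
    text : str
        The input transcription.

    Returns
    -------
    str
        The NFC-normalized string (combining sequences composed).
    """
    return unicodedata.normalize("NFC", str(text))


def strip_accents(text):
    """Remove accents (combining marks) from a string.

    Arguments
    ---------
    text : str
        The input transcription.

    Returns
    -------
    str
        The string with all combining marks stripped.
    """
    # Decompose characters (NFD), then drop combining marks (category "Mn")
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if unicodedata.category(c) != "Mn")
```

For German transcriptions, for example, strip_accents("Über") yields "Uber", while "ß" is left untouched since it carries no combining marks.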

@sangeet2020
Contributor Author

sangeet2020 commented Jul 2, 2023

Hi @mravanelli,

Thanks for your suggestions.

  1. The code should be easily runnable by typing the following command:
python train.py hparams/robust_asr_16k.yaml --data_folder=<data_folder_path>

However, this is not the case with your code. Users need to specify the following parameters: whisper_folder, pretrained_whisper_model, and pretrained_enhance_path. I suggest putting the required model on the SpeechBrain HF repo. I can give you access to it. This way, we can automatically download the models from HF, just like in many other recipes. Here's an example:

pretrained_enhance_path: speechbrain/RescueSpeech_Sepformer

HF links added, Dropbox links added, and hparams/robust_asr_16k.yaml modified so that users only need to supply the --data_folder=<data_folder_path> argument. ✔

  2. In the README, please add only the best model to the table, specifically the fine-tuned Whisper and SepFormer models. Optionally, you can include the HF and Dropbox links for the fine-tuned SepFormer and Whisper models. I can give you access to the new SpeechBrain Dropbox repo. Please ensure that these links are working correctly.

README now contains HF and Dropbox links to the RescueSpeech fine-tuned Whisper and SepFormer models. ✔
However, the Whisper model card is not yet working as expected ❌

  3. Regarding the HF repo of the final model, you should upload both the final Whisper and SepFormer models. You will need two YAML files for this. In the runnable example, show how to enhance the signal first and then how to run speech recognition on the enhanced signal. You can refer to examples of a SepFormer HF repo here and a Whisper HF repo here.

In progress, to be updated soon ❌

  4. Please add recipe tests. See https://github.com/speechbrain/speechbrain/tree/develop/tests/recipes (follow up with me if you have problems setting up the tests).

Added ✔

  5. Minor suggestion: add missing docstrings in rescuespeech_prepare.py for the functions unicode_normalisation and strip_accents.

Added ✔

  6. The dataset is not a single file in Zenodo. Do we need to download the entire dataset? It would be easier if there was a single file available.

Not really; to run these experiments, users only need to download the Task_ASR.tar.gz file, which is primarily intended for ASR experiments. Please refer to dataset.md for further details.

  7. Fix trailing whitespace in recipes/RescueSpeech/dataset.md.

Fixed ✔

@mravanelli
Collaborator

Thank you, @sangeet2020. I have made some minor modifications to improve the README file. Additionally, I have a few more comments:

  1. It's important to double-check the list of external dependencies and ensure that only the strictly necessary ones are included.

  2. The current version of the code only works with an older version of the transformers library (< 4.29) due to the issue mentioned here. @poonehmousavi is currently working on resolving this issue, as well as addressing performance problems with our Whisper interface. I would highly recommend closely synchronizing your efforts with her to find a solution to both problems.

  3. It's worth mentioning in the README that this recipe is quite memory demanding. Please provide a note specifying the amount of computing resources required to train this recipe.

  4. Lastly, please be aware that the final HF model repository is still empty. We should address this by ensuring the necessary files are included.
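
The version constraint in point 2 could be expressed as a pin in the recipe's extra-requirements file until the interface issue is resolved. This is a sketch that assumes the constraint refers to HuggingFace's transformers package and the 4.29 release line; the exact bound should be confirmed against the linked issue:

```
# Pin transformers below the release that changed the Whisper interface (assumed bound)
transformers<4.29.0
```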

@mravanelli
Collaborator

@sangeet2020, please let me know once my comments are addressed. I think we are very close to merging this PR.

@mravanelli
Collaborator

Hi @sangeet2020, I have tested the training after merging the dev branch, and everything is working. Additionally, I've also conducted inference using the following Hugging Face models, and all of them ran without any issues:

  • Whisper RescueSpeech: link
  • SepFormer RescueSpeech: link

Please note that the browser API is currently unavailable as our whisper interface is only present in the dev branch. However, it will be included in the main branch once we release the new version.

The only remaining item before merging this PR is the creation of the following interface:

  • Noisy Whisper RescueSpeech: link

To proceed, we need to include a speech enhancement model followed by an ASR model. Please upload both models to this repository and provide an example showcasing their usage. Here's a sample code snippet that demonstrates how to utilize these models:

import torch
from speechbrain.pretrained import SepformerSeparation as Separator
from speechbrain.pretrained import WhisperASR

enh_model = Separator.from_hparams(source="speechbrain/rescuespeech", savedir="pretrained_models/rescuespeech_sepformer")
asr_model = WhisperASR.from_hparams(source="speechbrain/rescuespeech", savedir="pretrained_models/rescuespeech_whisper")

# For a custom file, change the path accordingly
est_sources = enh_model.separate_file(path="speechbrain/rescuespeech_sepformer/example_rescuespeech16k.wav")

# Transcribe the first estimated source (relative lengths are all 1.0)
asr_model.transcribe_batch(est_sources[:, :, 0], torch.tensor([1.0]))

@sangeet2020
Contributor Author

Hi Mirco,
Apologies for the delayed response.

  1. requirements.txt now lists only the libraries needed to run the training script.
  2. I have tried running the script (in debug mode) after pulling the latest changes, and it works. Even loading pre-trained models (trained with the old transformers version) and fine-tuning works. In short, everything is backwards compatible with the new transformers version.
  3. README.md has been complemented with a special note on the computing resources needed for training.
  4. All models have been uploaded to HuggingFace.

HF repo for noise robust Whisper ASR on RescueSpeech: https://huggingface.co/speechbrain/noisy-whisper-resucespeech
As of writing this comment, everything is working on my end. If you run into any issues, please reply in this thread.

Thank You

@mravanelli
Collaborator

LGTM! Thank you @sangeet2020!

@mravanelli mravanelli merged commit 3c2f72d into speechbrain:develop Jul 13, 2023