
New recipe for RescueSpeech dataset #2017

Merged
mravanelli merged 44 commits into speechbrain:develop from sangeet2020:RescueSpeech
Jul 13, 2023

Conversation

@sangeet2020
Contributor

@sangeet2020 sangeet2020 commented Jun 6, 2023

Description

This pull request introduces a new training recipe and pre-trained models for a new "RescueSpeech" dataset in the SpeechBrain toolkit. The RescueSpeech dataset is a collection of audio recordings from emergency response scenarios, aimed at facilitating the development of speech and audio processing models for rescue operations.

The provided training recipe includes the necessary scripts, configurations, and data preparation steps for training models on the RescueSpeech dataset. Additionally, we have included pre-trained models that can be used for inference or as a starting point for further research.

This contribution aims to expand the toolkit's capabilities and enable the SpeechBrain community to explore speech and audio processing for rescue-related applications.

Changes Made

  • Added new training recipes for the RescueSpeech dataset.
  • Included data preprocessing scripts and configuration files.
  • Added pre-trained models for inference and transfer learning.
  • Updated the documentation to include details about the RescueSpeech dataset.

Dataset Details

See dataset.md

Testing

We have thoroughly tested the training recipe and pre-trained models using the test set from the RescueSpeech dataset. The results indicate the effectiveness and utility of the proposed approach.

Documentation

We have updated the documentation to include the following sections:

  • Introduction to the RescueSpeech dataset.
  • Instructions for obtaining and preparing the dataset.
  • Detailed information about the new training recipe.
  • Guidelines for using the provided pre-trained models.
  • Examples and demonstrations showcasing the usage of the RescueSpeech dataset in the toolkit.

Checklist

Please check if your PR fulfills the following requirements:

  • The code follows the project's coding conventions and style guidelines.
  • Tests have been added to verify the changes (if applicable).
  • Documentation has been updated to reflect the changes.
  • The commits follow the project's commit message guidelines.

Thank you for considering this pull request. We look forward to your feedback and the opportunity to contribute to the SpeechBrain toolkit with the RescueSpeech dataset and associated resources.

TODOs

  • Add pre-trained model links.
  • Add HF link.
  • Update tests/recipes.csv with this recipe.

@mravanelli , please feel free to suggest changes.

Note: when merged, we would like to include your PR title in our contributions list; check out one of our past version releases:
https://github.com/speechbrain/speechbrain/releases/tag/v0.5.14


@sangeet2020 sangeet2020 changed the title from "New recipe for Rescue speech dataset" to "New recipe for RescueSpeech dataset" on Jun 6, 2023
@mravanelli
Collaborator

Hi @sangeet2020,
thank you for this PR! Here is my main comment from the first code inspection:

  • It looks like you shared all the code for all the various experiments in the paper (e.g., we have different models, different training modalities, ASR, Enhancement, etc). This is good from one side, but having too many models makes maintainability an issue. My suggestion is to only push the code for the best model (i.e., the combination of ASR and Enhancement that gives the best WER).

@sangeet2020
Contributor Author

@mravanelli, the changes have been made. Only the best result is kept: joint training with SepFormer speech enhancement combined with Whisper ASR. All other recipes have been removed.

thanks.

@mravanelli
Collaborator

Hi @sangeet2020,

Thank you for making the modifications. Here is my second round of comments:

  1. The code should be easily runnable by typing the following command:
python train.py hparams/robust_asr_16k.yaml --data_folder=<data_folder_path>

However, this is not the case with your code. Users need to specify the following parameters: whisper_folder, pretrained_whisper_model, and pretrained_enhance_path. I suggest putting the required model on the SpeechBrain HF repo. I can give you access to it. This way, we can automatically download the models from HF, just like in many other recipes. Here's an example:

pretrained_enhance_path: speechbrain/RescueSpeech_Sepformer
  2. In the README, please add only the best model to the table, specifically the fine-tuned Whisper and SepFormer models. Optionally, you can include the HF and Dropbox links for the fine-tuned SepFormer and Whisper models. I can give you access to the new SpeechBrain Dropbox repo. Please ensure that these links are working correctly.

  3. Regarding the HF repo of the final model, you should upload both the final Whisper and SepFormer models. You will need two YAML files for this. In the runnable example, show how to enhance the signal first and then how to run speech recognition on the enhanced signal. You can refer to examples of a SepFormer HF repo here and a Whisper HF repo here.

  4. Please add recipe tests. See https://github.com/speechbrain/speechbrain/tree/develop/tests/recipes (follow up with me if you have problems setting up the tests).

  5. Minor suggestion: add missing docstrings in rescuespeech_prepare.py for the functions unicode_normalisation and strip_accents.

  6. The dataset is not a single file in Zenodo. Do we need to download the entire dataset? It would be easier if there was a single file available.

  7. Fix trailing whitespace in recipes/RescueSpeech/dataset.md.
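
Regarding the docstring suggestion, the missing docstrings could follow the sketch below. The function bodies are assumptions based on typical text-cleaning helpers in SpeechBrain data-preparation scripts (NFC normalization and combining-mark removal); the actual implementations in rescuespeech_prepare.py may differ:

```python
import unicodedata


def unicode_normalisation(text):
    """Return the Unicode NFC-normalized form of a string.

    Arguments
    ---------
    text : str
        The input transcription.

    Returns
    -------
    str
        The NFC-normalized string (combining sequences composed).
    """
    return unicodedata.normalize("NFC", str(text))


def strip_accents(text):
    """Remove accents (combining marks) from a string.

    Arguments
    ---------
    text : str
        The input transcription.

    Returns
    -------
    str
        The string with all combining marks stripped.
    """
    # Decompose characters (NFD), then drop combining marks (category "Mn")
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if unicodedata.category(c) != "Mn")
```

For German transcriptions, for example, strip_accents("Über") yields "Uber", while "ß" is left untouched since it carries no combining marks.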

@sangeet2020
Contributor Author

sangeet2020 commented Jul 2, 2023

Hi @mravanelli,

Thanks for your suggestions.

  1. The code should be easily runnable by typing the following command:
python train.py hparams/robust_asr_16k.yaml --data_folder=<data_folder_path>

However, this is not the case with your code. Users need to specify the following parameters: whisper_folder, pretrained_whisper_model, and pretrained_enhance_path. I suggest putting the required model on the SpeechBrain HF repo. I can give you access to it. This way, we can automatically download the models from HF, just like in many other recipes. Here's an example:

pretrained_enhance_path: speechbrain/RescueSpeech_Sepformer

HF links added, Dropbox links added, and hparams/robust_asr_16k.yaml modified so that users only need to supply the --data_folder=<data_folder_path> argument. ✔

  2. In the README, please add only the best model to the table, specifically the fine-tuned Whisper and SepFormer models. Optionally, you can include the HF and Dropbox links for the fine-tuned SepFormer and Whisper models. I can give you access to the new SpeechBrain Dropbox repo. Please ensure that these links are working correctly.

README now contains HF and Dropbox links to the RescueSpeech fine-tuned Whisper and SepFormer models. ✔
However, the Whisper model card is not yet working as expected ❌

  3. Regarding the HF repo of the final model, you should upload both the final Whisper and SepFormer models. You will need two YAML files for this. In the runnable example, show how to enhance the signal first and then how to run speech recognition on the enhanced signal. You can refer to examples of a SepFormer HF repo here and a Whisper HF repo here.

In progress, to be updated soon ❌

  4. Please add recipe tests. See https://github.com/speechbrain/speechbrain/tree/develop/tests/recipes (follow up with me if you have problems setting up the tests).

Added ✔

  5. Minor suggestion: add missing docstrings in rescuespeech_prepare.py for the functions unicode_normalisation and strip_accents.

Added ✔

  6. The dataset is not a single file in Zenodo. Do we need to download the entire dataset? It would be easier if there was a single file available.

Not really; to run these experiments, users only need to download the Task_ASR.tar.gz file, which is primarily intended for ASR experiments. Please refer to dataset.md for further details.

  7. Fix trailing whitespace in recipes/RescueSpeech/dataset.md.

Fixed ✔

@mravanelli
Collaborator

Thank you, @sangeet2020. I have made some minor modifications to improve the README file. Additionally, I have a few more comments:

  1. It's important to double-check the list of external dependencies and ensure that only the strictly necessary ones are included.

  2. The current version of the code only works with an older version of the transformers library (< 4.29) due to the issue mentioned here. @poonehmousavi is currently working on resolving this issue, as well as addressing performance problems with our Whisper interface. I would highly recommend closely synchronizing your efforts with her to find a solution to both problems.

  3. It's worth mentioning in the README that this recipe is quite memory demanding. Please provide a note specifying the amount of computing resources required to train this recipe.

  4. Lastly, please be aware that the final HF model repository is still empty. We should address this by ensuring the necessary files are included.
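
The version constraint in point 2 could be expressed as a pin in the recipe's extra-requirements file until the interface issue is resolved. This is a sketch that assumes the constraint refers to HuggingFace's transformers package and the 4.29 release line; the exact bound should be confirmed against the linked issue:

```
# Pin transformers below the release that changed the Whisper interface (assumed bound)
transformers<4.29.0
```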

@mravanelli
Collaborator

@sangeet2020, please let me know once my comments are addressed. I think we are very close to merging this PR.

@mravanelli
Collaborator

Hi @sangeet2020, I have tested the training after merging the dev branch, and everything is working. Additionally, I've also conducted inference using the following Hugging Face models, and all of them ran without any issues:

  • Whisper RescueSpeech: link
  • SepFormer RescueSpeech: link

Please note that the browser API is currently unavailable as our whisper interface is only present in the dev branch. However, it will be included in the main branch once we release the new version.

The only remaining item before merging this PR is the creation of the following interface:

  • Noisy Whisper RescueSpeech: link

To proceed, we need to include a speech enhancement model followed by an ASR model. Please upload both models to this repository and provide an example showcasing their usage. Here's a sample code snippet that demonstrates how to utilize these models:

import torch
from speechbrain.pretrained import SepformerSeparation as Separator
from speechbrain.pretrained import WhisperASR

enh_model = Separator.from_hparams(source="speechbrain/rescuespeech", savedir="pretrained_models/rescuespeech_sepformer")
asr_model = WhisperASR.from_hparams(source="speechbrain/rescuespeech", savedir="pretrained_models/rescuespeech_whisper")

# For a custom file, change the path accordingly
est_sources = enh_model.separate_file(path="speechbrain/rescuespeech_sepformer/example_rescuespeech16k.wav")

# Transcribe the first estimated source (relative lengths are all 1.0)
asr_model.transcribe_batch(est_sources[:, :, 0], torch.tensor([1.0]))

@sangeet2020
Contributor Author

Hi Mirco,
Apologies for the delayed response.

  1. requirements.txt now lists only the libraries needed to run the training script.
  2. I have tried running the script (in debug mode) after pulling the latest changes, and it works. Even loading pre-trained models (trained with the old transformers version) and fine-tuning works. In short, everything is backwards compatible with the new transformers version.
  3. README.md has been complemented with a special note on the computing resources needed for training.
  4. All models have been uploaded to HuggingFace.

HF repo for noise robust Whisper ASR on RescueSpeech: https://huggingface.co/speechbrain/noisy-whisper-resucespeech
As of writing this comment, everything is working on my end. If you run into any issues, please reply in this thread.

Thank You

@mravanelli
Collaborator

LGTM! Thank you @sangeet2020!

@mravanelli mravanelli merged commit 3c2f72d into speechbrain:develop Jul 13, 2023