```
release     | main                                | business
CI/CD       |  \--- develop                       | as usual
ecosystem   |        \  \ <~> testing-refactoring | the tricky
refactoring |         \--- unstable <~>/          | bits & pieces
```
The testing-refactoring branch contains this folder:
https://github.com/speechbrain/speechbrain/tree/hf-interface-testing/updates_pretrained_models
which contains subfolders named identically to the models uploaded to HuggingFace:
https://huggingface.co/speechbrain
e.g. hf-interface-testing/updates_pretrained_models/asr-wav2vec2-librispeech outlines testing of the pretrained model speechbrain/asr-wav2vec2-librispeech. Each of these folders can contain:
- `test.yaml` - test definition w/ integrated code [mandatory]
- `hyperparams.yaml` - the standing (or updated) specification [mandatory]
- `custom_interface.py` - the standing (or updated) custom interface [optional]
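For instance, the folder for the model above might be laid out as follows (an illustrative sketch only; the actual contents vary by model):

```
updates_pretrained_models/
└── asr-wav2vec2-librispeech/
    ├── test.yaml            # mandatory
    ├── hyperparams.yaml     # mandatory
    └── custom_interface.py  # optional (only for models with a custom interface)
```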
Note: changing parameters means either a model revision and/or a new model.
While hyperparams.yaml and custom_interface.py are updated through PRs that complement conventional PRs, test.yaml is defined only once (and fixed when needed).
An example of such a complementary PR:
speechbrain#1801
Note: to update files that concern testing itself (such as test.yaml), please create and manage separate PRs, since these are of a different nature (they do not change a pretrained model itself).
Depending on the testing need, test.yaml grows. Some examples:
- ssl-wav2vec2-base-librispeech/test.yaml - the interplay between test sample, interface class, and batch function is handled via the HF testing in tests/utils:

  ```yaml
  sample: example.wav   # test audio provided via HF repo
  cls: WaveformEncoder  # existing speechbrain.inference class
  fnx: encode_batch     # its batch-wise function after audio loading
  ```
- asr-wav2vec2-librispeech/test.yaml - testing a single example & against a dataset's test partition:

  ```yaml
  sample: example.wav # as above
  cls: EncoderASR # as above
  fnx: transcribe_batch # as above
  dataset: LibriSpeech # which dataset to use -> will create a tests/tmp/LibriSpeech folder
  recipe_yaml: recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec.yaml # the training recipe for dataloader etc.
  overrides: # what of the recipe_yaml needs to be overridden
    output_folder: !ref tests/tmp/<dataset> # the output folder is the tmp dataset folder (data prep & eval tasks only)
  dataio: | # which dataio_prepare to import; copy/paste from train_with_wav2vec.py; mind the last line (their dataio_prepare needs to know how to prepare the recipe dataset)
    from recipes.LibriSpeech.librispeech_prepare import prepare_librispeech

    run_on_main(
        prepare_librispeech,
        kwargs={
            "data_folder": recipe_hparams["data_folder"],
            "tr_splits": recipe_hparams["train_splits"],
            "dev_splits": recipe_hparams["dev_splits"],
            "te_splits": recipe_hparams["test_splits"],
            "save_folder": recipe_hparams["output_folder"],
            "merge_lst": recipe_hparams["train_splits"],
            "merge_name": "train.csv",
            "skip_prep": recipe_hparams["skip_prep"],
        },
    )

    from recipes.LibriSpeech.ASR.CTC.train_with_wav2vec import dataio_prepare
  test_datasets: dataio_prepare(recipe_hparams)[2] # where to get the test dataset from that prep pipeline (w/ input args)
  test_loader: test_dataloader_opts # dataloader name as in recipe_yaml
  performance: # which metric classes are used in the training recipe
    CER: # name for testing
      handler: cer_computer # name as in recipe_yaml
      field: error_rate # field/function as used in the train script
    WER: # another one
      handler: error_rate_computer # another one
      field: error_rate # another one
  predicted: "[wrd.split(' ') for wrd in predictions[0]]" # what of the forward pass to use to compute metrics
  targeted: "[wrd.split(' ') for wrd in batch.wrd]" # what of the batch ground truth to use to compute metrics
  to_stats: ids, predicted, targeted # what the metric computation needs from each batch
  ```
- emotion-recognition-wav2vec2-IEMOCAP/test.yaml - custom interfaces:

  ```yaml
  sample: anger.wav # as above
  cls: CustomEncoderWav2vec2Classifier # => name of the custom class provided through the custom interface
  fnx: classify_batch # as above
  foreign: custom_interface.py # name of the custom interface availed through the HF repo
  dataset: IEMOCAP # as above
  recipe_yaml: recipes/IEMOCAP/emotion_recognition/hparams/train_with_wav2vec2.yaml # as above
  overrides: # as above
    output_folder: !ref tests/tmp/<dataset> # as above
  dataio: | # as above
    from recipes.IEMOCAP.emotion_recognition.iemocap_prepare import prepare_data

    run_on_main(
        prepare_data,
        kwargs={
            "data_original": recipe_hparams["data_folder"],
            "save_json_train": recipe_hparams["train_annotation"],
            "save_json_valid": recipe_hparams["valid_annotation"],
            "save_json_test": recipe_hparams["test_annotation"],
            "split_ratio": [80, 10, 10],
            "different_speakers": recipe_hparams["different_speakers"],
            "test_spk_id": recipe_hparams["test_spk_id"],
            "seed": recipe_hparams["seed"],
        },
    )

    from recipes.IEMOCAP.emotion_recognition.train_with_wav2vec2 import dataio_prep
  test_datasets: dataio_prep(recipe_hparams)["test"] # as above
  test_loader: dataloader_options # as above
  performance: # as above
    ClassError: # as above
      handler: error_stats # as above
      field: average # as above
  predicted: predictions[0] # as above
  targeted: batch.emo_encoded[0] # as above
  to_stats: ids, predicted, targeted, wav_lens # as above
  ```
When testing the HF snippets, use the functions gather_expected_results() and gather_refactoring_results().
They create another yaml in which the before/after refactoring test results are gathered.
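The exact contents of that yaml are produced by the gather functions; purely as a hypothetical sketch, the gathered results for one model might look like:

```yaml
# hypothetical sketch of tests/tmp/refactoring_results.yaml; actual keys/values
# are whatever the gather functions record for each model
asr-wav2vec2-ctc-aishell:
  before: "<output of the standing interface on the test sample>"
  after: "<output of the refactored interface on the test sample>"
```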
While the standing interfaces are drawn from the HF repos, their updated/refactored counterparts need to be specified, so that the PR's git repo and branch can be cloned into, e.g., tests/tmp/hf_interfaces. See the default values:
```python
def gather_refactoring_results(
    glob_filter="*",
    new_interfaces_git="https://github.com/speechbrain/speechbrain",  # change to yours
    new_interfaces_branch="hf-interface-testing",  # maybe you have another branch
    new_interfaces_local_dir="tests/tmp/hf_interfaces",  # you can leave this, or put it elsewhere
    yaml_path="tests/tmp/refactoring_results.yaml",  # same here, change only if necessary
):
    ...
```

Examples:
```bash
# expected result(s) for one audio
# git checkout develop
python -c "from tests.utils.refactoring_checks import gather_expected_results;gather_expected_results('asr-wav2vec2-ctc-aishell')"

# result(s) after refactoring
# git checkout refactor_branch
python -c "from tests.utils.refactoring_checks import gather_refactoring_results;gather_refactoring_results('asr-wav2vec2-ctc-aishell')"
```

This will give a warning:

```
WARNING - no audio found on HF: asr-wav2vec2-ctc-aishell/example.wav
```

which means that tests/samples/single-mic/example1.wav is used instead.
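As the default arguments above suggest, the gather step can also point at your own fork and branch; a minimal sketch based on that signature, where the git URL and branch name are placeholders to replace with yours:

```python
from tests.utils.refactoring_checks import gather_refactoring_results

# placeholder values below: point these at your own fork and PR branch
gather_refactoring_results(
    "asr-wav2vec2-ctc-aishell",  # glob_filter restricted to one model
    new_interfaces_git="https://github.com/your-user/speechbrain",
    new_interfaces_branch="your-refactor-branch",
)
```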
When testing against a dataset's test partition, the function test_performance() is used.
It is handled through the main function of tests/utils/refactoring_checks.py, which expects its own config, e.g. tests/utils/overrides.yaml.
Example:

```yaml
LibriSpeech_data: !PLACEHOLDER
CommonVoice_EN_data: !PLACEHOLDER
CommonVoice_FR_data: !PLACEHOLDER
IEMOCAP_data: !PLACEHOLDER

new_interfaces_git: https://github.com/speechbrain/speechbrain
new_interfaces_branch: hf-interface-testing
new_interfaces_local_dir: tests/tmp/hf_interfaces

# Filter HF repos (will be used in a local glob dir crawling)
# glob_filter: "*wav2vec2*"
# glob_filter: "*libri*"
glob_filter: "*"

# put False to test 'before' only, e.g. via override
after: True

LibriSpeech:
  data_folder: !ref <LibriSpeech_data>
  skip_prep: True # assuming you know what you're doing ;)

CommonVoice_EN:
  data_folder: !ref <CommonVoice_EN_data>

CommonVoice_FR:
  data_folder: !ref <CommonVoice_FR_data>

IEMOCAP:
  data_folder: !ref <IEMOCAP_data>
```

Example call:
```bash
python tests/utils/refactoring_checks.py tests/utils/overrides.yaml \
    --LibriSpeech_data="" --CommonVoice_EN_data="" \
    --CommonVoice_FR_data="" --IEMOCAP_data="" \
    --glob_filter="*commonvoice*"
```
The use case for this construction is a legacy-preserving refactoring that provides an alternative interface.
Note: please feel free to create your own derived overrides.yaml (for specific cases).
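For instance, a derived overrides yaml that restricts the run to the IEMOCAP model could look like this (an illustrative sketch based on the example above; the fork URL and branch are placeholders, and the unused dataset placeholders are simply set to empty strings, as in the example call):

```yaml
# derived from tests/utils/overrides.yaml: run only the IEMOCAP checks
LibriSpeech_data: ""
CommonVoice_EN_data: ""
CommonVoice_FR_data: ""
IEMOCAP_data: !PLACEHOLDER

new_interfaces_git: https://github.com/your-user/speechbrain  # placeholder: your fork
new_interfaces_branch: your-refactor-branch                   # placeholder: your PR branch
new_interfaces_local_dir: tests/tmp/hf_interfaces

glob_filter: "*IEMOCAP*"  # restrict crawling to the IEMOCAP model folder
after: True               # test both before & after refactoring

LibriSpeech:
  data_folder: !ref <LibriSpeech_data>
CommonVoice_EN:
  data_folder: !ref <CommonVoice_EN_data>
CommonVoice_FR:
  data_folder: !ref <CommonVoice_FR_data>
IEMOCAP:
  data_folder: !ref <IEMOCAP_data>
```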
The unstable branch serves to collect a series of legacy-breaking PRs before making a major release through develop.
Note: of course, the testing-refactoring strategy just introduced is applicable here as well, especially as it relaxes testing demands.