With understanding, there is no fear.

    release | main             | business
      CI/CD |   \--- develop   | as usual
  ecosystem |         \   \<~> testing-refactoring   |  the tricky
refactoring |          \--- unstable <~>/            | bits & pieces

The testing-refactoring branch contains this folder:
https://github.com/speechbrain/speechbrain/tree/hf-interface-testing/updates_pretrained_models

which contains folders named after the models uploaded to HuggingFace:
https://huggingface.co/speechbrain

e.g. hf-interface-testing/updates_pretrained_models/asr-wav2vec2-librispeech outlines testing of the pretrained model speechbrain/asr-wav2vec2-librispeech. Each of these folders can contain:

  • test.yaml - test definition w/ integrated code [mandatory]
  • hyperparams.yaml - the standing (or updated) specification [mandatory]
  • custom_interface.py - the standing (or updated) custom interface [optional]
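
As a mental model, each folder name doubles as the HF repo id under speechbrain/. A minimal sketch of crawling these folders (hf_repos_under_test is a hypothetical helper, not part of the repo):

from pathlib import Path

def hf_repos_under_test(root="updates_pretrained_models", glob_filter="*"):
    """Yield (HF repo id, test.yaml path) for each test folder."""
    for folder in sorted(Path(root).glob(glob_filter)):
        test_yaml = folder / "test.yaml"
        if test_yaml.exists():  # test.yaml is mandatory, see the list above
            yield f"speechbrain/{folder.name}", test_yaml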

Note: changing parameters means either a model revision and/or a new model.

While hyperparams.yaml & custom_interface.py are updated through PRs that complement conventional PRs, test.yaml is defined once only (and fixed when needed). An example of such a complementary PR is speechbrain#1801.

Note: to update files relevant to testing as an operation, such as test.yaml, please create and manage separate PRs; they are of a different kind than PRs that change a pretrained model itself.

Depending on the testing need, test.yaml grows. Some examples (a sketch of how these fields might be consumed follows after the list):

1. ssl-wav2vec2-base-librispeech/test.yaml - the interplay between test sample, interface class, and batch function is handled via the HF testing in tests/utils
    sample: example.wav # test audio provided via HF repo
    cls: WaveformEncoder # existing speechbrain.inference class
    fnx: encode_batch # its batch-wise function, applied after audio loading
  2. asr-wav2vec2-librispeech/test.yaml - testing a single example and against a dataset's test partition
    sample: example.wav # as above
    cls: EncoderASR # as above
    fnx: transcribe_batch # as above
    dataset: LibriSpeech # which dataset to use -> will create a tests/tmp/LibriSpeech folder
    recipe_yaml: recipes/LibriSpeech/ASR/CTC/hparams/train_hf_wav2vec.yaml # the training recipe for dataloader etc
    overrides: # what of the recipe_yaml needs to be overridden
      output_folder: !ref tests/tmp/<dataset> # the output folder is at the tmp dataset (data prep & eval tasks only)
    dataio: | # which dataio_prepare to import; copy/paste from train_with_wav2vec.py — pay attention to the last line (their dataio_prepare needs to know how to prepare the recipe dataset)
        from recipes.LibriSpeech.librispeech_prepare import prepare_librispeech
        run_on_main(
            prepare_librispeech,
            kwargs={
                "data_folder": recipe_hparams["data_folder"],
                "tr_splits": recipe_hparams["train_splits"],
                "dev_splits": recipe_hparams["dev_splits"],
                "te_splits": recipe_hparams["test_splits"],
                "save_folder": recipe_hparams["output_folder"],
                "merge_lst": recipe_hparams["train_splits"],
                "merge_name": "train.csv",
                "skip_prep": recipe_hparams["skip_prep"],
            },
        )
        from recipes.LibriSpeech.ASR.CTC.train_with_wav2vec import dataio_prepare
    test_datasets: dataio_prepare(recipe_hparams)[2] # where to get the test dataset from that prep pipeline (w/ input args)
    test_loader: test_dataloader_opts # dataloader name as in recipe_yaml
    performance: # which metric classes are used in the training recipe
      CER: # name for testing
        handler: cer_computer # name as in recipe_yaml
        field: error_rate # field/function as used in train script
      WER: # another one
        handler: error_rate_computer # another one
        field: error_rate # another one
    predicted: "[wrd.split(' ') for wrd in predictions[0]]" # what of the forward to use to compute metrics
    targeted: "[wrd.split(' ') for wrd in batch.wrd]" # what of the batch ground-of-truth to use to compute metrics
    to_stats: ids, predicted, targeted # what the metric computation needs from each batch
  3. emotion-recognition-wav2vec2-IEMOCAP/test.yaml - custom interfaces
    sample: anger.wav # as above
    cls: CustomEncoderWav2vec2Classifier # => name of custom class provided through custom interface
    fnx: classify_batch # as above
    foreign: custom_interface.py # name of custom interface provided via the HF repo
    dataset: IEMOCAP # as above
    recipe_yaml: recipes/IEMOCAP/emotion_recognition/hparams/train_with_wav2vec2.yaml # as above
    overrides: # as above
      output_folder: !ref tests/tmp/<dataset> # as above
    dataio: | # as above
        from recipes.IEMOCAP.emotion_recognition.iemocap_prepare import prepare_data
        run_on_main(
            prepare_data,
            kwargs={
                "data_original": recipe_hparams["data_folder"],
                "save_json_train": recipe_hparams["train_annotation"],
                "save_json_valid": recipe_hparams["valid_annotation"],
                "save_json_test": recipe_hparams["test_annotation"],
                "split_ratio": [80, 10, 10],
                "different_speakers": recipe_hparams["different_speakers"],
                "test_spk_id": recipe_hparams["test_spk_id"],
                "seed": recipe_hparams["seed"],
            },
        )
        from recipes.IEMOCAP.emotion_recognition.train_with_wav2vec2 import dataio_prep
    test_datasets: dataio_prep(recipe_hparams)["test"] # as above
    test_loader: dataloader_options # as above
    performance: # as above
      ClassError: # as above
        handler: error_stats # as above
        field: average # as above
    predicted: predictions[0] # as above
    targeted: batch.emo_encoded[0] # as above
    to_stats: ids, predicted, targeted, wav_lens # as above
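
For examples 1 and 2 above, a rough sketch of how the sample/cls/fnx triple might be consumed (run_single_sample is hypothetical; it assumes the interface class is exposed under speechbrain.inference, so it does not cover the custom-interface case of example 3 - the actual logic lives in tests/utils):

import torch
import yaml

import speechbrain.inference

def run_single_sample(repo, test_yaml_path):
    """Load one pretrained model and run its batch function on the test sample."""
    with open(test_yaml_path) as f:
        spec = yaml.safe_load(f)
    cls = getattr(speechbrain.inference, spec["cls"])     # e.g. EncoderASR
    model = cls.from_hparams(source=repo)                 # hyperparams.yaml from HF
    audio = model.load_audio(f"{repo}/{spec['sample']}")  # test audio from the HF repo
    batch = audio.unsqueeze(0)                            # single-item batch
    wav_lens = torch.tensor([1.0])                        # relative lengths
    return getattr(model, spec["fnx"])(batch, wav_lens)   # e.g. transcribe_batch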

When testing the HF snippets, use the functions gather_expected_results() and gather_refactoring_results(). They create another yaml in which they gather the before/after refactoring test results. While the standing interfaces are drawn from the HF repos, their updated/refactored counterparts need the PR's git URL and branch to be specified; these are cloned into, e.g., tests/tmp/hf_interfaces. See the default values:

def gather_refactoring_results(
    glob_filter="*",
    new_interfaces_git="https://github.com/speechbrain/speechbrain",  # change to yours
    new_interfaces_branch="hf-interface-testing",  # maybe you have another branch
    new_interfaces_local_dir="tests/tmp/hf_interfaces",  # you can leave this, or put it elsewhere
    yaml_path="tests/tmp/refactoring_results.yaml",  # same here, change only if necessary
):
    ...

Examples:

# expected result(s) for one audio
# git checkout develop
python -c "from tests.utils.refactoring_checks import gather_expected_results;gather_expected_results('asr-wav2vec2-ctc-aishell')"

# result(s) after refactoring
# git checkout refactor_branch
python -c "from tests.utils.refactoring_checks import gather_refactoring_results;gather_refactoring_results('asr-wav2vec2-ctc-aishell')"

This will give a warning:

WARNING - no audio found on HF: asr-wav2vec2-ctc-aishell/example.wav

This means that tests/samples/single-mic/example1.wav is used instead.
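
For intuition, the fallback could be sketched like this (resolve_test_audio is hypothetical, and huggingface_hub is only an assumed fetch mechanism):

from huggingface_hub import hf_hub_download

def resolve_test_audio(repo_id, sample):
    """Fetch the test audio from HF; fall back to a repo-local sample."""
    try:
        return hf_hub_download(repo_id=repo_id, filename=sample)
    except Exception:
        print(f"WARNING - no audio found on HF: {repo_id.split('/')[-1]}/{sample}")
        return "tests/samples/single-mic/example1.wav"  # fallback as noted above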


When testing against a dataset's test partition, test_performance() is used. It is handled through the main function of tests/utils/refactoring_checks.py, which expects its own config, e.g. tests/utils/overrides.yaml.

Example:

LibriSpeech_data: !PLACEHOLDER
CommonVoice_EN_data: !PLACEHOLDER
CommonVoice_FR_data: !PLACEHOLDER
IEMOCAP_data: !PLACEHOLDER

new_interfaces_git: https://github.com/speechbrain/speechbrain
new_interfaces_branch: hf-interface-testing
new_interfaces_local_dir: tests/tmp/hf_interfaces

# Filter HF repos (will be used in a local glob dir crawling)
# glob_filter: "*wav2vec2*"
# glob_filter: "*libri*"
glob_filter: "*"

# put False to test 'before' only, e.g. via override
after: True

LibriSpeech:
  data_folder: !ref <LibriSpeech_data>
  skip_prep: True # assuming you know what you do ;)

CommonVoice_EN:
  data_folder: !ref <CommonVoice_EN_data>

CommonVoice_FR:
  data_folder: !ref <CommonVoice_FR_data>

IEMOCAP:
  data_folder: !ref <IEMOCAP_data>

Example call:

python tests/utils/refactoring_checks.py tests/utils/overrides.yaml --LibriSpeech_data="" --CommonVoice_EN_data="" --CommonVoice_FR_data="" --IEMOCAP_data="" --glob_filter="*commonvoice*"

The use case for this construction is a legacy-preserving refactoring, providing an alternative interface.

Note: please feel free to create your own derived overrides.yaml (for specific cases).
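
After both gather runs, a quick before/after comparison can be scripted. The per-model before/after layout assumed below is illustrative only; adjust it to the yaml your run actually produced:

import yaml

def diff_results(path="tests/tmp/refactoring_results.yaml"):
    """Print a before/after summary per model (assumed yaml layout)."""
    with open(path) as f:
        results = yaml.safe_load(f)
    for model, entry in results.items():
        before, after = entry.get("before"), entry.get("after")
        status = "OK" if before == after else "CHECK"
        print(f"{status:5s} {model}: before={before!r} after={after!r}")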


The unstable branch serves to collect a series of legacy-breaking PRs before making a major release through develop.

Note: of course, the testing-refactoring strategy just introduced applies here as well, especially as it relaxes testing demands.