Conversation
What is this variable connected to? A YAML parameter?
Yeah, kinda, but it might conflict with some other concept underneath. Well, actually, the sorting flag is the flag?
Just set it to "False" instead; this simply reverts the other PR that caused the bug without needing to rethink the recipe & hparam logic.
Added a test and an initial fix outline. The test uses a CPU DDP backend; I run into these nearly-sorted batches with sorting & DDP. Edit: also, only 38/100 examples are sampled when DDP uses sorting, which should not happen. Without DDP (and with shuffling DDP), all works fine. A sketch of such a CPU-only DDP test set-up follows below.
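For reference, a minimal sketch (hypothetical worker and port, not the PR's actual test) of how a DDP test can be spawned on CPU with the gloo backend, so it can run on CPU-only CI:

```python
import os
import torch.distributed as dist
import torch.multiprocessing as mp

def _worker(rank, world_size):
    # rendezvous info; a real launcher would set these
    os.environ["MASTER_ADDR"] = "localhost"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # ... build the sorted dataloader here and run the sorting asserts ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(_worker, args=(world_size,), nprocs=world_size)
```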
The test needed to be adjusted to DDP & PyTorch's DistributedSampler and its padding behavior. The test passes with the proposed fix under DDP & sorting. Because of drop_last & the replicas, the final elements of the last batch won't be sorted within that batch, yet DDP seems to distribute the batches across replicas in order otherwise.
The sorting asserts are thus tailored to the dummy data (and a bit lazy, with "[:-1]" subset testing across all batches instead of the final batch only). Ready for review.
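To illustrate the sampler behavior described above, a minimal sketch (dummy data, not the PR's test): with shuffle=False, DistributedSampler deals indices round-robin (rank, rank + num_replicas, ...), so each rank's shard stays sorted; only the padding that makes the length divisible by num_replicas wraps around to the start and can break order at the very end, which is why asserting on batch[:-1] suffices.

```python
from torch.utils.data import DistributedSampler

data = list(range(10))  # dummy dataset, already "sorted by duration"
for rank in range(3):
    sampler = DistributedSampler(
        data, num_replicas=3, rank=rank, shuffle=False, drop_last=False
    )
    shard = [data[i] for i in sampler]
    # all but the (possibly wrapped-around) last element stay in sorted order
    assert shard[:-1] == sorted(shard[:-1])
    print(rank, shard)  # e.g. rank 1 yields [1, 4, 7, 0]
```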
| "Killing process " + str() + "\n" | ||
| "Not enough GPUs available!" | ||
| ) | ||
| if not run_opts["distributed_backend"] == "gloo": |
Again, why is gloo treated like that? :p
CPU vs GPU: there is no GPU testing on GitHub, or you'd have to pay for it :p
The way it is implemented for GPU is incompatible with CPU: gloo will crash when run CPU-only.
This PyTorch example for gloo is incompatible with SB, as it was before the contributed fix:
https://pytorch.org/tutorials/intermediate/dist_tuto.html
For this part: if there is only a CPU, of course there are not enough GPUs ;)
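To make the gloo special-casing concrete, a minimal sketch (hypothetical init_ddp helper, not SB's actual code) of why the GPU-count check is skipped for the CPU backend:

```python
import os
import torch
import torch.distributed as dist

def init_ddp(backend: str, rank: int, world_size: int):
    """Hypothetical helper mirroring the guard in the diff above."""
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    # nccl pins one CUDA device per process; gloo runs on CPU,
    # so counting GPUs only makes sense for non-gloo backends
    if backend != "gloo" and torch.cuda.device_count() < world_size:
        raise RuntimeError("Not enough GPUs available!")
    dist.init_process_group(backend, rank=rank, world_size=world_size)
```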
As described in #1722, there is an issue with sorting and DDP.
This opens up a PR. The goal is not to fix all possibilities; the goal is to restore functionality.
Testing is not yet implemented (it needs a DDP set-up; that might be a different sort of GitHub workflow altogether).
speechbrain/speechbrain/core.py
Line 726 in c229dbc
defines a shuffle variable that is simply re-used for the flags that were redefined in #1518; it defaults to False, such that sorting is kept, if wanted, throughout.
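A minimal sketch of the idea behind the fix (hypothetical make_dataloader helper, not core.py's actual code): the shuffle flag, defaulting to False, is forwarded to the DistributedSampler so that pre-sorted data is not re-shuffled under DDP.

```python
from torch.utils.data import DataLoader, DistributedSampler

def make_dataloader(dataset, shuffle=False, **loader_kwargs):
    # assumes torch.distributed is already initialized (DDP context);
    # shuffle defaults to False so recipe-side sorting is preserved
    sampler = DistributedSampler(dataset, shuffle=shuffle)
    # shuffling is delegated to the sampler; the DataLoader itself must not shuffle
    return DataLoader(dataset, sampler=sampler, shuffle=False, **loader_kwargs)
```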