View source on GitHub
|
Shards computation along the batch dimension for parallel execution.
tf.compat.v1.tpu.batch_parallel(
computation: Callable[..., Any],
inputs: Optional[List[List[Optional[core_types.Tensor]]]] = None,
num_shards: int = 1,
infeed_queue: Optional[tpu_feed.InfeedQueue] = None,
device_assignment: Optional[tf.tpu.experimental.DeviceAssignment] = None,
name: Optional[Text] = None,
xla_options: Optional[tf.tpu.XLAOptions] = None
)
Convenience wrapper around shard().
inputs must be a list of Tensors or None (equivalent to an empty list).
Each input is split into num_shards pieces along the 0-th dimension, and
computation is applied to each shard in parallel.
Tensors are broadcast to all shards if they are lexically captured by
computation. e.g.,
x = tf.constant(7) def computation(): return x + 3 ... = shard(computation, ...)
The outputs from all shards are concatenated back together along their 0-th dimension.
Inputs and outputs of the computation must be at least rank-1 Tensors.
Args |
|---|
computation
inputs
num_shards.
num_shards
infeed_queue
None, the InfeedQueue from which to append a tuple
of arguments as inputs to computation.
device_assignment
None, a DeviceAssignment describing the
mapping between logical cores in the computation with physical cores in
the TPU topology. Uses a default device assignment if None. The
DeviceAssignment may be omitted if each shard of the computation uses
only one core, and there is either only one shard, or the number of shards
is equal to the number of cores in the TPU system.
name
xla_options
tpu.XLAOptions which indicates the options
passed to XLA compiler. Use None for default options.
Returns | |
|---|---|
| A list of output tensors. |
Raises |
|---|
ValueError
num_shards <= 0
View source on GitHub