View source on GitHub
|
Creates a dataset which reads data from the tf.data service.
tf.data.experimental.service.from_dataset_id(
processing_mode,
service,
dataset_id,
element_spec=None,
job_name=None,
consumer_index=None,
num_consumers=None,
max_outstanding_requests=None,
data_transfer_protocol=None,
cross_trainer_cache=None,
target_workers='AUTO'
) -> tf.data.Dataset
This is useful when the dataset is registered by one process, then used in
another process. When the same process is both registering and reading from
the dataset, it is simpler to use tf.data.experimental.service.distribute
instead.
Before using from_dataset_id, the dataset must have been registered with the
tf.data service using tf.data.experimental.service.register_dataset.
register_dataset returns a dataset id for the registered dataset. That is
the dataset_id which should be passed to from_dataset_id.
The element_spec argument indicates the tf.TypeSpecs for the elements
produced by the dataset. Currently element_spec must be explicitly
specified, and match the dataset registered under dataset_id. element_spec
defaults to None so that in the future we can support automatically
discovering the element_spec by querying the tf.data service.
tf.data.experimental.service.distribute is a convenience method which
combines register_dataset and from_dataset_id into a dataset
transformation.
See the documentation for tf.data.experimental.service.distribute for more
detail about how from_dataset_id works.
dispatcher = tf.data.experimental.service.DispatchServer()dispatcher_address = dispatcher.target.split("://")[1]worker = tf.data.experimental.service.WorkerServer(tf.data.experimental.service.WorkerConfig(dispatcher_address=dispatcher_address))dataset = tf.data.Dataset.range(10)dataset_id = tf.data.experimental.service.register_dataset(dispatcher.target, dataset)dataset = tf.data.experimental.service.from_dataset_id(processing_mode="parallel_epochs",service=dispatcher.target,dataset_id=dataset_id,element_spec=dataset.element_spec)print(list(dataset.as_numpy_iterator()))[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Args |
|---|
processing_mode
tf.data.experimental.service.ShardingPolicy specifying
how to shard the dataset among tf.data workers. See
tf.data.experimental.service.ShardingPolicy for details. For backwards
compatibility, processing_mode may also be set to the strings
"parallel_epochs" or "distributed_epoch", which are respectively
equivalent to ShardingPolicy.OFF and ShardingPolicy.DYNAMIC.
service
[<protocol>://]<address>, where <address> identifies the dispatcher
address and <protocol> can optionally be used to override the default
protocol to use. If it's a tuple, it should be (protocol, address).
dataset_id
register_dataset when the dataset is registered with the tf.data
service.
element_spec
tf.TypeSpecs representing the type of
elements produced by the dataset. This argument is only required inside a
tf.function. Use tf.data.Dataset.element_spec to get the element spec
for a given dataset.
job_name
consumer_index
0
to num_consumers. Must be specified alongside num_consumers. When
specified, consumers will read from the job in a strict round-robin order,
instead of the default first-come-first-served order.
num_consumers
consumer_index. When specified,
consumers will read from the job in a strict round-robin order, instead of
the default first-come-first-served order. When num_consumers is
specified, the dataset must have infinite cardinality to prevent a
producer from running out of data early and causing consumers to go out of
sync.
max_outstanding_requests
distribute won't use more than element_size *
max_outstanding_requests of memory.
data_transfer_protocol
cross_trainer_cache
CrossTrainerCache object is
provided, dataset iteration will be shared across concurrently running
trainers. See
https://www.tensorflow.org/api_docs/python/tf/data/experimental/service#sharing_tfdata_service_with_concurrent_trainers
for details.
target_workers
"AUTO", tf.data
runtime decides which workers to read from. If "ANY", reads from any
tf.data service workers. If "LOCAL", only reads from local in-processs
tf.data service workers. "AUTO" works well for most cases, while users
can specify other targets. For example, "LOCAL" helps avoid RPCs and
data copy if every TF worker colocates with a tf.data service worker.
Consumers of a shared job must use the same target_workers. Defaults to
"AUTO".
Returns | |
|---|---|
A tf.data.Dataset which reads from the tf.data service.
|
View source on GitHub