Options for constructing a Checkpoint.
tf.train.CheckpointOptions(
    experimental_io_device=None,
    experimental_enable_async_checkpoint=False,
    enable_async=False
)
Used as the options argument to either tf.train.Checkpoint.save() or
tf.train.Checkpoint.restore() methods to adjust how variables are
saved/restored.
Example: Run IO ops on "localhost" while saving a checkpoint:
import tensorflow as tf

step = tf.Variable(0, name="step")
checkpoint = tf.train.Checkpoint(step=step)
# Access the filesystem from the local host for every variable.
options = tf.train.CheckpointOptions(experimental_io_device="/job:localhost")
checkpoint.save("/tmp/ckpt", options=options)
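The same options object can also be passed to tf.train.Checkpoint.restore(). The following is a minimal sketch of a save/restore round trip, assuming a single-host setup; the temporary directory, the variable value, and the names prefix, save_path, and restored_step are illustrative and not part of the API.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical scratch location for this sketch.
prefix = os.path.join(tempfile.mkdtemp(), "ckpt")

step = tf.Variable(7, name="step")
checkpoint = tf.train.Checkpoint(step=step)

# Use the same IO device for both saving and restoring.
options = tf.train.CheckpointOptions(experimental_io_device="/job:localhost")
save_path = checkpoint.save(prefix, options=options)

# Overwrite the variable, then restore the saved value.
step.assign(0)
status = checkpoint.restore(save_path, options=options)
status.assert_consumed()  # Raises if any saved value was left unmatched.

restored_step = int(step.numpy())
```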
| Args | |
|---|---|
| experimental_io_device | If None (default), then for each variable the filesystem is accessed from the CPU:0 device of the host where that variable is assigned. If specified, the filesystem is instead accessed from that device for all variables. This is useful, for example, if you want to save to a local directory such as "/tmp" when running in a distributed setting. In that case, pass a device for the host where the "/tmp" directory is accessible. |
| experimental_enable_async_checkpoint | Deprecated; use enable_async instead. |
| enable_async | Whether async checkpointing is enabled. Defaults to False. Async checkpoint moves the checkpoint file writing off the main thread, so that the model can continue to train while the checkpoint file writing runs in the background. This reduces TPU device idle cycles and speeds up training, at the cost of potentially higher memory consumption. |
| Attributes |
|---|
| enable_async |
| experimental_enable_async_checkpoint |
| experimental_io_device |
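The attributes mirror the constructor arguments and can be read back from an options object. A short sketch, with illustrative values:

```python
import tensorflow as tf

options = tf.train.CheckpointOptions(
    experimental_io_device="/job:localhost",
    enable_async=True,
)

# Read back the values passed to the constructor.
io_device = options.experimental_io_device
async_enabled = options.enable_async
print(io_device, async_enabled)
```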