Layer normalization layer (Ba et al., 2016).
Inherits From: Layer, Operation
tf.keras.layers.LayerNormalization(
axis=-1,
epsilon=0.001,
center=True,
scale=True,
rms_scaling=False,
beta_initializer='zeros',
gamma_initializer='ones',
beta_regularizer=None,
gamma_regularizer=None,
beta_constraint=None,
gamma_constraint=None,
**kwargs
)
Normalizes the activations of the previous layer for each example in a batch independently, rather than across the batch as Batch Normalization does. That is, it applies a transformation that keeps the mean activation within each example close to 0 and the activation standard deviation close to 1.
If scale or center are enabled, the layer will scale the normalized
outputs by broadcasting them with a trainable variable gamma, and center
the outputs by broadcasting with a trainable variable beta. gamma will
default to a ones tensor and beta will default to a zeros tensor, so that
centering and scaling are no-ops before training has begun.
So, with scaling and centering enabled the normalization equations are as follows:
Let the intermediate activations for a mini-batch be the inputs.
For each sample x_i in inputs with k features, we compute the mean and
variance of the sample:
mean_i = sum(x_i[j] for j in range(k)) / k
var_i = sum((x_i[j] - mean_i) ** 2 for j in range(k)) / k
and then compute a normalized x_i_normalized, including a small factor
epsilon for numerical stability.
x_i_normalized = (x_i - mean_i) / sqrt(var_i + epsilon)
And finally x_i_normalized is linearly transformed by gamma and beta,
which are learned parameters:
output_i = x_i_normalized * gamma + beta
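The equations above can be traced for a single sample with a small pure-Python sketch. The function name layer_norm and the list-based representation are illustrative assumptions, not the library's implementation; only the arithmetic follows the formulas in this section.

```python
import math


def layer_norm(x, gamma, beta, epsilon=1e-3):
    """Normalize one sample x (a list of k floats) over its k features.

    Hypothetical helper that mirrors the equations above; gamma and beta
    are per-feature lists of the same length as x.
    """
    k = len(x)
    mean = sum(x) / k
    var = sum((v - mean) ** 2 for v in x) / k
    inv_std = 1.0 / math.sqrt(var + epsilon)
    # Normalize, then apply the learned scale (gamma) and offset (beta).
    return [(v - mean) * inv_std * g + b for v, g, b in zip(x, gamma, beta)]


sample = [1.0, 2.0, 3.0, 4.0]

# With gamma = 1 and beta = 0 the affine step is a no-op.
out = layer_norm(sample, gamma=[1.0] * 4, beta=[0.0] * 4)

# A non-trivial gamma and beta rescale and shift the normalized values.
out_affine = layer_norm(sample, gamma=[2.0] * 4, beta=[1.0] * 4)
```

With the default initializers (gamma of ones, beta of zeros), the output has a mean of approximately 0 and a variance slightly below 1, because epsilon is added to the variance before the square root.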
gamma and beta will span the axes of inputs specified in axis, and
this part of the inputs' shape must be fully defined.
For example:
layer = keras.layers.LayerNormalization(axis=[1, 2, 3])
layer.build([5, 20, 30, 40])
print(layer.beta.shape)
(20, 30, 40)
print(layer.gamma.shape)
(20, 30, 40)
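The relationship between axis and the parameter shape can be sketched without the library. The helper param_shape below is a hypothetical illustration of the rule stated above (gamma and beta span the axes listed in axis, and those dimensions must be fully defined); it is not part of the Keras API.

```python
def param_shape(input_shape, axis):
    """Return the shape gamma and beta would take for the given axes.

    Hypothetical helper: input_shape is a list or tuple whose entries are
    ints, or None for an unknown dimension; axis is an int or a list of ints.
    """
    rank = len(input_shape)
    axes = [axis] if isinstance(axis, int) else list(axis)
    resolved = []
    for a in axes:
        if not -rank <= a < rank:
            raise ValueError(f"axis {a} is out of range for rank {rank}")
        # Map negative indices such as -1 onto their positive position.
        resolved.append(a % rank)
    resolved = sorted(set(resolved))
    shape = tuple(input_shape[a] for a in resolved)
    if any(dim is None for dim in shape):
        raise ValueError("Normalized axes must have a fully defined size.")
    return shape


print(param_shape([5, 20, 30, 40], axis=[1, 2, 3]))  # (20, 30, 40)
print(param_shape([None, 128], axis=-1))  # (128,)
```

An unknown batch dimension is fine as long as it is not one of the normalized axes, which is why the second call succeeds.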
Note that other implementations of layer normalization may define
gamma and beta over a different set of axes from the axes being
normalized across. For example, Group Normalization
(Wu et al. 2018) with a single group (number of groups set to 1)
corresponds to a Layer Normalization that normalizes across height, width,
and channel and has gamma and beta span only the channel dimension.
So, this Layer Normalization implementation will not match a Group
Normalization layer with the number of groups set to 1.
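| Args | |
|---|---|
| axis | Integer or list/tuple of integers. The axis or axes to normalize across. -1 is the last dimension in the input. Defaults to -1. |
| epsilon | Small float added to the variance for numerical stability. Defaults to 0.001. |
| center | If True, add the offset beta to the normalized tensor. If False, beta is ignored. Defaults to True. |
| scale | If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer. Defaults to True. |
| rms_scaling | If True, center and scale are ignored, and the inputs are scaled by gamma and the inverse square root of the square of all inputs. This is an approximate and faster approach that avoids ever computing the mean of the input. Defaults to False. |
| beta_initializer | Initializer for the beta weight. Defaults to 'zeros'. |
| gamma_initializer | Initializer for the gamma weight. Defaults to 'ones'. |
| beta_regularizer | Optional regularizer for the beta weight. Defaults to None. |
| gamma_regularizer | Optional regularizer for the gamma weight. Defaults to None. |
| beta_constraint | Optional constraint for the beta weight. Defaults to None. |
| gamma_constraint | Optional constraint for the gamma weight. Defaults to None. |
| **kwargs | Base layer keyword arguments (e.g. name and dtype). |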
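The rms_scaling description is terse, so here is one plausible reading as a pure-Python sketch: scale each input by gamma and by the inverse root mean square of the sample, with no mean subtraction and no beta. The function name rms_scale and this exact formula are assumptions for illustration; the library's actual computation may differ in detail.

```python
import math


def rms_scale(x, gamma, epsilon=1e-3):
    """Scale one sample x by gamma / sqrt(mean(x ** 2) + epsilon).

    Hypothetical helper illustrating root-mean-square scaling; unlike the
    full layer normalization, the mean of x is never computed or subtracted.
    """
    k = len(x)
    mean_square = sum(v * v for v in x) / k
    inv_rms = 1.0 / math.sqrt(mean_square + epsilon)
    return [v * inv_rms * g for v, g in zip(x, gamma)]


# epsilon=0.0 here only to make the unit mean square easy to verify.
scaled = rms_scale([3.0, 4.0], gamma=[1.0, 1.0], epsilon=0.0)
```

Under this reading the output has a mean square of 1 when gamma is all ones, but its mean is generally not 0, since no centering takes place.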
Reference:
Ba et al., 2016 (Layer Normalization)
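| Attributes | |
|---|---|
| input | Retrieves the input tensor(s) of a symbolic operation. Only returns the tensor(s) corresponding to the first time the operation was called. |
| output | Retrieves the output tensor(s) of a layer. Only returns the tensor(s) corresponding to the first time the operation was called. |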
Methods
from_config
@classmethod
from_config(
config
)
Creates a layer from its config.
This method is the reverse of get_config,
capable of instantiating the same layer from the config
dictionary. It does not handle layer connectivity
(handled by Network), nor weights (handled by set_weights).
| Args | |
|---|---|
| config | A Python dictionary, typically the output of get_config. |
| Returns | |
|---|---|
| A layer instance. |
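The get_config / from_config round trip can be illustrated with a toy class. ToyNorm below is a hypothetical stand-in that only stores constructor arguments; it is not a Keras layer. It shows the pattern described above: the config carries hyperparameters only, not weights or connectivity.

```python
class ToyNorm:
    """Hypothetical stand-in for a layer; stores hyperparameters only."""

    def __init__(self, axis=-1, epsilon=1e-3, center=True, scale=True):
        self.axis = axis
        self.epsilon = epsilon
        self.center = center
        self.scale = scale

    def get_config(self):
        # Everything needed to re-create the object, as a plain dict.
        return {
            "axis": self.axis,
            "epsilon": self.epsilon,
            "center": self.center,
            "scale": self.scale,
        }

    @classmethod
    def from_config(cls, config):
        # Reverse of get_config: rebuild an instance from the dict.
        return cls(**config)


original = ToyNorm(axis=[1, 2], epsilon=1e-5)
clone = ToyNorm.from_config(original.get_config())
```

With Keras installed, the analogous call would be LayerNormalization.from_config(layer.get_config()), which returns a new, unbuilt layer with the same hyperparameters.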
symbolic_call
symbolic_call(
*args, **kwargs
)