Core¶
Attention Mechanism¶
AttentionWrapperState¶
- class texar.torch.core.AttentionWrapperState(cell_state, attention, time, alignments, alignment_history, attention_state)[source]¶
A namedtuple storing the state of an AttentionWrapper.
- property cell_state¶
The state of the wrapped RNNCell at the previous time step.
- property attention¶
The attention emitted at the previous time step.
- property time¶
The current time step.
- property alignments¶
A single or tuple of tensor(s) containing the alignments emitted at the previous time step for each attention mechanism.
- property alignment_history¶
(If enabled) A single or tuple of list(s) containing alignment matrices from all time steps for each attention mechanism. Call torch.stack on each list to convert to a torch.Tensor.
- property attention_state¶
A single or tuple of nested objects containing attention mechanism states for each attention mechanism.
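As the alignment_history description notes, each recorded history is a Python list that can be converted to a tensor with torch.stack. A toy, self-contained sketch (all shapes are illustrative):
import torch

# alignment_history behaves like a list of per-step alignments, each of shape
# [batch_size, max_time]; here we fabricate such a list for illustration.
batch_size, max_time, num_steps = 4, 7, 3
alignment_history = [torch.softmax(torch.randn(batch_size, max_time), dim=-1)
                     for _ in range(num_steps)]

# Stack the list into a single tensor of shape [num_steps, batch_size, max_time].
stacked = torch.stack(alignment_history)
print(stacked.shape)  # torch.Size([3, 4, 7])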
LuongAttention¶
- class texar.torch.core.LuongAttention(num_units, encoder_output_size, scale=False, probability_fn=None, score_mask_value=None)[source]¶
Implements Luong-style (multiplicative) attention scoring. This attention has two forms.
The first is standard Luong attention, as described in: Minh-Thang Luong, Hieu Pham, Christopher D. Manning. "Effective Approaches to Attention-based Neural Machine Translation." EMNLP 2015.
The second is the scaled form inspired partly by the normalized form of Bahdanau attention. To enable the second form, construct the object with parameter scale=True.
- Parameters
num_units – The depth of the attention mechanism.
encoder_output_size – The output size of the encoder cell.
scale – Python boolean. Whether to scale the energy term.
probability_fn (optional) – A callable that converts the score to probabilities. The default is torch.nn.softmax. Other options include hardmax() and sparsemax(). Its signature should be: probabilities = probability_fn(score).
score_mask_value (optional) – The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
- forward(query, state, memory, memory_sequence_length=None)[source]¶
Score the query based on the keys and values.
- Parameters
query – tensor, shaped [batch_size, query_depth].
state – tensor, shaped [batch_size, alignments_size] (alignments_size is memory's max_time).
memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
memory_sequence_length (optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
- Returns
Tensor of dtype matching memory and shape [batch_size, alignments_size] (alignments_size is memory's max_time).
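A minimal usage sketch of the forward signature documented above. It assumes the query depth equals num_units and uses an all-zero alignments tensor as the state for the first step; per the Returns section, the call yields a tensor of shape [batch_size, max_time].
import torch
from texar.torch.core import LuongAttention

batch_size, max_time, enc_dim, num_units = 4, 7, 32, 16
attn = LuongAttention(num_units=num_units, encoder_output_size=enc_dim)

memory = torch.randn(batch_size, max_time, enc_dim)   # encoder outputs
memory_len = torch.tensor([7, 5, 7, 3])               # valid lengths per batch entry
query = torch.randn(batch_size, num_units)            # decoder output at one step
state = torch.zeros(batch_size, max_time)             # previous alignments (step 0)

# Per the Returns section above, this produces the masked attention alignments.
alignments = attn(query, state, memory, memory_sequence_length=memory_len)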
BahdanauAttention¶
- class texar.torch.core.BahdanauAttention(num_units, decoder_output_size, encoder_output_size, normalize=False, probability_fn=None, score_mask_value=None)[source]¶
Implements Bahdanau-style (additive) attention. This attention has two forms.
The first is Bahdanau attention, as described in: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate.” ICLR 2015.
The second is the normalized form. This form is inspired by the weight normalization article: Tim Salimans, Diederik P. Kingma. “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.” To enable the second form, construct the object with parameter normalize=True.
- Parameters
num_units – The depth of the query mechanism.
decoder_output_size – The output size of the decoder cell.
encoder_output_size – The output size of the encoder cell.
normalize – bool. Whether to normalize the energy term.
probability_fn (optional) – A callable that converts the score to probabilities. The default is torch.nn.softmax. Other options include hardmax() and sparsemax(). Its signature should be: probabilities = probability_fn(score).
score_mask_value (optional) – The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
- forward(query, state, memory, memory_sequence_length=None)[source]¶
Score the query based on the keys and values.
- Parameters
query – tensor, shaped [batch_size, query_depth].
state – tensor, shaped [batch_size, alignments_size] (alignments_size is memory's max_time).
memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
memory_sequence_length (optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
- Returns
Tensor of dtype matching memory and shape [batch_size, alignments_size] (alignments_size is memory's max_time).
BahdanauMonotonicAttention¶
- class texar.torch.core.BahdanauMonotonicAttention(num_units, decoder_output_size, encoder_output_size, normalize=False, score_mask_value=None, sigmoid_noise=0.0, score_bias_init=0.0, mode='parallel')[source]¶
Monotonic attention mechanism with Bahdanau-style energy function. This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory, it cannot attend to any prior points at subsequent output time steps. It achieves this by using _monotonic_probability_fn() instead of softmax to construct its attention distributions. Since the attention scores are passed through a sigmoid, a learnable scalar bias parameter is applied after the score function and before the sigmoid. Otherwise, it is equivalent to BahdanauAttention. This approach is proposed in: Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck. "Online and Linear-Time Attention by Enforcing Monotonic Alignments." ICML 2017.
- Parameters
num_units – The depth of the query mechanism.
decoder_output_size – The output size of the decoder cell.
encoder_output_size – The output size of the encoder cell.
normalize – Python boolean. Whether to normalize the energy term.
score_mask_value (optional) – The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
sigmoid_noise – Standard deviation of pre-sigmoid noise. Refer to _monotonic_probability_fn() for more information.
score_bias_init – Initial value for the score bias scalar. It is recommended to initialize this to a negative value when the length of the memory is large.
mode – How to compute the attention distribution. Must be one of "recursive", "parallel", or "hard". Refer to monotonic_attention() for more information.
- forward(query, state, memory, memory_sequence_length=None)[source]¶
Score the query based on the keys and values.
- Parameters
query – tensor, shaped [batch_size, query_depth].
state – tensor, shaped [batch_size, alignments_size] (alignments_size is memory's max_time).
memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
memory_sequence_length (optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
- Returns
Tensor of dtype matching memory and shape [batch_size, alignments_size] (alignments_size is memory's max_time).
LuongMonotonicAttention¶
- class texar.torch.core.LuongMonotonicAttention(num_units, encoder_output_size, scale=False, score_mask_value=None, sigmoid_noise=0.0, score_bias_init=0.0, mode='parallel')[source]¶
Monotonic attention mechanism with Luong-style energy function. This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory, it cannot attend to any prior points at subsequent output time steps. It achieves this by using _monotonic_probability_fn() instead of softmax to construct its attention distributions. Otherwise, it is equivalent to LuongAttention. This approach is proposed in: Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck. "Online and Linear-Time Attention by Enforcing Monotonic Alignments." ICML 2017.
- Parameters
num_units – The depth of the query mechanism.
encoder_output_size – The output size of the encoder cell.
scale – Python boolean. Whether to scale the energy term.
score_mask_value (optional) – The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
sigmoid_noise – Standard deviation of pre-sigmoid noise. Refer to _monotonic_probability_fn() for more information.
score_bias_init – Initial value for the score bias scalar. It is recommended to initialize this to a negative value when the length of the memory is large.
mode – How to compute the attention distribution. Must be one of "recursive", "parallel", or "hard". Refer to monotonic_attention() for more information.
- forward(query, state, memory, memory_sequence_length=None)[source]¶
Score the query based on the keys and values.
- Parameters
query – tensor, shaped [batch_size, query_depth].
state – tensor, shaped [batch_size, alignments_size] (alignments_size is memory's max_time).
memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
memory_sequence_length (optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
- Returns
Tensor of dtype matching memory and shape [batch_size, alignments_size] (alignments_size is memory's max_time).
compute_attention¶
- texar.torch.core.compute_attention(attention_mechanism, cell_output, attention_state, memory, attention_layer, memory_sequence_length=None)[source]¶
Computes the attention and alignments for a given attention_mechanism.
- Parameters
attention_mechanism – The AttentionMechanism instance used to compute attention.
cell_output (tensor) – The decoder output (query tensor), shaped [batch_size, query_depth].
attention_state (tensor) – tensor, shaped [batch_size, alignments_size] (alignments_size is memory's max_time).
memory (tensor) – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
attention_layer (torch.nn.Module, optional) – If specified, the attention context is concatenated with cell_output and fed through this layer.
memory_sequence_length (tensor, optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
- Returns
A tuple of (attention, alignments, next_attention_state), where
attention: The attention context (or the output of attention_layer, if specified).
alignments: The computed attention alignments.
next_attention_state: The attention state after the current time step.
monotonic_attention¶
- texar.torch.core.monotonic_attention(p_choose_i, previous_attention, mode)[source]¶
Compute monotonic attention distribution from choosing probabilities. Monotonic attention implies that the input sequence is processed in an explicitly left-to-right manner when generating the output sequence. In addition, once an input sequence element is attended to at a given output time step, elements occurring before it cannot be attended to at subsequent output time steps. This function generates attention distributions according to these assumptions. For more information, see Online and Linear-Time Attention by Enforcing Monotonic Alignments.
- Parameters
p_choose_i – Probability of choosing input sequence/memory element i. Should be of shape (batch_size, input_sequence_length), and should all be in the range [0, 1].
previous_attention – The attention distribution from the previous output time step. Should be of shape (batch_size, input_sequence_length). For the first output time step, previous_attention[n] should be [1, 0, 0, …, 0] for all n in [0, … batch_size - 1].
mode –
How to compute the attention distribution. Must be one of "recursive", "parallel", or "hard":
"recursive" recursively computes the distribution. This is slowest but is exact, general, and does not suffer from numerical instabilities.
"parallel" uses parallelized cumulative-sum and cumulative-product operations to compute a closed-form solution to the recurrence relation defining the attention distribution. This makes it more efficient than "recursive", but it requires numerical checks which make the distribution non-exact. This can be a problem in particular when the input sequence is long and/or p_choose_i has entries very close to 0 or 1.
"hard" requires that the probabilities in p_choose_i are all either 0 or 1, and subsequently uses a more efficient and exact solution.
- Returns
A tensor of shape (batch_size, input_sequence_length) representing the attention distributions for each sequence in the batch.
- Raises
ValueError – If mode is not one of "recursive", "parallel", or "hard".
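A short, self-contained sketch of calling monotonic_attention with the documented shapes; the one-hot previous_attention for the first output step follows the parameter description above.
import torch
from texar.torch.core import monotonic_attention

batch_size, input_len = 4, 6

# Choosing probabilities in [0, 1], shape [batch_size, input_sequence_length].
p_choose_i = torch.sigmoid(torch.randn(batch_size, input_len))

# For the first output step, previous attention is [1, 0, ..., 0] for every batch entry.
previous_attention = torch.zeros(batch_size, input_len)
previous_attention[:, 0] = 1.0

attention = monotonic_attention(p_choose_i, previous_attention, mode="parallel")
print(attention.shape)  # torch.Size([4, 6])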
hardmax¶
sparsemax¶
Cells¶
default_rnn_cell_hparams¶
- texar.torch.core.default_rnn_cell_hparams()[source]¶
Returns a dict of RNN cell hyperparameters and their default values.
{ "type": "LSTMCell", "input_size": 256, "kwargs": { "hidden_size": 256 }, "num_layers": 1, "dropout": { "input_keep_prob": 1.0, "output_keep_prob": 1.0, "state_keep_prob": 1.0, "variational_recurrent": False, }, "residual": False, "highway": False, }
Here:
- “type”: str or cell class or cell instance
The RNN cell type. This can be
The string name or full module path of a cell class. If a class name is provided, the class must be in module torch.nn.modules.rnn, texar.torch.core.cell_wrappers, or texar.torch.custom.
A cell class.
An instance of a cell class. This is not valid if "num_layers" > 1.
For example
"type": "LSTMCell" # class name "type": "torch.nn.GRUCell" # module path "type": "my_module.MyCell" # module path "type": torch.nn.GRUCell # class "type": LSTMCell(hidden_size=100) # cell instance "type": MyCell(...) # cell instance
- “kwargs”: dict
Keyword arguments for the constructor of the cell class. A cell is created by cell_class(**kwargs), where cell_class is specified in "type" above. Ignored if "type" is a cell instance.
Note
It is unnecessary to specify “input_size” within “kwargs”. This value will be automatically filled based on layer index.
Note
Although PyTorch uses “hidden_size” to denote the hidden layer size, we follow TensorFlow conventions and use “num_units”.
- “num_layers”: int
Number of cell layers. Each layer is a cell created as above, with the same hyperparameters specified in “kwargs”.
- “dropout”: dict
Dropout applied to the cell in each layer. See DropoutWrapper for details of the hyperparameters. If all "*_keep_prob" = 1, no dropout is applied.
Specifically, if "variational_recurrent" = True, the same dropout mask is applied across all time steps per batch.
- “residual”: bool
If True, apply residual connection on the inputs and outputs of cell in each layer except the first layer. Ignored if “num_layers” = 1.
- “highway”: bool
If True, apply highway connection on the inputs and outputs of cell in each layer except the first layer. Ignored if “num_layers” = 1.
get_rnn_cell¶
- texar.torch.core.get_rnn_cell(input_size, hparams=None)[source]¶
Creates an RNN cell.
See default_rnn_cell_hparams() for all hyperparameters and default values.
- Parameters
input_size – The input size of the first-layer cell.
hparams (dict or HParams, optional) – Cell hyperparameters. Missing hyperparameters are set to default values.
- Returns
A cell instance.
- Raises
ValueError – If hparams["num_layers"] > 1 and hparams["type"] is a class instance.
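A small usage sketch of get_rnn_cell with the hyperparameter structure above, overriding only "type", "num_layers", and part of "dropout" and keeping the remaining defaults. The output, new_state = cell(inputs, state) call pattern is assumed from the wrapped-cell interface described below.
import torch
from texar.torch.core import get_rnn_cell

hparams = {
    "type": "GRUCell",                    # class name resolved as described above
    "num_layers": 2,
    "dropout": {"output_keep_prob": 0.9},
}
cell = get_rnn_cell(input_size=100, hparams=hparams)

batch_size = 32
inputs = torch.randn(batch_size, 100)
state = cell.zero_state(batch_size)       # see RNNCellBase.zero_state() below
output, state = cell(inputs, state)       # assumed (output, new_state) interface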
wrap_builtin_cell¶
- texar.torch.core.wrap_builtin_cell(cell)[source]¶
Convert a built-in torch.nn.RNNCellBase derived RNN cell to our wrapped version.
- Parameters
cell – the RNN cell to wrap around.
- Returns
The wrapped cell derived from texar.torch.core.cell_wrappers.RNNCellBase.
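For example, a built-in PyTorch cell can be wrapped so that it exposes the zero_state() interface used by the wrappers in this module (a minimal sketch; the step-call pattern is assumed from the wrapped-cell interface):
import torch
from texar.torch.core import wrap_builtin_cell

raw_cell = torch.nn.GRUCell(input_size=32, hidden_size=64)
cell = wrap_builtin_cell(raw_cell)   # wrapped texar.torch.core.cell_wrappers.GRUCell

state = cell.zero_state(8)           # batch size of 8
output, state = cell(torch.randn(8, 32), state)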
RNNCellBase¶
- class texar.torch.core.cell_wrappers.RNNCellBase(cell)[source]¶
The base class for RNN cells in our framework. Major differences over torch.nn.RNNCell are two-fold:
Holds a torch.nn.Module which could either be a built-in RNN cell or a wrapped cell instance. This design allows RNNCellBase to serve as the base class for both vanilla cells and wrapped cells.
Adds a zero_state() method for initialization of hidden states, which can also be used to implement batch-specific initialization routines.
- property input_size¶
The number of expected features in the input.
- property hidden_size¶
The number of features in the hidden state.
- init_batch()[source]¶
Perform batch-specific initialization routines. For most cells this is a no-op.
RNNCell¶
- class texar.torch.core.cell_wrappers.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')[source]¶
A wrapper over torch.nn.RNNCell.
GRUCell¶
- class texar.torch.core.cell_wrappers.GRUCell(input_size, hidden_size, bias=True)[source]¶
A wrapper over torch.nn.GRUCell.
LSTMCell¶
- class texar.torch.core.cell_wrappers.LSTMCell(input_size, hidden_size, bias=True, forget_bias=None)[source]¶
A wrapper over torch.nn.LSTMCell, additionally providing the option to initialize the forget-gate bias to a constant value.
DropoutWrapper¶
- class texar.torch.core.cell_wrappers.DropoutWrapper(cell, input_keep_prob=1.0, output_keep_prob=1.0, state_keep_prob=1.0, variational_recurrent=False)[source]¶
Operator adding dropout to inputs and outputs of the given cell.
ResidualWrapper¶
HighwayWrapper¶
- class texar.torch.core.cell_wrappers.HighwayWrapper(cell, carry_bias_init=None, couple_carry_transform_gates=True)[source]¶
RNNCell wrapper that adds highway connection on cell input and output.
Based on: R. K. Srivastava, K. Greff, and J. Schmidhuber, “Highway networks”, arXiv preprint arXiv:1505.00387, 2015. https://arxiv.org/pdf/1505.00387.pdf
MultiRNNCell¶
- class texar.torch.core.cell_wrappers.MultiRNNCell(cells)[source]¶
RNN cell composed sequentially of multiple simple cells.
sizes = [128, 128, 64]
cells = [LSTMCell(input_size, hidden_size)
         for input_size, hidden_size in zip(sizes[:-1], sizes[1:])]
stacked_rnn_cell = MultiRNNCell(cells)
- property input_size¶
The number of expected features in the input.
- property hidden_size¶
The number of features in the hidden state.
- init_batch()[source]¶
Perform batch-specific initialization routines. For most cells this is a no-op.
AttentionWrapper¶
- class texar.torch.core.cell_wrappers.AttentionWrapper(cell, attention_mechanism, attention_layer_size=None, alignment_history=False, cell_input_fn=None, output_attention=True)[source]¶
Wraps another RNNCell with attention.
- property output_size¶
The number of features in the output tensor.
- zero_state(batch_size)[source]¶
Return an initial (zero) state tuple for this AttentionWrapper.
Note
Please see the initializer documentation for details of how to call zero_state() if using an AttentionWrapper with a BeamSearchDecoder.
- Parameters
batch_size – 0D integer: the batch size.
- Returns
An AttentionWrapperState tuple containing zeroed-out tensors and Python lists.
- forward(inputs, state, memory, memory_sequence_length=None)[source]¶
Perform a step of attention-wrapped RNN.
Step 1: Mix the inputs and the previous step's attention output via cell_input_fn.
Step 2: Call the wrapped cell with this input and its previous state.
Step 3: Score the cell’s output with attention_mechanism.
Step 4: Calculate the alignments by passing the score through the normalizer.
Step 5: Calculate the context vector as the inner product between the alignments and the attention_mechanism’s values (memory).
Step 6: Calculate the attention output by concatenating the cell output and context through the attention layer (a linear layer with attention_layer_size outputs).
- Parameters
inputs – (Possibly nested tuple of) Tensor, the input at this time step.
state – An instance of AttentionWrapperState containing tensors from the previous time step.
memory – The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
memory_sequence_length – (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
- Returns
A tuple (attention_or_cell_output, next_state), where
attention_or_cell_output is the attention output if output_attention is True, otherwise the cell output.
next_state is an instance of AttentionWrapperState containing the state calculated at this time step.
- Raises
TypeError – If state is not an instance of AttentionWrapperState.
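Putting the pieces together, a rough sketch of the step interface documented above. The sizes are illustrative assumptions: with the default cell_input_fn the wrapped cell receives the concatenation of the step inputs and the previous attention (here of size encoder_output_size, since attention_layer_size is None), and with LuongAttention the cell output size is assumed to match num_units.
import torch
from texar.torch.core import LuongAttention
from texar.torch.core.cell_wrappers import LSTMCell, AttentionWrapper

batch_size, max_time = 8, 10
input_dim, hidden_size, enc_dim = 32, 128, 64

# Wrapped decoder cell; its input is [step inputs (32) ; attention context (64)].
cell = LSTMCell(input_size=input_dim + enc_dim, hidden_size=hidden_size)
attn_mech = LuongAttention(num_units=hidden_size, encoder_output_size=enc_dim)
wrapper = AttentionWrapper(cell, attn_mech)

memory = torch.randn(batch_size, max_time, enc_dim)                # encoder outputs
memory_len = torch.full((batch_size,), max_time, dtype=torch.long)

state = wrapper.zero_state(batch_size)
for _ in range(5):                                                 # a few decoding steps
    inputs = torch.randn(batch_size, input_dim)
    output, state = wrapper(inputs, state, memory,
                            memory_sequence_length=memory_len)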
Layers¶
get_layer¶
- texar.torch.core.get_layer(hparams)[source]¶
Makes a layer instance.
The layer must be an instance of torch.nn.Module.
- Parameters
hparams (dict or HParams) – Hyperparameters of the layer, with structure:
{ "type": "LayerClass", "kwargs": { # Keyword arguments of the layer class # ... } }
Here:
- "type": str or layer class or layer instance
The layer type. This can be
The string name or full module path of a layer class. If the class name is provided, the class must be in module torch.nn, texar.torch.core, or texar.torch.custom.
A layer class.
An instance of a layer class.
For example
"type": "Conv1D" # class name "type": "texar.torch.core.MaxReducePooling1D" # module path "type": "my_module.MyLayer" # module path "type": torch.nn.Module.Linear # class "type": Conv1D(filters=10, kernel_size=2) # cell instance "type": MyLayer(...) # cell instance
- "kwargs": dict
A dictionary of keyword arguments for the constructor of the layer class. Ignored if "type" is a layer instance.
Arguments named "activation" can be a callable, or a str of the name or module path to the activation function.
Arguments named "*_regularizer" and "*_initializer" can be a class instance, or a dict of hyperparameters of the respective regularizers and initializers. See get_regularizer() and get_initializer() for details.
Arguments named "*_constraint" can be a callable, or a str of the name or full path to the constraint function.
- Returns
A layer instance. If hparams["type"] is a layer instance, returns it directly.
- Raises
ValueError – If hparams is None.
ValueError – If the resulting layer is not an instance of torch.nn.Module.
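For instance, a layer can be created from a hyperparameter dict; the example below uses a full module path to a standard torch.nn class (the particular layer and arguments are illustrative):
import torch
from texar.torch.core import get_layer

layer = get_layer({
    "type": "torch.nn.Linear",                          # full module path of the layer class
    "kwargs": {"in_features": 32, "out_features": 64},
})
out = layer(torch.randn(16, 32))                        # shape [16, 64]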
MaxReducePool1d¶
- class texar.torch.core.MaxReducePool1d[source]¶
A subclass of torch.nn.Module. Max Pool layer for 1D inputs. The same as torch.nn.MaxPool1d except that the pooling dimension is entirely reduced (i.e., pool_size=input_length).
- forward(input)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
AvgReducePool1d¶
- class texar.torch.core.AvgReducePool1d[source]¶
A subclass of torch.nn.Module. Avg Pool layer for 1D inputs. The same as torch.nn.AvgPool1d except that the pooling dimension is entirely reduced (i.e., pool_size=input_length).
- forward(input)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
get_pooling_layer_hparams¶
- texar.torch.core.get_pooling_layer_hparams(hparams)[source]¶
Creates a pooling layer hyperparameters dict for get_layer().
If the hparams sets "pool_size" to None, the layer will be changed to the respective reduce-pooling layer. For example, torch.nn.MaxPool1d is replaced with MaxReducePool1d.
MergeLayer¶
- class texar.torch.core.MergeLayer(layers=None, mode='concat', dim=None)[source]¶
A subclass of torch.nn.Module. A layer that consists of multiple layers in parallel. Input is fed to each of the parallel layers, and the outputs are merged with a specified mode.
- Parameters
layers (list, optional) –
A list of torch.nn.Module instances, or a list of hyperparameter dictionaries each of which specifies "type" and "kwargs" of each layer (see the hparams argument of get_layer()).
If None, this layer degenerates to a merging operator that merges inputs directly.
mode (str) –
Mode of the merge op. This can be:
'concat': Concatenates layer outputs along one dim. Tensors must have the same shape except for the dimension specified in dim, which can have different sizes.
'elemwise_sum': Outputs element-wise sum.
'elemwise_mul': Outputs element-wise product.
'sum': Computes the sum of layer outputs along the dimension given by dim. For example, given dim=1, two tensors of shape [a, b] and [a, c] respectively will result in a merged tensor of shape [a].
'mean': Computes the mean of layer outputs along the dimension given in dim.
'prod': Computes the product of layer outputs along the dimension given in dim.
'max': Computes the maximum of layer outputs along the dimension given in dim.
'min': Computes the minimum of layer outputs along the dimension given in dim.
'and': Computes the logical and of layer outputs along the dimension given in dim.
'or': Computes the logical or of layer outputs along the dimension given in dim.
'logsumexp': Computes log(sum(exp(elements across the dimension of layer outputs))).
dim (int) – The dim to use in merging. Ignored in modes 'elemwise_sum' and 'elemwise_mul'.
- forward(input)[source]¶
Feed input to every containing layer and merge the outputs.
- Parameters
input – The input tensor.
- Returns
The merged tensor.
- property layers¶
The list of parallel layers.
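A small sketch of MergeLayer feeding the same input through two parallel layers and concatenating their outputs along dim 1 (the layer choices and sizes are illustrative):
import torch
from texar.torch.core import MergeLayer

merge = MergeLayer(
    layers=[
        {"type": "torch.nn.Linear", "kwargs": {"in_features": 16, "out_features": 8}},
        {"type": "torch.nn.Linear", "kwargs": {"in_features": 16, "out_features": 4}},
    ],
    mode="concat",
    dim=1,
)
out = merge(torch.randn(32, 16))   # shape [32, 12]: outputs of size 8 and 4 concatenated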
Flatten¶
Identity¶
default_regularizer_hparams¶
get_regularizer¶
- texar.torch.core.get_regularizer(hparams=None)[source]¶
Returns a variable regularizer instance.
See default_regularizer_hparams() for all hyperparameters and default values.
The "type" field can be a subclass of Regularizer, its string name or module path, or a class instance.
- Parameters
hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters are set to default values.
- Returns
A Regularizer instance. None if hparams is None or takes the default hyperparameter value.
- Raises
ValueError – The resulting regularizer is not an instance of Regularizer.
get_initializer¶
- texar.torch.core.get_initializer(hparams=None)[source]¶
Returns an initializer instance.
- Parameters
hparams (dict or HParams, optional) –
Hyperparameters with the structure
{ "type": "initializer_class_or_function", "kwargs": { # ... } }
The "type" field can be a function name or module path. If a name is provided, it must be from one of the following modules: torch.nn.init and texar.torch.custom.
Besides, the "type" field can also be an initialization function called with initialization_fn(**kwargs). In this case "type" can be the function, or its name or module path. If no keyword argument is required, "kwargs" can be omitted.
- Returns
An initializer instance. None if hparams is None.
get_activation_fn¶
- texar.torch.core.get_activation_fn(fn_name=None, kwargs=None)[source]¶
Returns an activation function fn with the signature output = fn(input).
If the function specified by fn_name has more than one argument without default values, then all these arguments except the input feature argument must be specified in kwargs. Arguments with default values can also be specified in kwargs to take values other than the defaults. In this case a partial function is returned with the above signature.
- Parameters
fn_name (str or callable) –
An activation function, or its name or module path. The function can be:
A built-in function defined in torch.nn.functional.
A user-defined activation function in module texar.torch.custom.
An external activation function. Must provide the full module path, e.g., "my_module.my_activation_fn".
kwargs (optional) – A dict or instance of HParams containing the keyword arguments of the activation function.
- Returns
An activation function. None if fn_name is None.
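For example (negative_slope is the standard torch.nn.functional.leaky_relu keyword argument, shown here to illustrate passing kwargs):
import torch
from texar.torch.core import get_activation_fn

fn = get_activation_fn("leaky_relu", kwargs={"negative_slope": 0.2})
y = fn(torch.randn(3, 5))   # output = fn(input)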
Optimization¶
default_optimization_hparams¶
- texar.torch.core.default_optimization_hparams()[source]¶
Returns a dict of default hyperparameters of the training op and their default values.
{ "optimizer": { "type": "Adam", "kwargs": { "lr": 0.001 } }, "learning_rate_decay": { "type": "", "kwargs": {} }, "gradient_clip": { "type": "", "kwargs": {} }, "gradient_noise_scale": None, "name": None }
Here:
- “optimizer”: dict
Hyperparameters of a torch.optim.Optimizer.
“type” specifies the optimizer class. This can be
The string name or full module path of an optimizer class. If the class name is provided, the class must be in module torch.optim, texar.torch.custom, or texar.torch.core.optimization.
An optimizer class.
An instance of an optimizer class.
For example
"type": "Adam" # class name "type": "my_module.MyOptimizer" # module path "type": texar.torch.custom.BertAdam # class "type": my_module.MyOptimizer # class
“kwargs” is a dict specifying keyword arguments for creating the optimizer class instance, with
opt_class(**kwargs)
. Ignored if “type” is a class instance.
- “learning_rate_decay”: dict
Hyperparameters of the learning rate decay function. The learning rate starts decaying from "start_decay_step" and remains unchanged after "end_decay_step" or when reaching "min_learning_rate".
The decay function is specified in "type" and "kwargs".
"type" can be a decay function or its name or module path. If a function name is provided, it must be from module torch.optim, texar.torch.custom, or texar.torch.core.optimization.
"kwargs" is a dict of keyword arguments for the function, excluding arguments named "global_step" and "learning_rate".
The function is called with lr = decay_fn(learning_rate=lr, global_step=offset_step, **kwargs), where offset_step is the global step offset as above.
- "gradient_clip": dict
Hyperparameters of gradient clipping. The gradient clipping function takes a list of (gradients, variables) tuples and returns a list of (clipped_gradients, variables) tuples. Typical examples include torch.nn.utils.clip_grad_norm_ and torch.nn.utils.clip_grad_value_.
"type" specifies the gradient clip function, and can be a function, or its name or module path. If a function name is provided, the function must be from module torch.nn.utils, texar.torch.custom, or texar.torch.core.optimization.
"kwargs" specifies keyword arguments to the function, except arguments named "parameters".
- “gradient_noise_scale”: float, optional
Adds 0-mean normal noise scaled by this value to gradient.
get_train_op¶
- texar.torch.core.get_train_op(params=None, optimizer=None, scheduler=None, hparams=None)[source]¶
Creates a training op.
- Parameters
params – an iterable of torch.Tensor or dict. Specifies what tensors should be optimized.
optimizer – A torch.optim.Optimizer instance.
scheduler – A torch.optim.lr_scheduler._LRScheduler instance.
hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters are set to default values automatically. See default_optimization_hparams() for all hyperparameters and default values.
- Returns
The callable used for variable optimization.
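A sketch of the typical pattern: build the train op from hyperparameters, then call it after back-propagating the loss each iteration. The hyperparameter values are illustrative, and the assumption that calling train_op() applies gradient clipping, the optimizer (and scheduler) updates, and gradient zeroing reflects common usage of this helper rather than a guarantee from the text above.
import torch
from texar.torch.core import get_train_op

model = torch.nn.Linear(10, 1)
opt_hparams = {
    "optimizer": {"type": "Adam", "kwargs": {"lr": 1e-3}},
    "gradient_clip": {"type": "clip_grad_norm_", "kwargs": {"max_norm": 5.0}},
}
train_op = get_train_op(params=model.parameters(), hparams=opt_hparams)

for _ in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    train_op()   # assumed to clip gradients, step the optimizer, and zero gradients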
get_scheduler¶
- texar.torch.core.get_scheduler(optimizer, hparams=None)[source]¶
Creates a scheduler instance.
- Parameters
optimizer – A torch.optim.Optimizer instance.
hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters are set to default values automatically. See default_optimization_hparams() for all hyperparameters and default values.
- Returns
A torch.optim.lr_scheduler._LRScheduler instance.
get_optimizer¶
- texar.torch.core.get_optimizer(params, hparams=None)[source]¶
Creates an optimizer instance.
- Parameters
params – an iterable of torch.Tensor or dict. Specifies what tensors should be optimized.
hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters are set to default values automatically. See default_optimization_hparams() for all hyperparameters and default values.
- Returns
The torch.optim.Optimizer instance specified in hparams.
get_grad_clip_fn¶
- texar.torch.core.get_grad_clip_fn(hparams=None)[source]¶
Creates a gradient clipping function.
- Parameters
hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters are set to default values automatically. See default_optimization_hparams() for all hyperparameters and default values.
- Returns
A gradient clipping function.