Core

Attention Mechanism

AttentionWrapperState

class texar.torch.core.AttentionWrapperState[source]

A namedtuple storing the state of an AttentionWrapper.

cell_state

The state of the wrapped RNNCell at the previous time step.

attention

The attention emitted at the previous time step.

time

The current time step.

alignments

A single or tuple of tensor(s) containing the alignments emitted at the previous time step for each attention mechanism.

alignment_history

(If enabled) A single or tuple of list(s) containing alignment matrices from all time steps for each attention mechanism. Call torch.stack on each list to convert to a torch.Tensor.

attention_state

A single or tuple of nested objects containing attention mechanism states for each attention mechanism.

LuongAttention

class texar.torch.core.LuongAttention(num_units: int, encoder_output_size: int, scale: bool = False, probability_fn: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, score_mask_value: Optional[torch.Tensor] = None)[source]

Implements Luong-style (multiplicative) attention scoring. This attention has two forms.

The first is standard Luong attention, as described in: Minh-Thang Luong, Hieu Pham, Christopher D. Manning. “Effective Approaches to Attention-based Neural Machine Translation.” EMNLP 2015.

The second is the scaled form inspired partly by the normalized form of Bahdanau attention. To enable the second form, construct the object with parameter scale=True.

Parameters:
  • num_units – The depth of the attention mechanism.
  • encoder_output_size – The output size of the encoder cell.
  • scale – Python boolean. Whether to scale the energy term.
  • probability_fn (optional) – A callable that converts the score to probabilities. The default is softmax (torch.nn.functional.softmax). Other options include hardmax() and sparsemax(). Its signature should be: probabilities = probability_fn(score).
  • score_mask_value (optional) – The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
forward(query: torch.Tensor, state: torch.Tensor, memory: torch.Tensor, memory_sequence_length: Optional[torch.LongTensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Score the query based on the keys and values.

Parameters:
  • query – tensor, shaped [batch_size, query_depth].
  • state – tensor, shaped [batch_size, alignments_size] (alignments_size is memory’s max_time).
  • memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
  • memory_sequence_length (optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
Returns:

Tensor of dtype matching memory and shape [batch_size, alignments_size] (alignments_size is memory’s max_time).
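The scoring and masking described above can be sketched in plain Python for a single batch entry. This is an illustration under assumptions, not the library's implementation: luong_alignments and scale_g are hypothetical names, the keys are assumed to be already projected to the query depth, and softmax stands in for probability_fn.

```python
import math


def luong_alignments(query, keys, memory_sequence_length=None, scale_g=None):
    """Dot-product alignments for a single batch entry (illustrative only).

    query: list of floats with length `depth`.
    keys: `max_time` vectors of length `depth` (memory, assumed already
        projected to the query depth).
    scale_g: hypothetical scalar standing in for the `scale=True` variant.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scale_g is not None:
        scores = [scale_g * s for s in scores]
    if memory_sequence_length is not None:
        # Positions past the sequence length receive the mask value (-inf).
        scores = [s if t < memory_sequence_length else -math.inf
                  for t, s in enumerate(scores)]
    # Softmax over time; masked positions contribute exp(-inf) == 0.0.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


alignments = luong_alignments(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 5.0], [9.0, 9.0]],
    memory_sequence_length=2,
)
```

The third memory position lies past the sequence length, so it receives zero probability even though its raw score is the largest.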

BahdanauAttention

class texar.torch.core.BahdanauAttention(num_units: int, decoder_output_size: int, encoder_output_size: int, normalize: bool = False, probability_fn: Optional[Callable[[torch.Tensor], torch.Tensor]] = None, score_mask_value: Optional[torch.Tensor] = None)[source]

Implements Bahdanau-style (additive) attention. This attention has two forms.

The first is Bahdanau attention, as described in: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate.” ICLR 2015.

The second is the normalized form. This form is inspired by the weight normalization article: Tim Salimans, Diederik P. Kingma. “Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks.” To enable the second form, construct the object with parameter normalize=True.

Parameters:
  • num_units – The depth of the query mechanism.
  • decoder_output_size – The output size of the decoder cell.
  • encoder_output_size – The output size of the encoder cell.
  • normalize – bool. Whether to normalize the energy term.
  • probability_fn (optional) – A callable that converts the score to probabilities. The default is softmax (torch.nn.functional.softmax). Other options include hardmax() and sparsemax(). Its signature should be: probabilities = probability_fn(score).
  • score_mask_value (optional) – The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
forward(query: torch.Tensor, state: torch.Tensor, memory: torch.Tensor, memory_sequence_length: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Score the query based on the keys and values.

Parameters:
  • query – tensor, shaped [batch_size, query_depth].
  • state – tensor, shaped [batch_size, alignments_size] (alignments_size is memory’s max_time).
  • memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
  • memory_sequence_length (optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
Returns:

Tensor of dtype matching memory and shape [batch_size, alignments_size] (alignments_size is memory’s max_time).
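As a rough illustration of the additive score, the sketch below computes v · tanh(key + processed_query) per time step for one batch entry. It assumes the query and memory have already been projected to num_units. The names bahdanau_scores, g, and b are hypothetical: g and b stand for the scalar gain and bias that the normalized form is assumed to add, following the weight-normalization idea cited above.

```python
import math


def bahdanau_scores(processed_query, keys, v, g=None, b=None):
    """Additive attention scores for a single batch entry (illustrative only).

    processed_query: query projected to `num_units` (list of floats).
    keys: `max_time` memory vectors projected to `num_units`.
    v: score vector of length `num_units`.
    g, b: hypothetical gain and bias used by the normalized form.
    """
    if g is not None:
        # Normalized form: rescale v so that its norm equals g.
        norm = math.sqrt(sum(x * x for x in v))
        v = [g * x / norm for x in v]
    if b is None:
        b = [0.0] * len(v)
    return [
        sum(vi * math.tanh(ki + qi + bi)
            for vi, ki, qi, bi in zip(v, key, processed_query, b))
        for key in keys
    ]


plain = bahdanau_scores([0.0, 0.0], [[0.0, 0.0], [1.0, 1.0]], v=[1.0, 1.0])
normalized = bahdanau_scores([0.0, 0.0], [[1.0, 1.0]], v=[3.0, 4.0], g=2.0)
```

The resulting scores would then be masked and passed through probability_fn to obtain the alignments.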

BahdanauMonotonicAttention

class texar.torch.core.BahdanauMonotonicAttention(num_units: int, decoder_output_size: int, encoder_output_size: int, normalize: bool = False, score_mask_value: Optional[torch.Tensor] = None, sigmoid_noise: float = 0.0, score_bias_init: float = 0.0, mode: str = 'parallel')[source]

Monotonic attention mechanism with Bahdanau-style energy function. This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory, it can’t attend to any prior points at subsequent output time steps. It achieves this by using _monotonic_probability_fn() instead of softmax to construct its attention distributions. Since the attention scores are passed through a sigmoid, a learnable scalar bias parameter is applied after the score function and before the sigmoid. Otherwise, it is equivalent to BahdanauAttention. This approach is proposed in: Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, “Online and Linear-Time Attention by Enforcing Monotonic Alignments.” ICML 2017.

Parameters:
  • num_units – The depth of the query mechanism.
  • decoder_output_size – The output size of the decoder cell.
  • encoder_output_size – The output size of the encoder cell.
  • normalize – Python boolean. Whether to normalize the energy term.
  • score_mask_value (optional) – The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
  • sigmoid_noise – Standard deviation of pre-sigmoid noise. Refer to _monotonic_probability_fn() for more information.
  • score_bias_init – Initial value for score bias scalar. It’s recommended to initialize this to a negative value when the length of the memory is large.
  • mode – How to compute the attention distribution. Must be one of "recursive", "parallel", or "hard". Refer to monotonic_attention() for more information.
forward(query: torch.Tensor, state: torch.Tensor, memory: torch.Tensor, memory_sequence_length: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Score the query based on the keys and values.

Parameters:
  • query – tensor, shaped [batch_size, query_depth].
  • state – tensor, shaped [batch_size, alignments_size] (alignments_size is memory’s max_time).
  • memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
  • memory_sequence_length (optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
Returns:

Tensor of dtype matching memory and shape [batch_size, alignments_size] (alignments_size is memory’s max_time).

LuongMonotonicAttention

class texar.torch.core.LuongMonotonicAttention(num_units: int, encoder_output_size: int, scale: bool = False, score_mask_value: Optional[torch.Tensor] = None, sigmoid_noise: float = 0.0, score_bias_init: float = 0.0, mode: str = 'parallel')[source]

Monotonic attention mechanism with Luong-style energy function. This type of attention enforces a monotonic constraint on the attention distributions; that is, once the model attends to a given point in the memory, it can’t attend to any prior points at subsequent output time steps. It achieves this by using _monotonic_probability_fn() instead of softmax to construct its attention distributions. Otherwise, it is equivalent to LuongAttention. This approach is proposed in: Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck, “Online and Linear-Time Attention by Enforcing Monotonic Alignments.” ICML 2017.

Parameters:
  • num_units – The depth of the query mechanism.
  • encoder_output_size – The output size of the encoder cell.
  • scale – Python boolean. Whether to scale the energy term.
  • score_mask_value (optional) – The mask value for score before passing into probability_fn. The default is -inf. Only used if memory_sequence_length is not None.
  • sigmoid_noise – Standard deviation of pre-sigmoid noise. Refer to _monotonic_probability_fn() for more information.
  • score_bias_init – Initial value for score bias scalar. It’s recommended to initialize this to a negative value when the length of the memory is large.
  • mode – How to compute the attention distribution. Must be one of "recursive", "parallel", or "hard". Refer to monotonic_attention() for more information.
forward(query: torch.Tensor, state: torch.Tensor, memory: torch.Tensor, memory_sequence_length: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor][source]

Score the query based on the keys and values.

Parameters:
  • query – tensor, shaped [batch_size, query_depth].
  • state – tensor, shaped [batch_size, alignments_size] (alignments_size is memory’s max_time).
  • memory – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
  • memory_sequence_length (optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
Returns:

Tensor of dtype matching memory and shape [batch_size, alignments_size] (alignments_size is memory’s max_time).

compute_attention

texar.torch.core.compute_attention(attention_mechanism: texar.torch.core.attention_mechanism.AttentionMechanism, cell_output: torch.Tensor, attention_state: torch.Tensor, memory: torch.Tensor, attention_layer: Optional[torch.nn.modules.module.Module], memory_sequence_length: Optional[torch.LongTensor] = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Computes the attention and alignments for a given attention_mechanism.

Parameters:
  • attention_mechanism – The AttentionMechanism instance used to compute attention.
  • cell_output (tensor) – The decoder output (query tensor), shaped [batch_size, query_depth].
  • attention_state (tensor) – tensor, shaped [batch_size, alignments_size] (alignments_size is memory’s max_time).
  • memory (tensor) – the memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, ...].
  • attention_layer (torch.nn.Module, optional) – If specified, the attention context is concatenated with cell_output, and fed through this layer.
  • memory_sequence_length (tensor, optional) – sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
Returns:

A tuple of (attention, alignments, next_attention_state), where

  • attention: The attention context (or the output of attention_layer, if specified).
  • alignments: The computed attention alignments.
  • next_attention_state: The attention state after the current time step.

monotonic_attention

texar.torch.core.monotonic_attention(p_choose_i: torch.Tensor, previous_attention: torch.Tensor, mode: str) → torch.Tensor[source]

Compute monotonic attention distribution from choosing probabilities. Monotonic attention implies that the input sequence is processed in an explicitly left-to-right manner when generating the output sequence. In addition, once an input sequence element is attended to at a given output time step, elements occurring before it cannot be attended to at subsequent output time steps. This function generates attention distributions according to these assumptions. For more information, see Online and Linear-Time Attention by Enforcing Monotonic Alignments.

Parameters:
  • p_choose_i – Probability of choosing input sequence/memory element i. Should be of shape (batch_size, input_sequence_length), and should all be in the range [0, 1].
  • previous_attention – The attention distribution from the previous output time step. Should be of shape (batch_size, input_sequence_length). For the first output time step, previous_attention[n] should be [1, 0, 0, …, 0] for all n in [0, … batch_size - 1].
  • mode

    How to compute the attention distribution. Must be one of "recursive", "parallel", or "hard":

    • "recursive" recursively computes the distribution. This is slowest but is exact, general, and does not suffer from numerical instabilities.
    • "parallel" uses parallelized cumulative-sum and cumulative-product operations to compute a closed-form solution to the recurrence relation defining the attention distribution. This makes it more efficient than "recursive", but it requires numerical checks which make the distribution non-exact. This can be a problem in particular when the input sequence is long and/or p_choose_i has entries very close to 0 or 1.
    • "hard" requires that the probabilities in p_choose_i are all either 0 or 1, and subsequently uses a more efficient and exact solution.
Returns:

A tensor of shape (batch_size, input_sequence_length) representing the attention distributions for each sequence in the batch.

Raises:

ValueError – mode is not one of "recursive", "parallel", "hard".
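To make the "recursive" and "hard" modes concrete, here is a minimal pure-Python sketch for a single sequence (no batch dimension). It follows the recurrence from the paper referenced above as we understand it; monotonic_attention_1d is a hypothetical name, the "parallel" mode is omitted, and the library's tensor implementation may differ in detail.

```python
def monotonic_attention_1d(p_choose_i, previous_attention, mode):
    """Monotonic attention distribution for one sequence (illustrative only)."""
    if mode == "recursive":
        attention = []
        q = 0.0
        for i, (p, prev) in enumerate(zip(p_choose_i, previous_attention)):
            # Probability mass that skipped element i - 1 (1.0 for i == 0).
            keep = 1.0 if i == 0 else 1.0 - p_choose_i[i - 1]
            q = keep * q + prev
            attention.append(p * q)
        return attention
    if mode == "hard":
        attention = []
        reached = 0.0     # running sum of previous_attention
        not_chosen = 1.0  # exclusive running product of (1 - p)
        for p, prev in zip(p_choose_i, previous_attention):
            reached += prev
            # Elements before the previously attended one cannot be chosen.
            p = p * reached
            attention.append(p * not_chosen)
            not_chosen *= 1.0 - p
        return attention
    raise ValueError('mode must be "recursive" or "hard" in this sketch')


soft = monotonic_attention_1d([0.5, 0.5, 0.5], [1.0, 0.0, 0.0], "recursive")
hard = monotonic_attention_1d([1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 0.0], "hard")
```

In the hard example the first element has a choosing probability of 1, but it lies before the previously attended position, so attention moves forward to the last element instead.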

hardmax

texar.torch.core.hardmax(logits: torch.Tensor) → torch.Tensor[source]

Returns batched one-hot vectors. The depth index containing the 1 is that of the maximum logit value.

Parameters: logits – A batch tensor of logit values.
Returns: A batched one-hot tensor.
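The behavior can be illustrated with nested lists instead of tensors. The helper name hardmax_rows is hypothetical, and breaking ties in favor of the first maximum is an assumption of this sketch, not a documented guarantee.

```python
def hardmax_rows(logits):
    """One-hot encode the argmax of each row (illustrative only)."""
    one_hots = []
    for row in logits:
        # max() returns the first index among equal maxima.
        best = max(range(len(row)), key=row.__getitem__)
        one_hots.append([1.0 if i == best else 0.0 for i in range(len(row))])
    return one_hots


one_hot = hardmax_rows([[0.1, 2.0, -1.0], [3.0, 3.0, 0.0]])
```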

sparsemax

texar.torch.core.sparsemax(input: torch.Tensor, dim: int = -1) → torch.Tensor[source]

sparsemax: normalizing sparse transform (a la softmax).

Parameters:
  • input (Tensor) – A batch tensor of logit values.
  • dim – Dimension along which to apply sparsemax.
Returns:

output with the same shape as input.

Return type:

Tensor
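For intuition, sparsemax can be computed for a single vector with the usual sort-and-threshold procedure. The sketch below is a plain-Python illustration with a hypothetical name (sparsemax_1d); it is not the library's tensor implementation.

```python
def sparsemax_1d(logits):
    """Project logits onto the probability simplex (illustrative only)."""
    z_sorted = sorted(logits, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, z in enumerate(z_sorted, start=1):
        cumulative += z
        # The k largest entries stay in the support while this holds.
        if 1.0 + k * z > cumulative:
            tau = (cumulative - 1.0) / k
    # Entries at or below the threshold tau become exactly zero.
    return [max(z - tau, 0.0) for z in logits]


peaked = sparsemax_1d([1.0, 2.0, 3.0])
mixed = sparsemax_1d([0.1, 1.1, 0.2])
```

Unlike softmax, the output can contain exact zeros, which is what makes the resulting distribution sparse.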

Cells

default_rnn_cell_hparams

texar.torch.core.default_rnn_cell_hparams()[source]

Returns a dict of RNN cell hyperparameters and their default values.

{
    "type": "LSTMCell",
    "input_size": 256,
    "kwargs": {
        "hidden_size": 256
    },
    "num_layers": 1,
    "dropout": {
        "input_keep_prob": 1.0,
        "output_keep_prob": 1.0,
        "state_keep_prob": 1.0,
        "variational_recurrent": False,
    },
    "residual": False,
    "highway": False,
}

Here:

“type”: str or cell class or cell instance

The RNN cell type. This can be

  • The string name or full module path of a cell class. If class name is provided, the class must be in module torch.nn.modules.rnn, texar.torch.core.cell_wrappers, or texar.torch.custom.
  • A cell class.
  • An instance of a cell class. This is not valid if “num_layers” > 1.

For example

"type": "LSTMCell"  # class name
"type": "torch.nn.GRUCell"  # module path
"type": "my_module.MyCell"  # module path
"type": torch.nn.GRUCell  # class
"type": LSTMCell(hidden_size=100)  # cell instance
"type": MyCell(...)  # cell instance
“kwargs”: dict

Keyword arguments for the constructor of the cell class. A cell is created by cell_class(**kwargs), where cell_class is specified in “type” above.

Ignored if “type” is a cell instance.

Note

It is unnecessary to specify “input_size” within “kwargs”. This value will be automatically filled based on layer index.

Note

Although PyTorch uses “hidden_size” to denote the hidden layer size, we follow TensorFlow conventions and use “num_units”.

“num_layers”: int
Number of cell layers. Each layer is a cell created as above, with the same hyperparameters specified in “kwargs”.
“dropout”: dict

Dropout applied to the cell in each layer. See DropoutWrapper for details of the hyperparameters. If all “*_keep_prob” = 1, no dropout is applied.

Specifically, if “variational_recurrent” = True, the same dropout mask is applied across all time steps per batch.

“residual”: bool
If True, apply residual connection on the inputs and outputs of cell in each layer except the first layer. Ignored if “num_layers” = 1.
“highway”: bool
If True, apply highway connection on the inputs and outputs of cell in each layer except the first layer. Ignored if “num_layers” = 1.

get_rnn_cell

texar.torch.core.get_rnn_cell(input_size, hparams=None)[source]

Creates an RNN cell.

See default_rnn_cell_hparams() for all hyperparameters and default values.

Parameters:
  • input_size (int) – Size of the input to the cell in the first layer.
  • hparams (dict or HParams, optional) – Cell hyperparameters. Missing hyperparameters are set to default values.
Returns:

A cell instance.

Raises:

ValueError – If hparams["num_layers"]>1 and hparams["type"] is a class instance.
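The statement that missing hyperparameters are set to default values can be pictured as a recursive dictionary merge. The sketch below is a simplified stand-in, not the HParams implementation: fill_defaults is a hypothetical helper, and DEFAULTS copies the dictionary shown under default_rnn_cell_hparams().

```python
import copy

DEFAULTS = {
    "type": "LSTMCell",
    "input_size": 256,
    "kwargs": {"hidden_size": 256},
    "num_layers": 1,
    "dropout": {
        "input_keep_prob": 1.0,
        "output_keep_prob": 1.0,
        "state_keep_prob": 1.0,
        "variational_recurrent": False,
    },
    "residual": False,
    "highway": False,
}


def fill_defaults(hparams, defaults):
    """Return a copy of `defaults` overridden by the entries in `hparams`."""
    merged = copy.deepcopy(defaults)
    for key, value in (hparams or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested dictionaries are merged key by key.
            merged[key] = fill_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged


user_hparams = {
    "num_layers": 2,
    "dropout": {"output_keep_prob": 0.8},
    "residual": True,
}
merged = fill_defaults(user_hparams, DEFAULTS)
```

Only the three user-specified values change; every other entry keeps its default.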

wrap_builtin_cell

texar.torch.core.wrap_builtin_cell(cell: torch.nn.modules.rnn.RNNCellBase)[source]

Convert a built-in torch.nn.RNNCellBase derived RNN cell to our wrapped version.

Parameters: cell – the RNN cell to wrap around.
Returns: The wrapped cell derived from texar.torch.core.cell_wrappers.RNNCellBase.

RNNCellBase

class texar.torch.core.cell_wrappers.RNNCellBase(cell: Union[torch.nn.modules.rnn.RNNCellBase, RNNCellBase])[source]

The base class for RNN cells in our framework. Major differences over torch.nn.RNNCell are two-fold:

  1. Holds a torch.nn.Module which could either be a built-in RNN cell or a wrapped cell instance. This design allows RNNCellBase to serve as the base class for both vanilla cells and wrapped cells.
  2. Adds zero_state() method for initialization of hidden states, which can also be used to implement batch-specific initialization routines.
input_size

The number of expected features in the input.

hidden_size

The number of features in the hidden state.

init_batch()[source]

Perform batch-specific initialization routines. For most cells this is a no-op.

zero_state(batch_size: int) → State[source]

Return zero-filled state tensor(s).

Parameters: batch_size – int, the batch size.
Returns: State tensor(s) initialized to zeros. Note that different subclasses might return tensors of different shapes and structures.
forward(input: torch.Tensor, state: Optional[State] = None) → Tuple[torch.Tensor, State][source]
Returns: A tuple of (output, state). For single layer RNNs, output is the same as state.

RNNCell

class texar.torch.core.cell_wrappers.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')[source]

A wrapper over torch.nn.RNNCell.

GRUCell

class texar.torch.core.cell_wrappers.GRUCell(input_size, hidden_size, bias=True)[source]

A wrapper over torch.nn.GRUCell.

LSTMCell

class texar.torch.core.cell_wrappers.LSTMCell(input_size, hidden_size, bias=True, forget_bias: Optional[float] = None)[source]

A wrapper over torch.nn.LSTMCell, additionally providing the option to initialize the forget-gate bias to a constant value.

zero_state(batch_size: int) → Tuple[torch.Tensor, torch.Tensor][source]

Returns the zero state for LSTMs as (h, c).

forward(input: torch.Tensor, state: Optional[Tuple[torch.Tensor, torch.Tensor]] = None) → Tuple[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]][source]

Returns: A tuple of (output, state). For single layer RNNs, output is the same as state.

DropoutWrapper

class texar.torch.core.cell_wrappers.DropoutWrapper(cell: texar.torch.core.cell_wrappers.RNNCellBase[State], input_keep_prob: float = 1.0, output_keep_prob: float = 1.0, state_keep_prob: float = 1.0, variational_recurrent=False)[source]

Operator adding dropout to inputs and outputs of the given cell.

init_batch()[source]

Initialize dropout masks for variational dropout.

Note that we do not create dropout mask here, because the batch size may not be known until actual input is passed in.

forward(input: torch.Tensor, state: Optional[State] = None) → Tuple[torch.Tensor, State][source]

Returns: A tuple of (output, state). For single layer RNNs, output is the same as state.

ResidualWrapper

class texar.torch.core.cell_wrappers.ResidualWrapper(cell: Union[torch.nn.modules.rnn.RNNCellBase, RNNCellBase])[source]

RNNCell wrapper that ensures cell inputs are added to the outputs.

forward(input: torch.Tensor, state: Optional[State] = None) → Tuple[torch.Tensor, State][source]

Returns: A tuple of (output, state). For single layer RNNs, output is the same as state.

HighwayWrapper

class texar.torch.core.cell_wrappers.HighwayWrapper(cell: texar.torch.core.cell_wrappers.RNNCellBase[State], carry_bias_init: Optional[float] = None, couple_carry_transform_gates: bool = True)[source]

RNNCell wrapper that adds highway connection on cell input and output.

Based on: R. K. Srivastava, K. Greff, and J. Schmidhuber, “Highway networks”, arXiv preprint arXiv:1505.00387, 2015. https://arxiv.org/pdf/1505.00387.pdf

forward(input: torch.Tensor, state: Optional[State] = None) → Tuple[torch.Tensor, State][source]

Returns: A tuple of (output, state). For single layer RNNs, output is the same as state.
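A highway connection blends the cell input and the cell output through a carry gate and a transform gate. The sketch below shows only that blending step, in plain Python, following the formulation in the Highway Networks paper. The name highway_combine is hypothetical, and the gate pre-activations are passed in directly (carry_logits, transform_logits), whereas the wrapper would compute them from learned weights.

```python
import math


def highway_combine(cell_input, cell_output, carry_logits,
                    transform_logits=None):
    """Blend input and output with highway gates (illustrative only)."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    blended = []
    for i, (x, h) in enumerate(zip(cell_input, cell_output)):
        carry = sigmoid(carry_logits[i])
        if transform_logits is None:
            # Coupled gates: the transform gate is tied to the carry gate.
            transform = 1.0 - carry
        else:
            transform = sigmoid(transform_logits[i])
        blended.append(carry * x + transform * h)
    return blended


half_and_half = highway_combine([1.0, 3.0], [3.0, 1.0], carry_logits=[0.0, 0.0])
mostly_input = highway_combine([1.0, 3.0], [3.0, 1.0], carry_logits=[50.0, 50.0])
```

A residual connection, by comparison, simply adds the input to the output without any gating.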

MultiRNNCell

class texar.torch.core.cell_wrappers.MultiRNNCell(cells: List[texar.torch.core.cell_wrappers.RNNCellBase[State]])[source]

RNN cell composed sequentially of multiple simple cells.

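from texar.torch.core.cell_wrappers import LSTMCell, MultiRNNCell

# Each layer's input size is the hidden size of the layer below it.
sizes = [128, 128, 64]
cells = [LSTMCell(input_size, hidden_size)
         for input_size, hidden_size in zip(sizes[:-1], sizes[1:])]
stacked_rnn_cell = MultiRNNCell(cells)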
input_size

The number of expected features in the input.

hidden_size

The number of features in the hidden state.

init_batch()[source]

Perform batch-specific initialization routines. For most cells this is a no-op.

zero_state(batch_size: int) → List[State][source]

Return zero-filled state tensor(s).

Parameters: batch_size – int, the batch size.
Returns: State tensor(s) initialized to zeros. Note that different subclasses might return tensors of different shapes and structures.
forward(input: torch.Tensor, state: Optional[List[State]] = None) → Tuple[torch.Tensor, List[State]][source]

Run this multi-layer cell on inputs, starting from state.
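Conceptually, a forward step through a stacked cell feeds the output of each layer into the next and collects one state per layer. The following plain-Python sketch illustrates that control flow with hypothetical toy cells (make_toy_cell); it is not the library's implementation.

```python
def run_stacked(cells, inputs, states):
    """Run one step through a stack of cells (illustrative only)."""
    output = inputs
    new_states = []
    for cell, state in zip(cells, states):
        # The output of one layer is the input of the next.
        output, new_state = cell(output, state)
        new_states.append(new_state)
    return output, new_states


def make_toy_cell(scale):
    """Hypothetical cell computing scale * input + state element-wise."""
    def cell(inputs, state):
        output = [scale * x + s for x, s in zip(inputs, state)]
        return output, output
    return cell


output, states = run_stacked(
    [make_toy_cell(2.0), make_toy_cell(3.0)],
    inputs=[1.0],
    states=[[0.0], [1.0]],
)
```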

AttentionWrapper

class texar.torch.core.cell_wrappers.AttentionWrapper(cell: texar.torch.core.cell_wrappers.RNNCellBase, attention_mechanism: Union[texar.torch.core.attention_mechanism.AttentionMechanism, List[texar.torch.core.attention_mechanism.AttentionMechanism]], attention_layer_size: Union[int, List[int], None] = None, alignment_history: bool = False, cell_input_fn: Optional[Callable[[torch.Tensor, torch.Tensor], torch.Tensor]] = None, output_attention: bool = True)[source]

Wraps another RNNCell with attention.

output_size

The number of features in the output tensor.

zero_state(batch_size: int) → texar.torch.core.attention_mechanism.AttentionWrapperState[source]

Return an initial (zero) state tuple for this AttentionWrapper.

Note

Please see the initializer documentation for details of how to call zero_state() if using an AttentionWrapper with a BeamSearchDecoder.

Parameters: batch_size – 0D integer: the batch size.
Returns: An AttentionWrapperState tuple containing zeroed out tensors and Python lists.
forward(inputs: torch.Tensor, state: Optional[texar.torch.core.attention_mechanism.AttentionWrapperState], memory: torch.Tensor, memory_sequence_length: Optional[torch.LongTensor] = None) → Tuple[torch.Tensor, texar.torch.core.attention_mechanism.AttentionWrapperState][source]

Perform a step of attention-wrapped RNN.

  • Step 1: Mix the inputs and previous step’s attention output via cell_input_fn.
  • Step 2: Call the wrapped cell with this input and its previous state.
  • Step 3: Score the cell’s output with attention_mechanism.
  • Step 4: Calculate the alignments by passing the score through the normalizer.
  • Step 5: Calculate the context vector as the inner product between the alignments and the attention_mechanism’s values (memory).
  • Step 6: Calculate the attention output by concatenating the cell output and context through the attention layer (a linear layer with attention_layer_size outputs).
Parameters:
  • inputs – (Possibly nested tuple of) Tensor, the input at this time step.
  • state – An instance of AttentionWrapperState containing tensors from the previous time step.
  • memory – The memory to query; usually the output of an RNN encoder. This tensor should be shaped [batch_size, max_time, …].
  • memory_sequence_length – (optional) Sequence lengths for the batch entries in memory. If provided, the memory tensor rows are masked with zeros for values past the respective sequence lengths.
Returns:

A tuple (attention_or_cell_output, next_state), where

  • attention_or_cell_output is the attention output if output_attention is True, and the output of the wrapped cell otherwise.
  • next_state is an instance of AttentionWrapperState containing the state calculated at this time step.

Raises:

TypeError – If state is not an instance of AttentionWrapperState.
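The six steps listed for forward() can be traced with a small plain-Python sketch for a single batch entry. Everything here is a simplification under assumptions: cell_input_fn is taken to be concatenation, a dot product followed by softmax stands in for the attention mechanism, the context is used directly as the attention when no attention layer is given, and toy_cell is a hypothetical stand-in for the wrapped cell.

```python
import math


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(inputs, prev_attention, cell_state, memory, cell,
                   attention_layer=None):
    """One attention-wrapped step for a single batch entry (sketch)."""
    # Step 1: mix inputs with the previous attention (assumed: concatenation).
    cell_inputs = inputs + prev_attention
    # Step 2: call the wrapped cell.
    cell_output, next_cell_state = cell(cell_inputs, cell_state)
    # Step 3: score the cell output against every memory position.
    scores = [sum(o * m for o, m in zip(cell_output, row)) for row in memory]
    # Step 4: normalize the scores into alignments.
    alignments = softmax(scores)
    # Step 5: context is the alignment-weighted sum of memory rows.
    context = [sum(a * row[d] for a, row in zip(alignments, memory))
               for d in range(len(memory[0]))]
    # Step 6: optional attention layer over [cell_output; context].
    if attention_layer is not None:
        attention = attention_layer(cell_output + context)
    else:
        attention = context
    return attention, alignments, next_cell_state


def toy_cell(cell_inputs, state):
    """Hypothetical cell: output[0] accumulates the summed inputs."""
    output = [sum(cell_inputs) + state[0], state[1]]
    return output, output


memory = [[1.0, 0.0], [0.0, 1.0]]
attention, alignments, next_state = attention_step(
    inputs=[1.0], prev_attention=[0.0, 0.0], cell_state=[0.0, 0.0],
    memory=memory, cell=toy_cell)
```

Because the memory rows in this example are unit vectors, the context vector equals the alignments, which makes the weighting in Step 5 easy to verify by hand.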

Layers

get_layer

texar.torch.core.get_layer(hparams: Union[texar.torch.hyperparams.HParams, Dict[str, Any]]) → torch.nn.modules.module.Module[source]

Makes a layer instance.

The layer must be an instance of torch.nn.Module.

Parameters:

hparams (dict or HParams) –

Hyperparameters of the layer, with structure:

{
    "type": "LayerClass",
    "kwargs": {
        # Keyword arguments of the layer class
        # ...
    }
}

Here:

“type”: str or layer class or layer instance

The layer type. This can be

  • The string name or full module path of a layer class. If the class name is provided, the class must be in module torch.nn, texar.torch.core, or texar.torch.custom.
  • A layer class.
  • An instance of a layer class.

For example

"type": "Conv1d"                            # class name
"type": "texar.torch.core.MaxReducePool1d"  # module path
"type": "my_module.MyLayer"                 # module path
"type": torch.nn.Linear                     # class
"type": torch.nn.Conv1d(in_channels=16, out_channels=10, kernel_size=2)  # layer instance
"type": MyLayer(...)                        # layer instance
“kwargs”: dict

A dictionary of keyword arguments for constructor of the layer class. Ignored if "type" is a layer instance.

  • Arguments named “activation” can be a callable, or a str of the name or module path to the activation function.
  • Arguments named “*_regularizer” and “*_initializer” can be a class instance, or a dict of hyperparameters of respective regularizers and initializers. See
  • Arguments named “*_constraint” can be a callable, or a str of the name or full path to the constraint function.

Returns:

A layer instance. If hparams["type"] is a layer instance, returns it directly.
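The three accepted forms of “type” (name, module path, class, or instance) suggest a lookup along the following lines. This is a generic sketch of the pattern rather than texar's code: resolve_class and make_instance are hypothetical helpers, and the standard-library class datetime.timedelta is used in place of a real layer so that the example runs without PyTorch.

```python
import datetime
import importlib


def resolve_class(type_, module_paths):
    """Return the class named by `type_` (a class, a name, or a dotted path)."""
    if not isinstance(type_, str):
        return type_
    if "." in type_:
        # Full module path, e.g. "my_module.MyLayer".
        module_name, _, class_name = type_.rpartition(".")
        return getattr(importlib.import_module(module_name), class_name)
    for path in module_paths:
        # Bare class name: search the allowed modules in order.
        module = importlib.import_module(path)
        if hasattr(module, type_):
            return getattr(module, type_)
    raise ValueError(f"Class {type_!r} not found in {module_paths}")


def make_instance(hparams, module_paths):
    """Build an object from {"type": ..., "kwargs": {...}} (illustrative)."""
    type_ = hparams["type"]
    if not isinstance(type_, (str, type)):
        # Already an instance: return it directly and ignore "kwargs".
        return type_
    cls = resolve_class(type_, module_paths)
    return cls(**hparams.get("kwargs", {}))


by_name = make_instance({"type": "timedelta", "kwargs": {"hours": 1}},
                        ["datetime"])
by_path = make_instance({"type": "datetime.timedelta", "kwargs": {"hours": 1}},
                        [])
existing = datetime.timedelta(minutes=5)
passthrough = make_instance({"type": existing, "kwargs": {"hours": 1}}, [])
```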

Raises:

MaxReducePool1d

class texar.torch.core.MaxReducePool1d[source]

A subclass of torch.nn.Module. Max Pool layer for 1D inputs. The same as torch.nn.MaxPool1d except that the pooling dimension is entirely reduced (i.e., pool_size=input_length).

forward(input: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

AvgReducePool1d

class texar.torch.core.AvgReducePool1d[source]

A subclass of torch.nn.Module. Avg Pool layer for 1D inputs. The same as torch.nn.AvgPool1d except that the pooling dimension is entirely reduced (i.e., pool_size=input_length).

forward(input: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
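The effect of the two reduce-pooling layers can be pictured with nested lists. The sketch assumes the input layout [batch_size, channels, length] used by torch.nn.MaxPool1d and reduces the last dimension entirely; the function names are hypothetical, and whether the library keeps a trailing dimension of size 1 is not shown here.

```python
def max_reduce_pool_1d(batch):
    """Maximum over the whole length dimension (illustrative only)."""
    return [[max(channel) for channel in sample] for sample in batch]


def avg_reduce_pool_1d(batch):
    """Mean over the whole length dimension (illustrative only)."""
    return [[sum(channel) / len(channel) for channel in sample]
            for sample in batch]


# One sample with two channels of length three.
batch = [[[1.0, 4.0, 2.0], [0.0, -1.0, -2.0]]]
max_pooled = max_reduce_pool_1d(batch)
avg_pooled = avg_reduce_pool_1d(batch)
```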

get_pooling_layer_hparams

texar.torch.core.get_pooling_layer_hparams(hparams: Union[texar.torch.hyperparams.HParams, Dict[str, Any]]) → Dict[str, Any][source]

Creates pooling layer hyperparameters dict for get_layer().

If hparams sets ‘pool_size’ to None, the layer is replaced with the respective reduce-pooling layer. For example, torch.nn.MaxPool1d is replaced with MaxReducePool1d.

MergeLayer

class texar.torch.core.MergeLayer(layers: Optional[List[torch.nn.modules.module.Module]] = None, mode: str = 'concat', dim: Optional[int] = None)[source]

A subclass of torch.nn.Module. A layer that consists of multiple layers in parallel. Input is fed to each of the parallel layers, and the outputs are merged with a specified mode.

Parameters:
  • layers (list, optional) –

    A list of torch.nn.Module instances, or a list of hyperparameter dictionaries each of which specifies “type” and “kwargs” of each layer (see the hparams argument of get_layer()).

    If None, this layer degenerates to a merging operator that merges inputs directly.

  • mode (str) –

    Mode of the merge op. This can be:

    • 'concat': Concatenates layer outputs along one dim. Tensors must have the same shape except for the dimension specified in dim, which can have different sizes.
    • 'elemwise_sum': Outputs element-wise sum.
    • 'elemwise_mul': Outputs element-wise product.
    • 'sum': Computes the sum of layer outputs along the dimension given by dim. For example, given dim=1, two tensors of shape [a, b] and [a, c] respectively will result in a merged tensor of shape [a].
    • 'mean': Computes the mean of layer outputs along the dimension given in dim.
    • 'prod': Computes the product of layer outputs along the dimension given in dim.
    • 'max': Computes the maximum of layer outputs along the dimension given in dim.
    • 'min': Computes the minimum of layer outputs along the dimension given in dim.
    • 'and': Computes the logical and of layer outputs along the dimension given in dim.
    • 'or': Computes the logical or of layer outputs along the dimension given in dim.
    • 'logsumexp': Computes log(sum(exp(elements across the dimension of layer outputs)))
  • dim (int) – The dim to use in merging. Ignored in modes 'elemwise_sum' and 'elemwise_mul'.
forward(input: torch.Tensor) → torch.Tensor[source]

Feed input to every containing layer and merge the outputs.

Parameters: input – The input tensor.
Returns: The merged tensor.
layers

The list of parallel layers.
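The difference between the element-wise modes and the reducing modes is easiest to see on small 2-D examples. The sketch below works on nested lists, always merges along the last dimension, and covers only a subset of the modes. The name merge_rows is hypothetical; the reducing modes are modeled as "concatenate, then reduce", which matches the shape example given for 'sum' above.

```python
import math
from functools import reduce


def merge_rows(outputs, mode):
    """Merge 2-D outputs along the last dimension (illustrative only)."""
    n_rows = len(outputs[0])
    if mode in ("elemwise_sum", "elemwise_mul"):
        if mode == "elemwise_sum":
            op = lambda x, y: x + y
        else:
            op = lambda x, y: x * y
        # Combine matching positions across all outputs.
        return [[reduce(op, values)
                 for values in zip(*(out[r] for out in outputs))]
                for r in range(n_rows)]
    concatenated = [[v for out in outputs for v in out[r]]
                    for r in range(n_rows)]
    if mode == "concat":
        return concatenated
    reducers = {
        "sum": sum,
        "mean": lambda row: sum(row) / len(row),
        "prod": math.prod,
        "max": max,
        "min": min,
    }
    if mode in reducers:
        # Reduce the concatenated dimension away: [a, b + c] -> [a].
        return [reducers[mode](row) for row in concatenated]
    raise ValueError(f"Unsupported mode in this sketch: {mode}")


a = [[1.0, 2.0], [3.0, 4.0]]  # shape [2, 2]
b = [[10.0], [20.0]]          # shape [2, 1]
concatenated = merge_rows([a, b], "concat")
summed = merge_rows([a, b], "sum")
elementwise = merge_rows([a, a], "elemwise_mul")
```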

Flatten

class texar.torch.core.Flatten[source]

Flatten layer to flatten a tensor after convolution.

Identity

class texar.torch.core.Identity[source]

Identity activation layer.

default_regularizer_hparams

texar.torch.core.default_regularizer_hparams()[source]

Returns the hyperparameters and their default values of a variable regularizer:

{
    "type": "L1L2",
    "kwargs": {
        "l1": 0.,
        "l2": 0.
    }
}

The default value corresponds to L1L2 and, with (l1=0, l2=0), disables regularization.
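As a rough guide to what the two coefficients control, the sketch below computes a combined penalty in the common convention l1 * sum(|w|) + l2 * sum(w^2). That convention is an assumption here, as is the helper name l1l2_penalty; the library's L1L2 class may define the penalty differently.

```python
def l1l2_penalty(weights, l1=0.0, l2=0.0):
    """Combined L1 and L2 penalty over a flat list of weights (sketch)."""
    l1_term = l1 * sum(abs(w) for w in weights)
    l2_term = l2 * sum(w * w for w in weights)
    return l1_term + l2_term


weights = [1.0, -2.0, 3.0]
no_penalty = l1l2_penalty(weights)
l2_only = l1l2_penalty(weights, l2=0.5)
```

With both coefficients at their default of zero the penalty vanishes, which is consistent with regularization being disabled by default.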

get_regularizer

texar.torch.core.get_regularizer(hparams=None)[source]

Returns a variable regularizer instance.

See default_regularizer_hparams() for all hyperparameters and default values.

The “type” field can be a subclass of Regularizer, its string name or module path, or a class instance.

Parameters:hparams (dict or HParams, optional) – Hyperparameters. Missing hyperparameters are set to default values.
Returns:A Regularizer instance. None if hparams is None or takes the default hyperparameter values.
Raises:ValueError – The resulting regularizer is not an instance of Regularizer.

get_initializer

texar.torch.core.get_initializer(hparams=None) → Optional[Callable[[torch.Tensor], torch.Tensor]][source]

Returns an initializer instance.

Parameters:hparams (dict or HParams, optional) –

Hyperparameters with the structure

{
    "type": "initializer_class_or_function",
    "kwargs": {
        # ...
    }
}

The “type” field can be a function name or module path. If a name is provided, it must be from one of the following modules: torch.nn.init and texar.torch.custom.

Besides, the “type” field can also be an initialization function called with initialization_fn(**kwargs). In this case “type” can be the function, or its name or module path. If no keyword argument is required, “kwargs” can be omitted.

Returns:An initializer instance. None if hparams is None.
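The lookup rule for “type” can be sketched with the standard library alone. In this sketch, resolve_initializer and the searched module list are hypothetical stand-ins (the documented lookup searches torch.nn.init and texar.torch.custom), and random.uniform stands in for an initialization function so that the example runs without torch.

```python
import functools
import importlib


def resolve_initializer(hparams, search_modules):
    """Turn {"type": ..., "kwargs": ...} into a callable, or None."""
    if hparams is None:
        return None
    fn = hparams["type"]
    if isinstance(fn, str):
        if "." in fn:
            # Full module path, e.g. "random.uniform".
            module_name, fn_name = fn.rsplit(".", 1)
            fn = getattr(importlib.import_module(module_name), fn_name)
        else:
            # Bare name: take the first search module that defines it.
            for module_name in search_modules:
                module = importlib.import_module(module_name)
                if hasattr(module, fn):
                    fn = getattr(module, fn)
                    break
            else:
                raise ValueError(f"Function not found: {fn}")
    # "kwargs" may be omitted when no keyword argument is required.
    return functools.partial(fn, **hparams.get("kwargs", {}))


init = resolve_initializer(
    {"type": "uniform", "kwargs": {"a": -0.1, "b": 0.1}},
    search_modules=["random"],
)
value = init()  # a float drawn from the interval [-0.1, 0.1]
```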

get_activation_fn

texar.torch.core.get_activation_fn(fn_name: Union[str, Callable[[torch.Tensor], torch.Tensor], None] = None, kwargs: Union[texar.torch.hyperparams.HParams, Dict[KT, VT], None] = None) → Optional[Callable[[torch.Tensor], torch.Tensor]][source]

Returns an activation function fn with the signature output = fn(input).

If the function specified by fn_name has more than one argument without a default value, then all of these arguments except the input feature argument must be specified in kwargs. Arguments with default values can also be specified in kwargs to take values other than the defaults. In this case a partial function is returned with the above signature.

Parameters:
  • fn_name (str or callable) –

    An activation function, or its name or module path. The function can be:

    • Built-in function defined in torch.nn.functional
    • User-defined activation functions in module texar.torch.custom.
    • External activation functions. Must provide the full module path, e.g., "my_module.my_activation_fn".
  • kwargs (optional) – A dict or instance of HParams containing the keyword arguments of the activation function.
Returns:

An activation function. None if fn_name is None.
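The partial-function behaviour described above can be shown with functools.partial. This sketch defines its own leaky_relu on plain floats (a stand-in written for this example, not torch.nn.functional.leaky_relu) and binds the non-input argument from kwargs, leaving a function with the signature output = fn(input).

```python
import functools


def leaky_relu(x, negative_slope):
    """Stand-in activation with one required argument besides the input."""
    return x if x >= 0 else negative_slope * x


# Keyword arguments that would be passed as `kwargs`.
kwargs = {"negative_slope": 0.2}

# Binding them yields a single-argument function: output = fn(input).
fn = functools.partial(leaky_relu, **kwargs)

print(fn(3.0))   # 3.0
print(fn(-5.0))  # -1.0
```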

Optimization

default_optimization_hparams

texar.torch.core.default_optimization_hparams() → Dict[str, Any][source]

Returns a dict of the training op hyperparameters and their default values:

{
    "optimizer": {
        "type": "Adam",
        "kwargs": {
            "lr": 0.001
        }
    },
    "learning_rate_decay": {
        "type": "",
        "kwargs": {}
    },
    "gradient_clip": {
        "type": "",
        "kwargs": {}
    },
    "gradient_noise_scale": None,
    "name": None
}

Here:

“optimizer”: dict

Hyperparameters of a torch.optim.Optimizer.

  • “type” specifies the optimizer class. This can be

    • The string name or full module path of an optimizer class. If the class name is provided, the class must be in module torch.optim, texar.torch.custom, or texar.torch.core.optimization.
    • An optimizer class.
    • An instance of an optimizer class.

    For example

    "type": "Adam"                    # class name
    "type": "my_module.MyOptimizer"   # module path
    "type": texar.torch.custom.BertAdam     # class
    "type": my_module.MyOptimizer     # class
    
  • “kwargs” is a dict specifying keyword arguments for creating the optimizer class instance, with opt_class(**kwargs). Ignored if “type” is a class instance.
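The accepted forms of “type” and the handling of “kwargs” can be summarized in a short sketch. ToyOptimizer and build_from_hparams are hypothetical names used only here, and the class-name lookup is reduced to a dictionary so that the example has no dependencies.

```python
class ToyOptimizer:
    """Hypothetical optimizer that only records its learning rate."""

    def __init__(self, lr=0.001):
        self.lr = lr


# Stand-in for searching torch.optim and the texar modules by class name.
KNOWN_CLASSES = {"ToyOptimizer": ToyOptimizer}


def build_from_hparams(hparams):
    opt_type = hparams["type"]
    kwargs = hparams.get("kwargs", {})
    if isinstance(opt_type, str):
        opt_type = KNOWN_CLASSES[opt_type]  # string name -> class
    if isinstance(opt_type, type):
        return opt_type(**kwargs)           # class -> opt_class(**kwargs)
    return opt_type                         # instance: "kwargs" is ignored


by_name = build_from_hparams({"type": "ToyOptimizer", "kwargs": {"lr": 0.01}})
by_class = build_from_hparams({"type": ToyOptimizer, "kwargs": {"lr": 0.1}})
existing = ToyOptimizer(lr=0.5)
by_instance = build_from_hparams({"type": existing, "kwargs": {"lr": 9.9}})

print(by_name.lr, by_class.lr, by_instance.lr)  # 0.01 0.1 0.5
```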

“learning_rate_decay”: dict

Hyperparameters of the learning rate decay function. The learning rate starts to decay at "start_decay_step" and remains unchanged after "end_decay_step" or once it reaches "min_learning_rate".

The decay function is specified in “type” and “kwargs”.

  • “type” can be a decay function or its name or module path. If a function name is provided, it must be from module torch.optim, texar.torch.custom, or texar.torch.core.optimization.
  • “kwargs” is a dict of keyword arguments for the function excluding arguments named “global_step” and “learning_rate”.

The function is called with lr = decay_fn(learning_rate=lr, global_step=offset_step, **kwargs), where offset_step is the global step offset as above.
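The calling convention can be sketched as follows. The sketch assumes that the offset step is the global step clamped to the range between "start_decay_step" and "end_decay_step" and measured from "start_decay_step", and that the result is floored at "min_learning_rate"; this matches the description above but is not taken from the library source. exponential_decay and decayed_lr are hypothetical names.

```python
def exponential_decay(learning_rate, global_step, decay_rate, decay_steps):
    """Hypothetical decay function following the documented signature."""
    return learning_rate * decay_rate ** (global_step / decay_steps)


def decayed_lr(lr, global_step, decay_fn, kwargs,
               start_decay_step, end_decay_step, min_learning_rate):
    # No decay before start_decay_step, no further decay after end_decay_step.
    offset_step = max(min(global_step, end_decay_step) - start_decay_step, 0)
    new_lr = decay_fn(learning_rate=lr, global_step=offset_step, **kwargs)
    # Never drop below the configured floor.
    return max(new_lr, min_learning_rate)


kwargs = {"decay_rate": 0.5, "decay_steps": 100}
for step in (0, 100, 200, 300, 1000):
    print(step, decayed_lr(0.1, step, exponential_decay, kwargs,
                           start_decay_step=100, end_decay_step=300,
                           min_learning_rate=0.03))
```

With these settings the rate stays at its initial value up to step 100, halves by step 200, and is held at the 0.03 floor from step 300 onward.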

“gradient_clip”: dict

Hyperparameters of gradient clipping. The gradient clipping function takes the parameters whose gradients are to be clipped and modifies those gradients in place. Typical examples include torch.nn.utils.clip_grad_norm_ and torch.nn.utils.clip_grad_value_.

“type” specifies the gradient clip function, and can be a function, or its name or module path. If function name is provided, the function must be from module torch.nn.utils, texar.torch.custom, or texar.torch.core.optimization.

“kwargs” specifies keyword arguments to the function, except arguments named “parameters”.
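As an illustration of norm-based clipping, the sketch below rescales a flat list of gradient values so that their L2 norm does not exceed max_norm. It is a plain-Python approximation of the idea behind torch.nn.utils.clip_grad_norm_, not a call to it; clip_by_norm is a hypothetical helper, and the hparams fragment shows how “kwargs” would carry max_norm while the parameters themselves are supplied by the caller.

```python
import math

# Hypothetical hparams fragment: "parameters" is deliberately absent.
gradient_clip_hparams = {
    "type": "clip_grad_norm_",
    "kwargs": {"max_norm": 5.0},
}


def clip_by_norm(grads, max_norm):
    """Scale `grads` down so that their L2 norm is at most `max_norm`."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Only shrink gradients; leave them untouched when already small enough.
    scale = min(1.0, max_norm / total_norm) if total_norm > 0 else 1.0
    return [g * scale for g in grads]


grads = [6.0, 8.0]  # L2 norm is 10.0
clipped = clip_by_norm(grads, **gradient_clip_hparams["kwargs"])
print(clipped)  # [3.0, 4.0]
```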

“gradient_noise_scale”: float, optional

Adds zero-mean normal noise, scaled by this value, to the gradients.
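A minimal sketch of this kind of noise injection, assuming independent standard-normal samples that are multiplied by the scale and added to each gradient value. add_gradient_noise is a hypothetical helper on plain floats; the exact sampling used by the library is not specified here.

```python
import random


def add_gradient_noise(grads, noise_scale, rng):
    """Add zero-mean normal noise, scaled by `noise_scale`, to each value."""
    return [g + noise_scale * rng.gauss(0.0, 1.0) for g in grads]


rng = random.Random(0)  # seeded so that runs are repeatable
grads = [0.5, -0.25, 1.0]

noisy = add_gradient_noise(grads, noise_scale=0.01, rng=rng)
unchanged = add_gradient_noise(grads, noise_scale=0.0, rng=rng)
```

A scale of zero leaves the gradients as they were, so the option can be disabled without a separate code path.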

get_train_op

texar.torch.core.get_train_op(params: Optional[Iterable[Union[torch.Tensor, Dict[str, Any]]]] = None, optimizer: Optional[torch.optim.optimizer.Optimizer] = None, scheduler: Optional[torch.optim.lr_scheduler._LRScheduler] = None, hparams: Union[texar.torch.hyperparams.HParams, Dict[str, Any], None] = None) → Callable[[], None][source]

Creates a training op.

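Parameters:
  • params – an iterable of torch.Tensor or dict. Specifies what Tensors should be optimized.
  • optimizer – A torch.optim.Optimizer instance.
  • scheduler – A torch.optim.lr_scheduler._LRScheduler instance.
  • hparams (dict or HParams, optional) – hyperparameters. Missing hyperparameters are set to default values automatically. See default_optimization_hparams() for all hyperparameters and default values.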
Returns:

The callable used for variable optimization.

get_scheduler

texar.torch.core.get_scheduler(optimizer: torch.optim.optimizer.Optimizer, hparams: Union[texar.torch.hyperparams.HParams, Dict[str, Any], None] = None) → Optional[torch.optim.lr_scheduler._LRScheduler][source]

Creates a scheduler instance.

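Parameters:
  • optimizer – A torch.optim.Optimizer instance.
  • hparams (dict or HParams, optional) – hyperparameters. Missing hyperparameters are set to default values automatically. See default_optimization_hparams() for all hyperparameters and default values.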
Returns:

A torch.optim.lr_scheduler._LRScheduler instance.

get_optimizer

texar.torch.core.get_optimizer(params: Iterable[Union[torch.Tensor, Dict[str, Any]]], hparams: Union[texar.torch.hyperparams.HParams, Dict[str, Any], None] = None) → torch.optim.optimizer.Optimizer[source]

Creates an optimizer instance.

Parameters:
  • params – an iterable of torch.Tensor or dict. Specifies what Tensors should be optimized.
  • hparams (dict or HParams, optional) – hyperparameters. Missing hyperparameters are set to default values automatically. See default_optimization_hparams() for all hyperparameters and default values.
Returns:

The torch.optim.Optimizer instance specified in hparams.
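The statement that missing hyperparameters fall back to their defaults can be pictured as a recursive dictionary merge. The sketch below is an assumption about the general behaviour rather than the HParams implementation (which may treat “kwargs” specially); merge_with_defaults is a hypothetical helper, and the defaults are abridged from default_optimization_hparams().

```python
import copy

# Abridged from default_optimization_hparams().
DEFAULTS = {
    "optimizer": {"type": "Adam", "kwargs": {"lr": 0.001}},
    "gradient_clip": {"type": "", "kwargs": {}},
    "gradient_noise_scale": None,
}


def merge_with_defaults(hparams, defaults):
    """Return a copy of `defaults` overridden by the entries in `hparams`."""
    merged = copy.deepcopy(defaults)
    for key, value in (hparams or {}).items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_with_defaults(value, merged[key])
        else:
            merged[key] = value
    return merged


user_hparams = {"optimizer": {"kwargs": {"lr": 0.01}}}
full = merge_with_defaults(user_hparams, DEFAULTS)

print(full["optimizer"])             # {'type': 'Adam', 'kwargs': {'lr': 0.01}}
print(full["gradient_noise_scale"])  # None
```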

get_grad_clip_fn

texar.torch.core.get_grad_clip_fn(hparams: Union[texar.torch.hyperparams.HParams, Dict[str, Any], None] = None) → Optional[Callable[[torch.Tensor], Optional[torch.Tensor]]][source]

Creates a gradient clipping function.

Parameters:hparams (dict or HParams, optional) – hyperparameters. Missing hyperparameters are set to default values automatically. See default_optimization_hparams() for all hyperparameters and default values.
Returns:A gradient clipping function.