Loss Functions
MLE Loss
sequence_softmax_cross_entropy

texar.torch.losses.sequence_softmax_cross_entropy(labels: torch.Tensor, logits: torch.Tensor, sequence_length: Optional[torch.LongTensor], average_across_batch: bool = True, average_across_timesteps: bool = False, sum_over_batch: bool = False, sum_over_timesteps: bool = True, time_major: bool = False, stop_gradient_to_label: bool = False) → torch.Tensor
Computes softmax cross entropy for each time step of sequence predictions.
Parameters:
 labels – Target class distributions. If time_major is False (default), this must be a Tensor of shape [batch_size, max_time, num_classes]. If time_major is True, this must be a Tensor of shape [max_time, batch_size, num_classes]. Each row of labels should be a valid probability distribution; otherwise, the computed gradient will be incorrect.
 logits – Unscaled log probabilities. This must have shape [max_time, batch_size, num_classes] if time_major is True, or [batch_size, max_time, num_classes] otherwise.
 sequence_length – A Tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero losses.
 average_across_timesteps (bool) – If set, average the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 average_across_batch (bool) – If set, average the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_timesteps (bool) – If set, sum the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 sum_over_batch (bool) – If set, sum the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 time_major (bool) – The shape format of the inputs. If True, labels and logits must have shape [max_time, batch_size, …]. If False (default), they must have shape [batch_size, max_time, …].
 stop_gradient_to_label (bool) – If set, gradient propagation to labels will be disabled.
Returns: A Tensor containing the loss, of rank 0, 1, or 2 depending on the arguments {average_across}/{sum_over}_{timesteps}/{batch}. For example:
 If sum_over_timesteps and average_across_batch are True (default), the returned Tensor is of rank 0.
 If average_across_batch is True and the other arguments are False, the returned Tensor is of shape [max_time].
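The default reduction described above (zero loss beyond each sequence length, sum over time steps, average over the batch) can be sketched in plain Python. This is a dependency-free illustration that uses nested lists in place of tensors; the helper names are hypothetical, and it is not the library's implementation.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def seq_softmax_xent(labels, logits, sequence_length):
    """labels, logits: nested lists of shape [batch][time][class]."""
    per_example = []
    for lab_seq, logit_seq, length in zip(labels, logits, sequence_length):
        total = 0.0
        for t, (lab, logit) in enumerate(zip(lab_seq, logit_seq)):
            if t >= length:
                continue  # steps beyond the sequence length contribute zero loss
            log_p = log_softmax(logit)
            total -= sum(p * lp for p, lp in zip(lab, log_p))
        per_example.append(total)  # sum over time steps
    return sum(per_example) / len(per_example)  # average over the batch


labels = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.5, 0.5], [1.0, 0.0]]]
logits = [[[0.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [5.0, -5.0]]]
loss = seq_softmax_xent(labels, logits, sequence_length=[2, 1])
print(loss)
```

The second example has length 1, so its second time step is ignored regardless of the logits stored there.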
sequence_sparse_softmax_cross_entropy

texar.torch.losses.sequence_sparse_softmax_cross_entropy(labels: torch.Tensor, logits: torch.Tensor, sequence_length: Optional[torch.LongTensor], average_across_batch: bool = True, average_across_timesteps: bool = False, sum_over_batch: bool = False, sum_over_timesteps: bool = True, time_major: bool = False) → torch.Tensor
Computes sparse softmax cross entropy for each time step of sequence predictions.
Parameters:
 labels – Target class indexes, i.e., classes are mutually exclusive (each entry is in exactly one class). If time_major is False (default), this must be a Tensor of shape [batch_size, max_time]. If time_major is True, this must be a Tensor of shape [max_time, batch_size].
 logits – Unscaled log probabilities. This must have shape [max_time, batch_size, num_classes] if time_major is True, or [batch_size, max_time, num_classes] otherwise.
 sequence_length – A Tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero losses.
 average_across_timesteps (bool) – If set, average the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 average_across_batch (bool) – If set, average the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_timesteps (bool) – If set, sum the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 sum_over_batch (bool) – If set, sum the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 time_major (bool) – The shape format of the inputs. If True, labels and logits must have shape [max_time, batch_size, …]. If False (default), they must have shape [batch_size, max_time, …].
Returns: A Tensor containing the loss, of rank 0, 1, or 2 depending on the arguments {average_across}/{sum_over}_{timesteps}/{batch}. For example:
 If sum_over_timesteps and average_across_batch are True (default), the returned Tensor is of rank 0.
 If average_across_batch is True and the other arguments are False, the returned Tensor is of shape [max_time].
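Example:
    from texar.torch.losses import sequence_sparse_softmax_cross_entropy
    from texar.torch.modules import BasicRNNDecoder, WordEmbedder

    embedder = WordEmbedder(vocab_size=data.vocab.size)
    decoder = BasicRNNDecoder(vocab_size=data.vocab.size)
    # The decoder reads every token except the last one.
    outputs, _, _ = decoder(
        decoding_strategy='train_greedy',
        inputs=embedder(data_batch['text_ids']),
        sequence_length=data_batch['length'] - 1)
    # Targets are the inputs shifted left by one position.
    loss = sequence_sparse_softmax_cross_entropy(
        labels=data_batch['text_ids'][:, 1:],
        logits=outputs.logits,
        sequence_length=data_batch['length'] - 1)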
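The example shifts the labels by one position and shortens the lengths by one. A small plain-Python sketch of that alignment follows; the token ids (1 for a begin-of-sequence token, 2 for an end-of-sequence token, 0 for padding) are hypothetical values chosen only for illustration.

```python
# Hypothetical token ids for illustration: 1 = <BOS>, 2 = <EOS>, 0 = padding.
text_ids = [[1, 7, 8, 2],
            [1, 9, 2, 0]]
lengths = [4, 3]  # sequence lengths, counting <BOS> and <EOS>

labels = [row[1:] for row in text_ids]     # plays the role of text_ids[:, 1:]
target_lengths = [n - 1 for n in lengths]  # plays the role of length - 1

# Each decoder step reads one token and is scored against the next one.
aligned = [list(zip(row[:n], lab[:n]))
           for row, lab, n in zip(text_ids, labels, target_lengths)]
for pairs in aligned:
    print(pairs)
```

Each printed pair is (input token, token to predict). The last input token of every sequence has no successor, which is why one fewer step carries a target.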
sequence_sigmoid_cross_entropy

texar.torch.losses.sequence_sigmoid_cross_entropy(labels: torch.Tensor, logits: torch.Tensor, sequence_length: Optional[torch.LongTensor], average_across_batch: bool = True, average_across_timesteps: bool = False, average_across_classes: bool = True, sum_over_batch: bool = False, sum_over_timesteps: bool = True, sum_over_classes: bool = False, time_major: bool = False, stop_gradient_to_label: bool = False) → torch.Tensor
Computes sigmoid cross entropy for each time step of sequence predictions.
Parameters:
 labels – Target class distributions. If time_major is False (default), this must be a Tensor of shape [batch_size, max_time(, num_classes)]. If time_major is True, this must be a Tensor of shape [max_time, batch_size(, num_classes)]. Each row of labels should be a valid probability distribution; otherwise, the computed gradient will be incorrect.
 logits – Unscaled log probabilities having the same shape as labels.
 sequence_length – A Tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero losses.
 average_across_timesteps (bool) – If set, average the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 average_across_batch (bool) – If set, average the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 average_across_classes (bool) – If set, average the loss across the class dimension (if it exists). Must not set average_across_classes and sum_over_classes at the same time. Ignored if logits is a 2D Tensor.
 sum_over_timesteps (bool) – If set, sum the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 sum_over_batch (bool) – If set, sum the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_classes (bool) – If set, sum the loss across the class dimension. Must not set average_across_classes and sum_over_classes at the same time. Ignored if logits is a 2D Tensor.
 time_major (bool) – The shape format of the inputs. If True, labels and logits must have shape [max_time, batch_size, …]. If False (default), they must have shape [batch_size, max_time, …].
 stop_gradient_to_label (bool) – If set, gradient propagation to labels will be disabled.
Returns: A Tensor containing the loss, of rank 0, 1, or 2 depending on the arguments {average_across}/{sum_over}_{timesteps}/{batch}/{classes}. For example, if the class dimension does not exist:
 If sum_over_timesteps and average_across_batch are True (default), the returned Tensor is of rank 0.
 If average_across_batch is True and the other arguments are False, the returned Tensor is of shape [max_time].
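As a rough illustration of the per-step quantity, the sketch below evaluates sigmoid cross entropy in a numerically stable form and applies the default reduction for inputs without a class dimension. It is a plain-Python sketch with hypothetical helper names, using a commonly used stable formula; the library's internals may differ.

```python
import math


def sigmoid_xent(logit, label):
    """Stable form of -(z * log(s) + (1 - z) * log(1 - s)), s = sigmoid(logit)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def seq_sigmoid_xent(labels, logits, sequence_length):
    """labels, logits: nested lists of shape [batch][time] (no class dimension)."""
    per_example = []
    for lab_seq, logit_seq, length in zip(labels, logits, sequence_length):
        steps = list(zip(logit_seq, lab_seq))[:length]  # drop padded steps
        per_example.append(sum(sigmoid_xent(x, z) for x, z in steps))
    return sum(per_example) / len(per_example)  # average over the batch


labels = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = seq_sigmoid_xent(labels, logits, sequence_length=[3, 1])
print(loss)
```

The stable form avoids overflow for large positive or negative logits, where evaluating log(sigmoid(x)) directly would fail.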
binary_sigmoid_cross_entropy

texar.torch.losses.binary_sigmoid_cross_entropy(pos_logits: Optional[torch.Tensor] = None, neg_logits: Optional[torch.Tensor] = None, average_across_batch: bool = True, average_across_classes: bool = True, sum_over_batch: bool = False, sum_over_classes: bool = False, return_pos_neg_losses: bool = False) → Union[Tuple[torch.Tensor, torch.Tensor, torch.Tensor], torch.Tensor]
Computes sigmoid cross entropy of binary predictions.
Parameters:
 pos_logits – The logits of predicting positive on positive data. A tensor of shape [batch_size(, num_classes)].
 neg_logits – The logits of predicting positive on negative data. A tensor of shape [batch_size(, num_classes)].
 average_across_batch (bool) – If set, average the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 average_across_classes (bool) – If set, average the loss across the class dimension (if it exists). Must not set average_across_classes and sum_over_classes at the same time. Ignored if the logits are 1D Tensors.
 sum_over_batch (bool) – If set, sum the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_classes (bool) – If set, sum the loss across the class dimension. Must not set average_across_classes and sum_over_classes at the same time. Ignored if the logits are 1D Tensors.
 return_pos_neg_losses (bool) – If set, additionally returns the losses on pos_logits and neg_logits, respectively.
Returns: By default, a Tensor containing the loss, of rank 0, 1, or 2 depending on the arguments {average_across}/{sum_over}_{batch}/{classes}. For example:
 If average_across_batch and average_across_classes are True (default), the returned Tensor is of rank 0.
 If all arguments are False, the returned Tensor is of shape [batch_size(, num_classes)].
If return_pos_neg_losses is True, returns a tuple (loss, pos_loss, neg_loss), where loss is the loss above, pos_loss is the loss on pos_logits only, and neg_loss is the loss on neg_logits only. They satisfy loss = pos_loss + neg_loss.
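The relation loss = pos_loss + neg_loss can be illustrated with a plain-Python sketch in which positive logits are scored against target 1 and negative logits against target 0, each averaged over its own batch. The helper names are hypothetical and the sketch assumes 1D logits; it is not the library's code.

```python
import math


def sigmoid_xent(logit, label):
    """Numerically stable sigmoid cross entropy for one logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def binary_xent(pos_logits, neg_logits):
    """Return (loss, pos_loss, neg_loss) for 1D lists of logits."""
    pos_loss = sum(sigmoid_xent(x, 1.0) for x in pos_logits) / len(pos_logits)
    neg_loss = sum(sigmoid_xent(x, 0.0) for x in neg_logits) / len(neg_logits)
    return pos_loss + neg_loss, pos_loss, neg_loss


loss, pos_loss, neg_loss = binary_xent(pos_logits=[2.0, 0.0], neg_logits=[-2.0])
print(loss, pos_loss, neg_loss)
```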
binary_sigmoid_cross_entropy_with_clas

texar.torch.losses.binary_sigmoid_cross_entropy_with_clas(clas_fn: Callable[[torch.Tensor], Union[torch.Tensor, Tuple[torch.Tensor, ...]]], pos_inputs: Optional[torch.Tensor] = None, neg_inputs: Optional[torch.Tensor] = None, average_across_batch: bool = True, average_across_classes: bool = True, sum_over_batch: bool = False, sum_over_classes: bool = False, return_pos_neg_losses: bool = False) → Union[Tuple[torch.Tensor, torch.Tensor, torch.Tensor], torch.Tensor]
Computes sigmoid cross entropy of a binary classifier.
Parameters:
 clas_fn – A callable that takes data (e.g., pos_inputs and neg_inputs) and returns the logits of being positive. The signature of clas_fn must be: logits (, …) = clas_fn(inputs). The return value of clas_fn can be the logits, or a tuple where the logits are the first element.
 pos_inputs – The positive data fed into clas_fn.
 neg_inputs – The negative data fed into clas_fn.
 average_across_batch (bool) – If set, average the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 average_across_classes (bool) – If set, average the loss across the class dimension (if it exists). Must not set average_across_classes and sum_over_classes at the same time. Ignored if the logits are 1D Tensors.
 sum_over_batch (bool) – If set, sum the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_classes (bool) – If set, sum the loss across the class dimension. Must not set average_across_classes and sum_over_classes at the same time. Ignored if the logits are 1D Tensors.
 return_pos_neg_losses (bool) – If set, additionally returns the losses on pos_inputs and neg_inputs, respectively.
Returns: By default, a Tensor containing the loss, of rank 0, 1, or 2 depending on the arguments {average_across}/{sum_over}_{batch}/{classes}. For example:
 If average_across_batch and average_across_classes are True (default), the returned Tensor is of rank 0.
 If all arguments are False, the returned Tensor is of shape [batch_size(, num_classes)].
If return_pos_neg_losses is True, returns a tuple (loss, pos_loss, neg_loss), where loss is the loss above, pos_loss is the loss on pos_inputs only, and neg_loss is the loss on neg_inputs only. They satisfy loss = pos_loss + neg_loss.
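The clas_fn contract (return either the logits or a tuple whose first element is the logits) can be handled with a small unwrapping step. The following plain-Python sketch uses two hypothetical toy classifiers to show both cases; it only illustrates the contract and is not taken from the library.

```python
def first_output(result):
    """Return the logits whether clas_fn returned them bare or first in a tuple."""
    return result[0] if isinstance(result, tuple) else result


def toy_clas_fn(inputs):
    """Hypothetical classifier: returns (logits, hard predictions)."""
    logits = [2.0 * x - 1.0 for x in inputs]
    preds = [int(v > 0) for v in logits]
    return logits, preds


def bare_clas_fn(inputs):
    """Hypothetical classifier that returns only the logits."""
    return [2.0 * x - 1.0 for x in inputs]


pos_logits = first_output(toy_clas_fn([1.0, 2.0]))
neg_logits = first_output(bare_clas_fn([0.0]))
print(pos_logits, neg_logits)
```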
Policy Gradient Loss
pg_loss_with_logits

texar.torch.losses.pg_loss_with_logits(actions: torch.Tensor, logits: torch.Tensor, advantages: torch.Tensor, rank: Optional[int] = None, batched: bool = False, sequence_length: Optional[torch.LongTensor] = None, average_across_batch: bool = True, average_across_timesteps: bool = False, average_across_remaining: bool = False, sum_over_batch: bool = False, sum_over_timesteps: bool = True, sum_over_remaining: bool = True, time_major: bool = False) → torch.Tensor
Policy gradient loss with logits. Used for discrete actions.
pg_loss = reduce(advantages * -log_prob(actions)), where advantages and actions do not backpropagate gradients.
All arguments except logits and actions are the same as in pg_loss_with_log_probs().
Parameters:
 actions – Tensor of shape [(batch_size,) max_time, d_3, …, d_rank] and of dtype int32 or int64. The rank of the Tensor is specified with rank. The batch dimension exists only if batched is True. The batch and time dimensions are exchanged, i.e., [max_time, batch_size, …], if time_major is True.
 logits – Unscaled log probabilities of shape [(batch_size,) max_time, d_3, …, d_{rank+1}] and dtype float32 or float64. The batch and time dimensions are exchanged if time_major is True.
 advantages – Tensor of shape [(batch_size,) max_time, d_3, …, d_rank] and dtype float32 or float64. The batch and time dimensions are exchanged if time_major is True.
 rank (int, optional) – The rank of actions. If None (default), rank is automatically inferred from actions or advantages. If the inference fails, rank is set to 1 if batched is False, and set to 2 if batched is True.
 batched (bool) – True if the inputs are batched.
 sequence_length (optional) – A Tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero losses. Used if batched is True.
 average_across_timesteps (bool) – If set, average the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 average_across_batch (bool) – If set, average the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time. Ignored if batched is False.
 average_across_remaining (bool) – If set, average the loss across the remaining dimensions. Must not set average_across_remaining and sum_over_remaining at the same time. Ignored if there are no dimensions other than the batch and time dimensions.
 sum_over_timesteps (bool) – If set, sum the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 sum_over_batch (bool) – If set, sum the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time. Ignored if batched is False.
 sum_over_remaining (bool) – If set, sum the loss across the remaining dimension. Must not set average_across_remaining and sum_over_remaining at the same time. Ignored if no more dimensions other than the batch and time dimensions.
 time_major (bool) – The shape format of the inputs. If True, logits, actions and advantages must have shape [max_time, batch_size, …]. If False (default), they must have shape [batch_size, max_time, …]. Ignored if batched is False.
Returns: A Tensor containing the loss to minimize, whose rank depends on the reduce arguments. For example, the batch dimension is reduced if either average_across_batch or sum_over_batch is True, which decreases the rank of the output tensor by 1.
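For the unbatched case with the default sum over time steps, the computation can be sketched in plain Python as below. The sketch assumes the sign convention in which minimizing the loss raises the log probability of actions with positive advantage; the helper names are hypothetical and this is not the library's implementation.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def pg_loss_from_logits(actions, logits, advantages):
    """actions, advantages: [time]; logits: [time][num_actions] (unbatched)."""
    loss = 0.0
    for action, row, adv in zip(actions, logits, advantages):
        log_prob = log_softmax(row)[action]  # log-probability of the taken action
        loss += adv * -log_prob              # advantages are treated as constants
    return loss                              # sum over time steps


actions = [0, 1]
logits = [[0.0, 0.0], [0.0, 0.0]]
advantages = [1.0, -2.0]
loss = pg_loss_from_logits(actions, logits, advantages)
print(loss)
```

With uniform logits each action has log probability -log(2), so the two steps contribute log(2) and -2 log(2) respectively.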
pg_loss_with_log_probs

texar.torch.losses.pg_loss_with_log_probs(log_probs: torch.Tensor, advantages: torch.Tensor, rank: Optional[int] = None, batched: bool = False, sequence_length: Optional[torch.LongTensor] = None, average_across_batch: bool = True, average_across_timesteps: bool = False, average_across_remaining: bool = False, sum_over_batch: bool = False, sum_over_timesteps: bool = True, sum_over_remaining: bool = True, time_major: bool = False) → torch.Tensor
Policy gradient loss with log probabilities of actions.
pg_loss = reduce(advantages * -log_probs), where advantages does not backpropagate gradients.
All arguments except log_probs are the same as in pg_loss_with_logits().
Parameters:
 log_probs – Log probabilities of shape [(batch_size,) max_time, …, d_rank] and dtype float32 or float64. The rank of the Tensor is specified with rank. The batch dimension exists only if batched is True. The batch and time dimensions are exchanged, i.e., [max_time, batch_size, …], if time_major is True.
 advantages – Tensor of shape [(batch_size,) max_time, d_3, …, d_rank] and dtype float32 or float64. The batch dimension exists only if batched is True. The batch and time dimensions are exchanged if time_major is True.
 rank (int, optional) – The rank of log_probs. If None (default), rank is automatically inferred from log_probs or advantages. If the inference fails, rank is set to 1 if batched is False, and set to 2 if batched is True.
 batched (bool) – True if the inputs are batched.
 sequence_length (optional) – A Tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will have zero losses. Used if batched is True.
 average_across_timesteps (bool) – If set, average the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 average_across_batch (bool) – If set, average the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time. Ignored if batched is False.
 average_across_remaining (bool) – If set, average the loss across the remaining dimensions. Must not set average_across_remaining and sum_over_remaining at the same time. Ignored if there are no dimensions other than the batch and time dimensions.
 sum_over_timesteps (bool) – If set, sum the loss across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 sum_over_batch (bool) – If set, sum the loss across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time. Ignored if batched is False.
 sum_over_remaining (bool) – If set, sum the loss across the remaining dimension. Must not set average_across_remaining and sum_over_remaining at the same time. Ignored if no more dimensions other than the batch and time dimensions.
 time_major (bool) – The shape format of the inputs. If True, log_probs and advantages must have shape [max_time, batch_size, …]. If False (default), they must have shape [batch_size, max_time, …]. Ignored if batched is False.
Returns: A Tensor containing the loss to minimize, whose rank depends on the reduce arguments. For example, the batch dimension is reduced if either average_across_batch or sum_over_batch is True, which decreases the rank of the output tensor by 1.
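A batched variant with sequence masking can be sketched as follows, again in plain Python with nested lists standing in for tensors. It assumes the default reduction (sum over time, average over batch) and the sign convention in which the loss is to be minimized; the function name is hypothetical.

```python
def pg_loss_from_log_probs(log_probs, advantages, sequence_length):
    """log_probs, advantages: nested lists of shape [batch][time]."""
    per_example = []
    for lp_seq, adv_seq, length in zip(log_probs, advantages, sequence_length):
        steps = list(zip(lp_seq, adv_seq))[:length]  # ignore padded time steps
        per_example.append(sum(adv * -lp for lp, adv in steps))
    return sum(per_example) / len(per_example)  # average over the batch


log_probs = [[-1.0, -2.0], [-0.5, -3.0]]
advantages = [[1.0, 1.0], [2.0, 10.0]]
loss = pg_loss_from_log_probs(log_probs, advantages, sequence_length=[2, 1])
print(loss)
```

The large advantage at the second step of the second example has no effect because that step lies beyond the sequence length.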
Reward
discount_reward

texar.torch.losses.discount_reward(reward: torch.Tensor, sequence_length: Optional[torch.LongTensor] = None, discount: float = 1.0, normalize: bool = False) → torch.Tensor
Computes discounted reward.
Parameters:
 reward – A Tensor. Can be 1D with shape [batch_size], or 2D with shape [batch_size, max_time].
 sequence_length (optional) – A Tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will be masked. Required if reward is 1D.
 discount (float) – A scalar. The discount factor.
 normalize (bool) – Whether to normalize the discounted reward by (discounted_reward - mean) / std. Here mean and std are over all time steps and all samples in the batch.
Returns: A 2D Tensor of the discounted reward.
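For a 2D reward, a discounted return can be sketched in plain Python as below. The sketch assumes the usual definition, where the value at step t is the reward at t plus the discount times the value at t + 1, with positions beyond each sequence length set to zero; normalization is not shown. The function name is hypothetical and the library's exact behavior may differ.

```python
def discounted_returns(reward, sequence_length, discount=1.0):
    """reward: nested list [batch][time]; returns a list of the same shape."""
    out = []
    for row, length in zip(reward, sequence_length):
        returns = [0.0] * len(row)  # positions beyond `length` stay zero
        running = 0.0
        for t in reversed(range(length)):
            running = row[t] + discount * running
            returns[t] = running
        out.append(returns)
    return out


reward = [[1.0, 1.0, 1.0],
          [1.0, 1.0, 0.0]]
result = discounted_returns(reward, sequence_length=[3, 2], discount=0.5)
print(result)  # [[1.75, 1.5, 1.0], [1.5, 1.0, 0.0]]
```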
Adversarial Loss
binary_adversarial_losses

texar.torch.losses.binary_adversarial_losses(real_data: torch.Tensor, fake_data: torch.Tensor, discriminator_fn: Callable[[torch.Tensor], Union[torch.Tensor, Tuple[torch.Tensor, ...]]], mode: str = 'max_real') → Tuple[torch.Tensor, torch.Tensor]
Computes adversarial losses of the real/fake binary discrimination game.
Example:
    import texar.torch as tx

    # Using BERTClassifier as the discriminator, which can accept
    # "soft" token ids for gradient backpropagation
    discriminator = tx.modules.BERTClassifier('bert-base-uncased')
    G_loss, D_loss = tx.losses.binary_adversarial_losses(
        real_data=real_token_ids,  # [batch_size, max_time]
        fake_data=fake_soft_token_ids,  # [batch_size, max_time, vocab_size]
        discriminator_fn=discriminator)
Parameters:
 real_data (Tensor or array) – Real data of shape [num_real_examples, …].
 fake_data (Tensor or array) – Fake data of shape [num_fake_examples, …]. num_real_examples does not necessarily equal num_fake_examples.
 discriminator_fn – A callable that takes data (e.g., real_data and fake_data) and returns the logits of being real. The signature of discriminator_fn must be: logits, … = discriminator_fn(data). The return value of discriminator_fn can be the logits, or a tuple where the logits are the first element.
 mode (str) – Mode of the generator loss. Either “max_real” or “min_fake”.
  “max_real” (default): minimizing the generator loss is to maximize the probability of fake data being classified as real.
  “min_fake”: minimizing the generator loss is to minimize the probability of fake data being classified as fake.
Returns: A tuple (generator_loss, discriminator_loss) each of which is a scalar Tensor, loss to be minimized.
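The two generator modes can be illustrated with a plain-Python sketch that starts from discriminator logits. It follows a common GAN formulation (real data scored against target 1 and fake data against target 0 for the discriminator) and is an assumption about how the modes map to losses, not a copy of the library's code; all names are hypothetical.

```python
import math


def sigmoid_xent(logit, label):
    """Numerically stable sigmoid cross entropy for one logit."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def mean_xent(logits, label):
    return sum(sigmoid_xent(x, label) for x in logits) / len(logits)


def adversarial_losses(real_logits, fake_logits, mode="max_real"):
    """Return (generator_loss, discriminator_loss) from 1D lists of logits."""
    real_loss = mean_xent(real_logits, 1.0)  # real data should look real
    fake_loss = mean_xent(fake_logits, 0.0)  # fake data should look fake
    d_loss = real_loss + fake_loss
    if mode == "max_real":
        g_loss = mean_xent(fake_logits, 1.0)  # push fake data toward "real"
    elif mode == "min_fake":
        g_loss = -fake_loss                   # undo the discriminator's success
    else:
        raise ValueError(f"Unknown mode: {mode}")
    return g_loss, d_loss


g_max_real, d_loss = adversarial_losses([0.0], [0.0], mode="max_real")
g_min_fake, _ = adversarial_losses([0.0], [0.0], mode="min_fake")
print(g_max_real, g_min_fake, d_loss)
```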
Entropy
entropy_with_logits

texar.torch.losses.entropy_with_logits(logits: torch.Tensor, rank: Optional[int] = None, average_across_batch: bool = True, average_across_remaining: bool = False, sum_over_batch: bool = False, sum_over_remaining: bool = True) → torch.Tensor
Shannon entropy given logits.
Parameters:
 logits – Unscaled log probabilities of shape [batch_size, d_2, …, d_{rank-1}, distribution_dim] and of dtype float32 or float64. The rank of the tensor is optionally specified by the argument rank. The tensor is considered as having [batch_size, …, d_{rank-1}] elements, each of which has a distribution of length d_rank (i.e., distribution_dim). So the last dimension is always summed out to compute the entropy.
 rank (int, optional) – The rank of logits. If None (default), rank is inferred automatically from logits. If the inference fails, rank is set to 2, i.e., assuming logits is of shape [batch_size, distribution_dim].
 average_across_batch (bool) – If set, average the entropy across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 average_across_remaining (bool) – If set, average the entropy across the remaining dimensions. Must not set average_across_remaining and sum_over_remaining at the same time. Used only when logits has rank >= 3.
 sum_over_batch (bool) – If set, sum the entropy across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_remaining (bool) – If set, sum the entropy across the remaining dimension. Must not set average_across_remaining and sum_over_remaining at the same time. Used only when logits has rank >= 3.
Returns: A Tensor containing the Shannon entropy. The dimensionality of the Tensor depends on the configuration of reduction arguments. For example, if both batch and remaining dimensions are reduced (by either sum or average), the returned Tensor is a scalar Tensor.
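For rank-2 logits of shape [batch_size, distribution_dim], the quantity being computed can be sketched in plain Python: each row is turned into log probabilities, the entropy is the negative sum of p * log p over the last dimension, and the default reduction averages over the batch. The helper name is hypothetical and this is an illustration only.

```python
import math


def entropy_from_logits(row):
    """Shannon entropy (in nats) of the softmax distribution of one logit row."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    log_p = [x - log_z for x in row]
    return -sum(math.exp(lp) * lp for lp in log_p)


logits = [[0.0, 0.0, 0.0, 0.0],     # uniform over 4 outcomes
          [100.0, 0.0, 0.0, 0.0]]   # almost all mass on one outcome
per_example = [entropy_from_logits(row) for row in logits]
mean_entropy = sum(per_example) / len(per_example)  # average over the batch
print(per_example, mean_entropy)
```

The uniform row attains the maximum entropy log(4), while the sharply peaked row has entropy close to zero.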
sequence_entropy_with_logits

texar.torch.losses.sequence_entropy_with_logits(logits: torch.Tensor, rank: Optional[int] = None, sequence_length: Optional[torch.LongTensor] = None, average_across_batch: bool = True, average_across_timesteps: bool = False, average_across_remaining: bool = False, sum_over_batch: bool = False, sum_over_timesteps: bool = True, sum_over_remaining: bool = True, time_major: bool = False) → torch.Tensor
Shannon entropy given logits.
Parameters:
 logits – Unscaled log probabilities of shape [batch_size, max_time, d_3, …, d_{rank-1}, distribution_dim] and of dtype float32 or float64. The rank of the tensor is optionally specified by the argument rank. The tensor is considered as having [batch_size, …, d_{rank-1}] elements, each of which has a distribution of length d_rank (i.e., distribution_dim). So the last dimension is always summed out to compute the entropy. The batch and time dimensions are exchanged if time_major is True.
 rank (int, optional) – The rank of logits. If None (default), rank is inferred automatically from logits. If the inference fails, rank is set to 3, i.e., assuming logits is of shape [batch_size, max_time, distribution_dim].
 sequence_length (optional) – A Tensor of shape [batch_size]. Time steps beyond the respective sequence lengths are not counted into the entropy.
 average_across_timesteps (bool) – If set, average the entropy across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 average_across_batch (bool) – If set, average the entropy across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 average_across_remaining (bool) – If set, average the entropy across the remaining dimensions. Must not set average_across_remaining and sum_over_remaining at the same time. Used only when logits has rank >= 4.
 sum_over_timesteps (bool) – If set, sum the entropy across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 sum_over_batch (bool) – If set, sum the entropy across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_remaining (bool) – If set, sum the entropy across the remaining dimension. Must not set average_across_remaining and sum_over_remaining at the same time. Used only when logits has rank >= 4.
 time_major (bool) – The shape format of the inputs. If True, logits must have shape [max_time, batch_size, …]. If False (default), it must have shape [batch_size, max_time, …].
Returns: A Tensor containing the Shannon entropy. The dimensionality of the Tensor depends on the configuration of reduction arguments. For example, if batch, time, and remaining dimensions are all reduced (by either sum or average), the returned Tensor is a scalar Tensor.
Loss Utilities
mask_and_reduce

texar.torch.losses.mask_and_reduce(sequence: torch.Tensor, sequence_length: Optional[torch.LongTensor], rank: int = 2, average_across_batch: bool = True, average_across_timesteps: bool = False, average_across_remaining: bool = False, sum_over_batch: bool = False, sum_over_timesteps: bool = True, sum_over_remaining: bool = True, dtype: Optional[torch.dtype] = None, time_major: bool = False) → torch.Tensor
Masks out sequence entries that are beyond the respective sequence lengths, and reduces (average or sum) away dimensions.
This is a combination of mask_sequences() and reduce_batch_time().
Parameters:
 sequence – A tensor of sequence values. If time_major=False (default), this must be a tensor of shape [batch_size, max_time, d_2, …, d_rank], where the rank of the tensor is specified with rank. The batch and time dimensions are exchanged if time_major is True.
 sequence_length – A tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will be made zero. If None, no masking is performed.
 rank (int) – The rank of sequence. Must be >= 2. Default is 2, i.e., sequence is a 2D Tensor consisting of batch and time dimensions.
 average_across_timesteps (bool) – If set, average the sequence across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 average_across_batch (bool) – If set, average the sequence across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 average_across_remaining (bool) – If set, average the sequence across the remaining dimensions. Must not set average_across_remaining and sum_over_remaining at the same time.
 sum_over_timesteps (bool) – If set, sum the sequence across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 sum_over_batch (bool) – If set, sum the sequence across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_remaining (bool) – If set, sum the sequence across the remaining dimension. Must not set average_across_remaining and sum_over_remaining at the same time.
 dtype (torch.dtype) – The dtype of the returned mask.
 time_major (bool) – The shape format of the inputs. If True, sequence must have shape [max_time, batch_size, …]. If False (default), sequence must have shape [batch_size, max_time, …].
Returns: A tensor containing the masked and reduced sequence.
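The two stages (mask, then reduce) can be sketched in plain Python for a rank-2 sequence with the default reduction of summing over time steps and averaging over the batch. The function name is hypothetical, and nested lists stand in for tensors.

```python
def mask_and_sum(sequence, sequence_length):
    """sequence: nested list [batch][time]. Returns (masked, reduced)."""
    # Stage 1: zero out entries at or beyond each sequence length.
    masked = [[v if t < n else 0.0 for t, v in enumerate(row)]
              for row, n in zip(sequence, sequence_length)]
    # Stage 2: sum over time steps, then average over the batch.
    per_example = [sum(row) for row in masked]
    return masked, sum(per_example) / len(per_example)


sequence = [[1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0]]
masked, reduced = mask_and_sum(sequence, sequence_length=[2, 3])
print(masked)   # [[1.0, 2.0, 0.0], [4.0, 5.0, 6.0]]
print(reduced)  # 9.0
```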
reduce_batch_time

texar.torch.losses.reduce_batch_time(sequence: torch.Tensor, sequence_length: Optional[torch.LongTensor], average_across_batch: bool = True, average_across_timesteps: bool = False, sum_over_batch: bool = False, sum_over_timesteps: bool = True) → torch.Tensor
Average or sum over the respective dimensions of sequence, which is of shape [batch_size, max_time, …].
Assumes sequence has been properly masked according to sequence_length.
Parameters:
 sequence – A tensor to reduce.
 sequence_length – A tensor of shape [batch_size]. Time steps beyond the respective sequence lengths will be made zero. If None, no masking is performed.
 average_across_batch (bool) – If set, average the sequence across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 average_across_timesteps (bool) – If set, average the sequence across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
 sum_over_batch (bool) – If set, sum the sequence across the batch dimension. Must not set average_across_batch and sum_over_batch at the same time.
 sum_over_timesteps (bool) – If set, sum the sequence across the time dimension. Must not set average_across_timesteps and sum_over_timesteps at the same time.
Returns: A tensor with dimension reduction.
reduce_dimensions

texar.torch.losses.reduce_dimensions(tensor: torch.Tensor, average_axes: Union[int, List[int], None] = None, sum_axes: Union[int, List[int], None] = None, keepdims: Optional[bool] = None) → torch.Tensor
Average or sum over dimensions of tensor.
average_axes and sum_axes must be mutually exclusive. That is, elements in average_axes must not be contained in sum_axes, and vice versa.
Parameters:
 tensor – A tensor to reduce.
 average_axes (optional) – A (list of) int that indicates the dimensions to reduce by taking average.
 sum_axes (optional) – A (list of) int that indicates the dimensions to reduce by taking sum.
 keepdims (optional) – If True, retains reduced dimensions with length 1.
Returns: A tensor with dimension reduction.