Evaluations

BLEU

sentence_bleu

texar.torch.evals.sentence_bleu(references: List[Union[str, List[str]]], hypothesis: Union[str, List[str]], max_order: int = 4, lowercase: bool = False, smooth: bool = False, use_bp: bool = True, return_all: bool = False) → Union[float, List[float]][source]

Calculates BLEU score of a hypothesis sentence.

Parameters:
  • references – A list of references for the hypothesis. Each reference can be either a list of string tokens, or a string of tokens separated by whitespace. A list can also be a NumPy array.
  • hypothesis – A hypothesis sentence. The hypothesis can be either a list of string tokens, or a string of tokens separated by whitespace. A list can also be a NumPy array.
  • max_order (int) – Maximum n-gram order to use when computing the BLEU score.
  • lowercase (bool) – If True, lowercase reference and hypothesis tokens.
  • smooth (bool) – Whether to apply smoothing (Lin et al., 2004).
  • use_bp (bool) – Whether to apply the brevity penalty.
  • return_all (bool) – If True, returns BLEU and all n-gram precisions.
Returns:

If return_all is False (default), returns a float32 BLEU score.

If return_all is True, returns a list of float32 scores: [BLEU] + n-gram precisions, of length max_order + 1.
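For intuition, here is a minimal pure-Python sketch of the computation a sentence-level BLEU score involves: clipped n-gram precisions up to max_order, their geometric mean, optional add-one smoothing, and a brevity penalty. This is an illustration under simplifying assumptions (e.g. shortest-reference length for the brevity penalty), not texar's implementation, and texar may scale the final score differently:

```python
import math
from collections import Counter
from typing import List

def _ngrams(tokens: List[str], n: int) -> Counter:
    # Count all n-grams of the given order in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sketch_sentence_bleu(references: List[List[str]],
                         hypothesis: List[str],
                         max_order: int = 4,
                         smooth: bool = False,
                         use_bp: bool = True) -> float:
    if not hypothesis:
        return 0.0
    precisions = []
    for n in range(1, max_order + 1):
        hyp_counts = _ngrams(hypothesis, n)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, cnt in _ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        overlap = sum(min(cnt, max_ref[gram]) for gram, cnt in hyp_counts.items())
        total = sum(hyp_counts.values())
        if smooth:
            precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
        else:
            precisions.append(overlap / total if total > 0 else 0.0)
    if min(precisions) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    bleu = math.exp(sum(math.log(p) for p in precisions) / max_order)
    if use_bp:
        # Simplified brevity penalty using the shortest reference length.
        ref_len = min(len(r) for r in references)
        if len(hypothesis) < ref_len:
            bleu *= math.exp(1 - ref_len / len(hypothesis))
    return bleu
```

A perfect four-token match scores 1.0, while a hypothesis sharing no unigrams with any reference scores 0.0 (the geometric mean collapses when any precision is zero).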

corpus_bleu

texar.torch.evals.corpus_bleu(list_of_references: List[List[Union[str, List[str]]]], hypotheses: List[Union[str, List[str]]], max_order: int = 4, lowercase: bool = False, smooth: bool = False, use_bp: bool = True, return_all: bool = False) → Union[float, List[float]][source]

Computes corpus-level BLEU score.

Parameters:
  • list_of_references – A list of lists of references, one list per hypothesis. Each reference can be either a list of string tokens, or a string of tokens separated by whitespace. A list can also be a NumPy array.
  • hypotheses – A list of hypothesis sentences. Each hypothesis can be either a list of string tokens, or a string of tokens separated by whitespace. A list can also be a NumPy array.
  • max_order (int) – Maximum n-gram order to use when computing the BLEU score.
  • lowercase (bool) – If True, lowercase reference and hypothesis tokens.
  • smooth (bool) – Whether to apply smoothing (Lin et al., 2004).
  • use_bp (bool) – Whether to apply the brevity penalty.
  • return_all (bool) – If True, returns BLEU and all n-gram precisions.
Returns:

If return_all is False (default), returns a float32 BLEU score.

If return_all is True, returns a list of float32 scores: [BLEU] + n-gram precisions, of length max_order + 1.
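The key difference from averaging per-sentence BLEU scores is that corpus-level BLEU pools n-gram match counts and sentence lengths over the whole corpus before applying the precision and brevity-penalty formulas. A hedged pure-Python sketch of that pooling (shortest-reference length, no smoothing; illustrative only, not texar's code):

```python
import math
from collections import Counter

def _ngrams(tokens, n):
    # Count all n-grams of the given order in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sketch_corpus_bleu(list_of_references, hypotheses, max_order=4, use_bp=True):
    matches = [0] * max_order   # pooled clipped n-gram matches, per order
    totals = [0] * max_order    # pooled hypothesis n-gram counts, per order
    ref_len = hyp_len = 0
    for refs, hyp in zip(list_of_references, hypotheses):
        hyp_len += len(hyp)
        ref_len += min(len(r) for r in refs)  # simplified: shortest reference
        for n in range(1, max_order + 1):
            hyp_counts = _ngrams(hyp, n)
            max_ref = Counter()
            for ref in refs:
                for g, c in _ngrams(ref, n).items():
                    max_ref[g] = max(max_ref[g], c)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Precisions are computed once, from the pooled counts.
    precisions = [m / t if t else 0.0 for m, t in zip(matches, totals)]
    if min(precisions) == 0:
        return 0.0
    bleu = math.exp(sum(math.log(p) for p in precisions) / max_order)
    if use_bp and 0 < hyp_len < ref_len:
        bleu *= math.exp(1 - ref_len / hyp_len)
    return bleu
```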

sentence_bleu_moses

texar.torch.evals.sentence_bleu_moses(references: List[Union[str, List[str]]], hypothesis: Union[str, List[str]], lowercase: bool = False, return_all: bool = False) → Union[float, List[float]][source]

Calculates BLEU score of a hypothesis sentence using the MOSES `multi-bleu.perl` script.

Parameters:
  • references – A list of references for the hypothesis. Each reference can be either a string, or a list of string tokens. A list can also be a NumPy array.
  • hypothesis – A hypothesis sentence. The hypothesis can be either a string, or a list of string tokens. A list can also be a NumPy array.
  • lowercase (bool) – If True, pass the "-lc" flag to the multi-bleu script.
  • return_all (bool) – If True, returns BLEU and all n-gram precisions.
Returns:

If return_all is False (default), returns a float32 BLEU score.

If return_all is True, returns a list of 5 float32 scores: [BLEU, 1-gram precision, ..., 4-gram precision].

corpus_bleu_moses

texar.torch.evals.corpus_bleu_moses(list_of_references: List[List[Union[str, List[str]]]], hypotheses: List[Union[str, List[str]]], lowercase: bool = False, return_all: bool = False) → Union[float, List[float]][source]

Calculates corpus-level BLEU score using the MOSES `multi-bleu.perl` script.

Parameters:
  • list_of_references – A list of lists of references, one list per hypothesis. Each reference can be either a string, or a list of string tokens. A list can also be a NumPy array.
  • hypotheses – A list of hypothesis sentences. Each hypothesis can be either a string, or a list of string tokens. A list can also be a NumPy array.
  • lowercase (bool) – If True, pass the "-lc" flag to the multi-bleu script.
  • return_all (bool) – If True, returns BLEU and all n-gram precisions.
Returns:

If return_all is False (default), returns a float32 BLEU score.

If return_all is True, returns a list of 5 float32 scores: [BLEU, 1-gram precision, ..., 4-gram precision].

corpus_bleu_transformer

texar.torch.evals.corpus_bleu_transformer(reference_corpus: List[List[str]], translation_corpus: List[List[str]], max_order: int = 4, use_bp: bool = True) → float[source]

Computes BLEU score of translated segments against references.

This BLEU variant was used to evaluate the Transformer (Vaswani et al., “Attention Is All You Need”) on machine translation. The resulting BLEU scores are usually slightly higher than those from texar.torch.evals.corpus_bleu and texar.torch.evals.corpus_bleu_moses.

Parameters:
  • reference_corpus – A list of references, one per translation. Each reference should be tokenized into a list of tokens.
  • translation_corpus – A list of translations to score. Each translation should be tokenized into a list of tokens.
  • max_order – Maximum n-gram order to use when computing the BLEU score.
  • use_bp – Whether to apply the brevity penalty.
Returns:

BLEU score.

bleu_transformer_tokenize

texar.torch.evals.bleu_transformer_tokenize(string: str) → List[str][source]

Tokenize a string following the official BLEU implementation.

The BLEU scores produced by multi-bleu.perl depend on your tokenizer, which makes them hard to reproduce and to compare across users. This function provides a standard tokenization following mteval-v14.pl.

See https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v14.pl#L954-L983. In our case, the input string is expected to be a single line, and no de-escaping of HTML entities is needed. So we simply tokenize on punctuation and symbols, except when a punctuation mark is both preceded and followed by a digit (e.g., a comma or dot used as a thousands or decimal separator).

Note that a number (e.g., a year) followed by a dot at the end of a sentence is NOT tokenized, i.e., the dot stays with the number, because s/(\p{P})(\P{N})/ $1 $2/g does not match this case (unless we add a space after each sentence). However, this behavior is present in the original mteval-v14.pl as well, and we want to be consistent with it.

Parameters:
  • string – The input string.
Returns:

A list of tokens.
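As a rough illustration of the rule described above (split punctuation away from neighboring text, except punctuation with a digit on both sides), here is a simplified sketch. It uses only ASCII punctuation; the real mteval-v14.pl uses full Unicode punctuation (\p{P}) and symbol (\p{S}) classes, so this is not a drop-in replacement for bleu_transformer_tokenize:

```python
import re
import string

# ASCII punctuation class; the real script uses Unicode \p{P} and \p{S}.
_PUNCT = re.escape(string.punctuation)
# Split punctuation preceded by a non-digit, and punctuation followed by a
# non-digit; punctuation with digits on BOTH sides is left intact.
_NONDIGIT_PUNCT = re.compile(rf"([^\d])([{_PUNCT}])")
_PUNCT_NONDIGIT = re.compile(rf"([{_PUNCT}])([^\d])")

def sketch_bleu_tokenize(s: str):
    s = _NONDIGIT_PUNCT.sub(r"\1 \2 ", s)
    s = _PUNCT_NONDIGIT.sub(r" \1 \2", s)
    return s.split()
```

Here "1,234.56" survives intact because its comma and dot sit between digits, while sentence-final punctuation is split off as its own token.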

file_bleu

texar.torch.evals.file_bleu(ref_filename: str, hyp_filename: str, bleu_version: str = 'corpus_bleu_transformer', case_sensitive: bool = False) → float[source]

Compute BLEU for two files (reference and hypothesis translation).

Parameters:
  • ref_filename – Reference file path.
  • hyp_filename – Hypothesis file path.
  • bleu_version – Name of the BLEU computation method to use. Must be one of: corpus_bleu, corpus_bleu_moses, corpus_bleu_transformer.
  • case_sensitive – If False, lowercase reference and hypothesis tokens.
Returns:

BLEU score.
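A file-level wrapper like this can be structured as: read the two parallel files line by line, tokenize, optionally lowercase, and dispatch to a corpus-level scorer selected by name. The sketch below illustrates that shape only; the scorers dict, the scorer signature, and the single-reference-per-line pairing are assumptions, not texar's implementation:

```python
from typing import Callable, Dict, List

def sketch_file_bleu(ref_filename: str,
                     hyp_filename: str,
                     scorers: Dict[str, Callable],
                     bleu_version: str = "corpus_bleu_transformer",
                     case_sensitive: bool = False) -> float:
    def read_lines(path: str) -> List[List[str]]:
        # One tokenized sentence per line; lowercase unless case_sensitive.
        with open(path, encoding="utf-8") as f:
            return [(line if case_sensitive else line.lower()).split()
                    for line in f]
    # Pair each hypothesis line with a single reference line.
    refs = [[tokens] for tokens in read_lines(ref_filename)]
    hyps = read_lines(hyp_filename)
    if bleu_version not in scorers:
        raise ValueError(f"Unknown bleu_version: {bleu_version!r}")
    return scorers[bleu_version](refs, hyps)
```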

Accuracy

accuracy

texar.torch.evals.accuracy(labels: torch.Tensor, preds: torch.Tensor) → torch.Tensor[source]

Calculates the accuracy of predictions.

Parameters:
  • labels – The ground truth values. A Tensor of the same shape as preds.
  • preds – A Tensor of any shape containing the predicted values.
Returns:

A float scalar Tensor containing the accuracy.
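The semantics reduce to the fraction of positions where the prediction equals the ground truth. A plain-Python sketch of that computation (texar's version operates on tensors of any shape and returns a scalar Tensor):

```python
from typing import Sequence

def sketch_accuracy(labels: Sequence, preds: Sequence) -> float:
    # Fraction of positions where the prediction equals the ground truth.
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same shape")
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)
```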

binary_clas_accuracy

texar.torch.evals.binary_clas_accuracy(pos_preds: Optional[torch.Tensor] = None, neg_preds: Optional[torch.Tensor] = None) → Optional[torch.Tensor][source]

Calculates the accuracy of binary predictions.

Parameters:
  • pos_preds (optional) – A Tensor of any shape containing the predicted values on positive data (i.e., ground truth labels are 1).
  • neg_preds (optional) – A Tensor of any shape containing the predicted values on negative data (i.e., ground truth labels are 0).
Returns:

A float scalar Tensor containing the accuracy.
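A plausible reading of these semantics in plain Python: accuracy pooled over the positive examples (correct when predicted 1) and the negative examples (correct when predicted 0), returning None when neither set is given. The exact weighting in texar may differ; this is an illustrative sketch, not the library's implementation:

```python
from typing import Optional, Sequence

def sketch_binary_clas_accuracy(pos_preds: Optional[Sequence] = None,
                                neg_preds: Optional[Sequence] = None
                                ) -> Optional[float]:
    # Positive examples are correct when predicted 1, negative examples
    # when predicted 0; returns None if neither set of predictions is given.
    if pos_preds is None and neg_preds is None:
        return None
    correct = total = 0
    if pos_preds is not None:
        correct += sum(p == 1 for p in pos_preds)
        total += len(pos_preds)
    if neg_preds is not None:
        correct += sum(p == 0 for p in neg_preds)
        total += len(neg_preds)
    return correct / total if total else 0.0
```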