Evaluations¶
BLEU¶
sentence_bleu¶
- texar.torch.evals.sentence_bleu(references, hypothesis, max_order=4, lowercase=False, smooth=False, use_bp=True, return_all=False)[source]¶
Calculates BLEU score of a hypothesis sentence.
- Parameters
references – A list of references for the hypothesis. Each reference can be either a list of string tokens, or a string of whitespace-separated tokens. Lists can also be numpy arrays.
hypothesis – A hypothesis sentence, either a list of string tokens, or a string of whitespace-separated tokens. Lists can also be numpy arrays.
lowercase (bool) – If True, lowercase reference and hypothesis tokens.
max_order (int) – Maximum n-gram order to use when computing BLEU score.
smooth (bool) – Whether to apply smoothing (Lin et al., 2004).
use_bp (bool) – Whether to apply brevity penalty.
return_all (bool) – If True, returns BLEU and all n-gram precisions.
- Returns
If return_all is False (default), returns a float32 BLEU score. If return_all is True, returns a list of float32 scores: [BLEU] + n-gram precisions, which is of length max_order + 1.
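As a rough illustration of what this computes, here is a self-contained sketch of sentence-level BLEU with clipped n-gram precisions, optional add-one smoothing, and a brevity penalty. The function names and the shortest-reference length convention are assumptions for illustration, not necessarily texar.torch's exact choices:

```python
import math
from collections import Counter

def ngram_counts(tokens, max_order):
    """Counts of all n-grams in `tokens` up to length `max_order`."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def sentence_bleu_sketch(references, hypothesis, max_order=4,
                         smooth=False, use_bp=True):
    refs = [r.split() if isinstance(r, str) else list(r) for r in references]
    hyp = hypothesis.split() if isinstance(hypothesis, str) else list(hypothesis)

    # Clip each hypothesis n-gram count by its maximum count in any reference.
    hyp_counts = ngram_counts(hyp, max_order)
    max_ref_counts = Counter()
    for ref in refs:
        for ng, c in ngram_counts(ref, max_order).items():
            max_ref_counts[ng] = max(max_ref_counts[ng], c)

    matches = [0] * max_order
    for ng, c in hyp_counts.items():
        matches[len(ng) - 1] += min(c, max_ref_counts[ng])
    possible = [max(len(hyp) - n + 1, 0) for n in range(1, max_order + 1)]

    # n-gram precisions, optionally with add-one smoothing.
    if smooth:
        precisions = [(m + 1.0) / (p + 1.0) for m, p in zip(matches, possible)]
    else:
        precisions = [m / p if p > 0 else 0.0 for m, p in zip(matches, possible)]
    if min(precisions) == 0.0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)

    # Brevity penalty against the shortest reference (conventions vary).
    bp = 1.0
    if use_bp:
        ref_len = min(len(r) for r in refs)
        if 0 < len(hyp) < ref_len:
            bp = math.exp(1.0 - ref_len / len(hyp))
    return 100.0 * geo_mean * bp
```

A perfect hypothesis scores 100; without smoothing, any n-gram order with zero matches drives the whole score to zero, which is why smoothing matters for short sentences.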
corpus_bleu¶
- texar.torch.evals.corpus_bleu(list_of_references, hypotheses, max_order=4, lowercase=False, smooth=False, use_bp=True, return_all=False)[source]¶
Computes corpus-level BLEU score.
- Parameters
list_of_references – A list of lists of references, one list per hypothesis. Each reference can be either a list of string tokens, or a string of whitespace-separated tokens. Lists can also be numpy arrays.
hypotheses – A list of hypothesis sentences. Each hypothesis can be either a list of string tokens, or a string of whitespace-separated tokens. Lists can also be numpy arrays.
lowercase (bool) – If True, lowercase reference and hypothesis tokens.
max_order (int) – Maximum n-gram order to use when computing BLEU score.
smooth (bool) – Whether to apply smoothing (Lin et al., 2004).
use_bp (bool) – Whether to apply brevity penalty.
return_all (bool) – If True, returns BLEU and all n-gram precisions.
- Returns
If return_all is False (default), returns a float32 BLEU score. If return_all is True, returns a list of float32 scores: [BLEU] + n-gram precisions, which is of length max_order + 1.
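Corpus-level BLEU differs from averaging per-sentence scores: clipped n-gram match counts and totals are pooled over the whole corpus before the precisions and brevity penalty are computed. A minimal sketch (the function name and the shortest-reference length convention are illustrative assumptions):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu_sketch(list_of_references, hypotheses, max_order=4, use_bp=True):
    matches = [0] * max_order
    possible = [0] * max_order
    hyp_len = ref_len = 0
    for refs, hyp in zip(list_of_references, hypotheses):
        refs = [r.split() if isinstance(r, str) else list(r) for r in refs]
        hyp = hyp.split() if isinstance(hyp, str) else list(hyp)
        hyp_len += len(hyp)
        ref_len += min(len(r) for r in refs)
        for n in range(1, max_order + 1):
            hyp_ngrams = ngrams(hyp, n)
            max_ref = Counter()
            for r in refs:
                for ng, c in ngrams(r, n).items():
                    max_ref[ng] = max(max_ref[ng], c)
            # Pool clipped matches and totals across the corpus.
            matches[n - 1] += sum(min(c, max_ref[ng])
                                  for ng, c in hyp_ngrams.items())
            possible[n - 1] += sum(hyp_ngrams.values())
    precisions = [m / p if p > 0 else 0.0 for m, p in zip(matches, possible)]
    if min(precisions) == 0.0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_order)
    bp = 1.0
    if use_bp and 0 < hyp_len < ref_len:
        bp = math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * geo_mean * bp
```

Because counts are pooled, a single sentence with no high-order matches no longer zeroes the score as long as the corpus as a whole has matches at every order.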
sentence_bleu_moses¶
- texar.torch.evals.sentence_bleu_moses(references, hypothesis, lowercase=False, return_all=False)[source]¶
Calculates BLEU score of a hypothesis sentence using the MOSES `multi-bleu.perl` script.
- Parameters
references – A list of references for the hypothesis. Each reference can be either a string, or a list of string tokens. Lists can also be numpy arrays.
hypothesis – A hypothesis sentence, either a string or a list of string tokens. Lists can also be numpy arrays.
lowercase (bool) – If True, pass the "-lc" flag to the multi-bleu script.
return_all (bool) – If True, returns BLEU and all n-gram precisions.
- Returns
If return_all is False (default), returns a float32 BLEU score. If return_all is True, returns a list of 5 float32 scores: [BLEU, 1-gram precision, ..., 4-gram precision].
corpus_bleu_moses¶
- texar.torch.evals.corpus_bleu_moses(list_of_references, hypotheses, lowercase=False, return_all=False)[source]¶
Calculates corpus-level BLEU score using the MOSES `multi-bleu.perl` script.
- Parameters
list_of_references – A list of lists of references, one list per hypothesis. Each reference can be either a string, or a list of string tokens. Lists can also be numpy arrays.
hypotheses – A list of hypothesis sentences. Each hypothesis can be either a string, or a list of string tokens. Lists can also be numpy arrays.
lowercase (bool) – If True, pass the "-lc" flag to the multi-bleu script.
return_all (bool) – If True, returns BLEU and all n-gram precisions.
- Returns
If return_all is False (default), returns a float32 BLEU score. If return_all is True, returns a list of 5 float32 scores: [BLEU, 1-gram precision, ..., 4-gram precision].
corpus_bleu_transformer¶
- texar.torch.evals.corpus_bleu_transformer(reference_corpus, translation_corpus, max_order=4, use_bp=True)[source]¶
Computes BLEU score of translated segments against references.
This BLEU variant was used to evaluate the Transformer in “Attention Is All You Need” (Vaswani et al., 2017) for machine translation. The resulting BLEU score is usually a bit higher than those of texar.torch.evals.corpus_bleu and texar.torch.evals.corpus_bleu_moses.
- Parameters
reference_corpus – A list of references, one per translation. Each reference should be tokenized into a list of tokens.
translation_corpus – A list of translations to score. Each translation should be tokenized into a list of tokens.
max_order (int) – Maximum n-gram order to use when computing BLEU score.
use_bp (bool) – Whether to apply brevity penalty.
- Returns
BLEU score.
bleu_transformer_tokenize¶
- texar.torch.evals.bleu_transformer_tokenize(string)[source]¶
Tokenize a string following the official BLEU implementation.
The BLEU scores from multi-bleu.perl depend on your tokenizer, which is unlikely to be reproducible across experiments or consistent across different users. This function provides a standard tokenization following mteval-v14.pl.
See https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v14.pl#L954-L983. In our case, the input string is expected to be a single line with no HTML entities to de-escape, so we tokenize only on punctuation and symbols, except when a punctuation mark is both preceded and followed by a digit (e.g., a comma or dot used as a thousands or decimal separator).
Note that a number (e.g. a year) followed by a dot at the end of a sentence is NOT tokenized, i.e. the dot stays with the number, because s/(\p{P})(\P{N})/ $1 $2/g does not match this case (unless we add a space after each sentence). However, this behavior matches the original mteval-v14.pl, and we want to stay consistent with it.
- Parameters
string – the input string
- Returns
a list of tokens
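The splitting rule can be sketched with two regular expressions: detach punctuation from an adjacent non-digit on either side, so punctuation flanked by digits on both sides stays attached. This ASCII-only sketch is a simplification; the actual implementation uses the full Unicode punctuation and symbol categories:

```python
import re
import string

# Punctuation next to a non-digit gets split off; punctuation between two
# digits (e.g. "1,000.50") is left attached.
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"
NONDIGIT_PUNCT = re.compile(r"([^\d])(" + PUNCT_CLASS + ")")
PUNCT_NONDIGIT = re.compile("(" + PUNCT_CLASS + r")([^\d])")

def bleu_tokenize_sketch(line):
    line = NONDIGIT_PUNCT.sub(r"\1 \2 ", line)
    line = PUNCT_NONDIGIT.sub(r" \1 \2", line)
    return line.split()
```

Note the sentence-final dot after a number stays attached (e.g. "in 1984." keeps "1984."), matching the quirk described above: the dot has a digit on its left and nothing on its right, so neither pattern fires.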
file_bleu¶
- texar.torch.evals.file_bleu(ref_filename, hyp_filename, bleu_version='corpus_bleu_transformer', case_sensitive=False)[source]¶
Computes BLEU for two files (reference and hypothesis translations).
- Parameters
ref_filename – Reference file path.
hyp_filename – Hypothesis file path.
bleu_version – A str naming the BLEU computation method to use; one of: corpus_bleu, corpus_bleu_moses, corpus_bleu_transformer.
case_sensitive – If False, lowercase reference and hypothesis tokens.
- Returns
BLEU score.
Accuracy¶
accuracy¶
binary_clas_accuracy¶
- texar.torch.evals.binary_clas_accuracy(pos_preds=None, neg_preds=None)[source]¶
Calculates the accuracy of binary predictions.
- Parameters
pos_preds (optional) – A Tensor of any shape containing the predicted values on positive data (i.e., ground truth labels are 1).
neg_preds (optional) – A Tensor of any shape containing the predicted values on negative data (i.e., ground truth labels are 0).
- Returns
A float scalar Tensor containing the accuracy.
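The computation reduces to: positives are correct when predicted 1, negatives when predicted 0, and accuracy is the overall fraction correct. A plain-Python sketch (the real function operates on tensors; the name here is illustrative):

```python
def binary_clas_accuracy_sketch(pos_preds=None, neg_preds=None):
    """Fraction of correct binary predictions: positive examples should be
    predicted 1, negative examples should be predicted 0."""
    pos = list(pos_preds) if pos_preds is not None else []
    neg = list(neg_preds) if neg_preds is not None else []
    total = len(pos) + len(neg)
    if total == 0:
        return 0.0
    correct = sum(p == 1 for p in pos) + sum(p == 0 for p in neg)
    return correct / total
```

For example, with positive predictions [1, 1, 0] and negative predictions [0, 1], three of five predictions are correct, giving 0.6.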