`sparknlp_jsl.utils.training_log_parser_utils`#

Module Contents#

`aggregate_entities`(metrics)
`calc_metrics`(tp, p, t[, percent])	Computes overall precision, recall and FB1.
`count_chunks`(true_seqs, pred_seqs)	Counts correct, true and predicted chunks and tags per type.
`get_result`(correct_chunks, true_chunks, pred_chunks, ...)	Returns overall precision, recall and FB1 (default values are 0.0).
`is_chunk_end`(prev_tag, tag)	Checks if the previous chunk ended between the previous and current word.
`is_chunk_start`(prev_tag, tag)	Checks if a new chunk started between the previous and current word.
`split_tag`(chunk_tag)	Splits chunk tag into IOBES prefix and chunk_type.

calc_metrics(tp, p, t, percent=True)#

Computes overall precision, recall and FB1.

All three values default to 0.0. If percent is True, each is returned as 100 * the original decimal value.

Parameters:

tp (int) – The number of true positives.
p (int) – The number of predicted positives.
t (int) – The number of actual (gold-standard) positives.
percent (bool, optional) – If True, return 100 * original decimal value. Defaults to True.
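The arithmetic described above can be sketched as follows. This is a minimal illustration that assumes the conventional definitions (precision = tp / p, recall = tp / t, FB1 as their harmonic mean); `calc_metrics_sketch` is a hypothetical stand-in written for this example, not the library function itself.

```python
def calc_metrics_sketch(tp, p, t, percent=True):
    """Hypothetical stand-in illustrating the documented behaviour."""
    # Each metric falls back to 0.0 when its denominator is zero.
    precision = tp / p if p else 0.0
    recall = tp / t if t else 0.0
    denom = precision + recall
    fb1 = 2 * precision * recall / denom if denom else 0.0
    if percent:
        # Scale the decimal values to percentages.
        return 100 * precision, 100 * recall, 100 * fb1
    return precision, recall, fb1


# 8 correct chunks out of 10 predicted and 16 actual chunks.
precision, recall, fb1 = calc_metrics_sketch(8, 10, 16, percent=False)
print(precision, recall, fb1)
```

With all counts equal to zero, the sketch returns 0.0 for every metric instead of raising a division error.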

count_chunks(true_seqs, pred_seqs)#

Counts the correctly identified, true and predicted chunks (and tags) per type, from which true positives, false positives and false negatives can be derived.

Parameters:

true_seqs (list) – The true (gold-standard) tags.
pred_seqs (list) – The predicted tags.

Returns:

A tuple (correct_chunks, true_chunks, pred_chunks, correct_counts, true_counts, pred_counts), where:

correct_chunks – a dict (counter) where key = chunk types, value = number of correctly identified chunks per type.
true_chunks – a dict, number of true chunks per type.
pred_chunks – a dict, number of identified chunks per type.
correct_counts, true_counts, pred_counts – similar to the above, but for tags.

Return type:

tuple
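To illustrate the shape of the returned tuple, here is a simplified sketch. It uses a different technique from a tag-by-tag scan: it extracts (type, start, end) spans from each sequence and intersects them. It assumes plain IOB tags (no E or S prefixes), and both `extract_spans` and `count_chunks_sketch` are hypothetical names introduced for this example only.

```python
from collections import Counter


def extract_spans(tags):
    """Return a set of (chunk_type, start, end) spans from IOB tags."""
    spans, start, current = set(), None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, chunk_type = tag.partition("-")
        inside = prefix == "I" and chunk_type == current
        if current is not None and not inside:
            spans.add((current, start, i - 1))
            current = None
        if prefix in ("B", "I") and not inside:
            start, current = i, chunk_type
    return spans


def count_chunks_sketch(true_seqs, pred_seqs):
    """Hypothetical stand-in returning six counters in the documented order."""
    true_spans = extract_spans(true_seqs)
    pred_spans = extract_spans(pred_seqs)
    # A chunk is correct only if type and both boundaries match.
    correct_chunks = Counter(t for t, _, _ in true_spans & pred_spans)
    true_chunks = Counter(t for t, _, _ in true_spans)
    pred_chunks = Counter(t for t, _, _ in pred_spans)
    # Tag-level counts compare the two sequences position by position.
    correct_counts = Counter(t for t, p in zip(true_seqs, pred_seqs) if t == p)
    true_counts = Counter(true_seqs)
    pred_counts = Counter(pred_seqs)
    return (correct_chunks, true_chunks, pred_chunks,
            correct_counts, true_counts, pred_counts)


true_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["B-PER", "I-PER", "O", "B-ORG"]
for counter in count_chunks_sketch(true_tags, pred_tags):
    print(dict(counter))
```

In this example the PER chunk is counted as correct, while the LOC chunk is missed and the ORG chunk is a spurious prediction.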

get_result(correct_chunks, true_chunks, pred_chunks, correct_counts, true_counts, pred_counts, verbose=True)#

Returns overall precision, recall and FB1 (default values are 0.0).

If verbose is True, prints overall performance as well as performance per chunk type; otherwise, simply returns the overall precision, recall and FB1 scores.

Parameters:

correct_chunks (dict) – A dict (counter) where key = chunk types, value = number of correctly identified chunks per type.
true_chunks (dict) – A dict, number of true chunks per type.
pred_chunks (dict) – A dict, number of identified chunks per type.
correct_counts (dict) – A dict, number of correctly identified tags per type.
true_counts (dict) – A dict, number of true tags per type.
pred_counts (dict) – A dict, number of identified tags per type.
verbose (bool, optional) – If True, print overall performance, as well as performance per chunk type. Defaults to True.
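A minimal sketch of how per-type counters can be turned into overall and per-type scores. It is an illustration under assumptions: `get_result_sketch` and `prf` are hypothetical names, the tag-level counters are omitted for brevity, and scores are returned as fractions rather than percentages.

```python
def prf(tp, p, t):
    """Precision, recall and F1, each 0.0 when its denominator is zero."""
    precision = tp / p if p else 0.0
    recall = tp / t if t else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def get_result_sketch(correct_chunks, true_chunks, pred_chunks, verbose=True):
    """Hypothetical stand-in: overall scores plus an optional per-type report."""
    overall = prf(sum(correct_chunks.values()),
                  sum(pred_chunks.values()),
                  sum(true_chunks.values()))
    if verbose:
        print("overall: precision=%.2f recall=%.2f f1=%.2f" % overall)
        # Report every type seen in either the true or the predicted chunks.
        for chunk_type in sorted(set(true_chunks) | set(pred_chunks)):
            scores = prf(correct_chunks.get(chunk_type, 0),
                         pred_chunks.get(chunk_type, 0),
                         true_chunks.get(chunk_type, 0))
            print("%s: precision=%.2f recall=%.2f f1=%.2f" % ((chunk_type,) + scores))
    return overall


get_result_sketch({"PER": 1}, {"PER": 1, "LOC": 1}, {"PER": 1, "ORG": 1})
```

Summing the counters before computing the scores gives micro-averaged results, so frequent chunk types weigh more than rare ones.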

is_chunk_end(prev_tag, tag)#

Checks if the previous chunk ended between the previous and current word.

e.g. (B-PER, I-PER) -> False; (B-LOC, O) -> True

Note: contradicting tags, e.g. (B-PER, I-LOC), are treated as (B-PER, B-LOC).

Parameters:

prev_tag (str) – The tag of the previous word.
tag (str) – The tag of the current word.
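The rule can be sketched as below. This is an illustration that assumes IOBES prefixes and the behaviour given in the examples and note above; `is_chunk_end_sketch` and `_split` are hypothetical names, not the library's code.

```python
def _split(tag):
    """Split a tag such as 'B-PER' into ('B', 'PER'); 'O' gives ('O', None)."""
    if tag == "O":
        return "O", None
    prefix, chunk_type = tag.split("-", maxsplit=1)
    return prefix, chunk_type


def is_chunk_end_sketch(prev_tag, tag):
    """Hypothetical stand-in: did a chunk end between the two words?"""
    prev_prefix, prev_type = _split(prev_tag)
    prefix, chunk_type = _split(tag)
    if prev_prefix == "O":
        return False  # no chunk was open, so nothing can end
    if prefix == "O":
        return True   # an open chunk is followed by a non-chunk word
    if prev_type != chunk_type:
        return True   # a type change closes the previous chunk
    # Same type: the chunk ends if a new one begins or the old one was final.
    return prefix in ("B", "S") or prev_prefix in ("E", "S")


print(is_chunk_end_sketch("B-PER", "I-PER"))  # False
print(is_chunk_end_sketch("B-LOC", "O"))      # True
print(is_chunk_end_sketch("B-PER", "I-LOC"))  # True
```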

is_chunk_start(prev_tag, tag)#

Checks if a new chunk started between the previous and current word.

Parameters:

prev_tag (str) – The tag of the previous word.
tag (str) – The tag of the current word.
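A minimal sketch of the start check, mirroring the end check. It assumes IOBES prefixes and that contradicting tags such as (B-PER, I-LOC) open a new chunk; `is_chunk_start_sketch` and `_split_tag` are hypothetical names introduced for this example.

```python
def _split_tag(tag):
    """Split a tag such as 'B-PER' into ('B', 'PER'); 'O' gives ('O', None)."""
    if tag == "O":
        return "O", None
    prefix, chunk_type = tag.split("-", maxsplit=1)
    return prefix, chunk_type


def is_chunk_start_sketch(prev_tag, tag):
    """Hypothetical stand-in: did a new chunk start between the two words?"""
    prev_prefix, prev_type = _split_tag(prev_tag)
    prefix, chunk_type = _split_tag(tag)
    if prefix == "O":
        return False  # the current word is outside any chunk
    if prev_prefix == "O":
        return True   # first chunk word after a non-chunk word
    if prev_type != chunk_type:
        return True   # a type change opens a new chunk
    # Same type: a new chunk starts on B or S, or after a final E or S.
    return prefix in ("B", "S") or prev_prefix in ("E", "S")


print(is_chunk_start_sketch("O", "B-PER"))      # True
print(is_chunk_start_sketch("B-PER", "I-PER"))  # False
print(is_chunk_start_sketch("B-PER", "I-LOC"))  # True
```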

split_tag(chunk_tag)#

Splits chunk tag into IOBES prefix and chunk_type.

e.g. B-PER -> (B, PER); O -> (O, None)
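The split can be sketched in a few lines. This assumes tags of the form PREFIX-TYPE with a single-character prefix followed by a hyphen; `split_tag_sketch` is a hypothetical name, and the library function may return a different container type.

```python
def split_tag_sketch(chunk_tag):
    """Hypothetical stand-in: split a chunk tag into (prefix, chunk_type)."""
    if chunk_tag == "O":
        return "O", None
    # Split on the first hyphen only, so types that contain hyphens survive.
    prefix, chunk_type = chunk_tag.split("-", maxsplit=1)
    return prefix, chunk_type


print(split_tag_sketch("B-PER"))  # ('B', 'PER')
print(split_tag_sketch("O"))      # ('O', None)
```

Limiting the split to the first hyphen keeps a type such as `DRUG-DOSE` intact rather than breaking it into three parts.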