sparknlp_jsl.utils.training_log_parser_utils#

Module Contents#

Functions#

aggregate_entities(metrics)

calc_metrics(tp, p, t[, percent])

Computes overall precision, recall and FB1.

count_chunks(true_seqs, pred_seqs)

Counts the number of true positives, false positives and false negatives.

get_result(correct_chunks, true_chunks, pred_chunks, ...)

Returns overall precision, recall and FB1 (default values are 0.0).

is_chunk_end(prev_tag, tag)

Checks if the previous chunk ended between the previous and current word.

is_chunk_start(prev_tag, tag)

Checks if a new chunk started between the previous and current word.

split_tag(chunk_tag)

Splits chunk tag into IOBES prefix and chunk_type.

aggregate_entities(metrics)#

calc_metrics(tp, p, t, percent=True)#

Computes overall precision, recall and FB1.

Each score defaults to 0.0. If percent is True, each score is returned as 100 * the original decimal value.

Parameters:
  • tp (int) – The number of true positives.

  • p (int) – The number of predicted positives.

  • t (int) – The number of actual (gold) positives.

  • percent (bool, optional) – If True, return 100 * original decimal value. Defaults to True.
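As a rough illustration of the arithmetic described above, the following sketch recomputes the three scores from raw counts. It is a standalone approximation, not the library's source: the name calc_metrics_sketch is hypothetical, and it assumes precision = tp / p, recall = tp / t, and FB1 is their harmonic mean.

```python
def calc_metrics_sketch(tp, p, t, percent=True):
    """Hypothetical stand-in showing how the scores could be derived.

    tp: correctly predicted items, p: predicted items, t: true (gold) items.
    """
    # Each score falls back to 0.0 when its denominator is zero.
    precision = tp / p if p else 0.0
    recall = tp / t if t else 0.0
    total = precision + recall
    fb1 = 2 * precision * recall / total if total else 0.0
    if percent:
        return 100 * precision, 100 * recall, 100 * fb1
    return precision, recall, fb1


# Example: 8 correct out of 10 predicted, with 16 gold items.
precision, recall, fb1 = calc_metrics_sketch(8, 10, 16, percent=False)
```

With percent=False the scores stay in the range 0 to 1; with the default they are scaled to 0 to 100.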

count_chunks(true_seqs, pred_seqs)#

Counts the number of true positives, false positives and false negatives.

Parameters:
  • true_seqs (list) – A list of true tags.

  • pred_seqs (list) – A list of predicted tags.

Returns:

A tuple of six counters from which true positives, false positives and false negatives can be derived: (correct_chunks, true_chunks, pred_chunks, correct_counts, true_counts, pred_counts), where:

  • correct_chunks – a dict (counter) where key = chunk type and value = number of correctly identified chunks of that type.

  • true_chunks – a dict, number of true chunks per type.

  • pred_chunks – a dict, number of identified chunks per type.

  • correct_counts, true_counts, pred_counts – similar to the above, but for tags.

Return type:

tuple
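To make the shape of the returned tuple concrete, here is a simplified, span-based sketch. It is not the library's algorithm: it handles plain BIO tags only (no E or S prefixes), and the names extract_spans and count_chunks_sketch are hypothetical. It assumes a chunk counts as correct only when its type and both boundaries match.

```python
from collections import Counter


def extract_spans(tags):
    """Return a set of (chunk_type, start, end) spans from BIO tags."""
    spans = set()
    start, current = None, None
    # A trailing "O" sentinel closes a chunk that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, chunk_type = tag.partition("-")
        continues = prefix == "I" and chunk_type == current
        if current is not None and not continues:
            spans.add((current, start, i))
            start, current = None, None
        if prefix != "O" and current is None:
            # An I- tag with no matching open chunk is treated like B-.
            start, current = i, chunk_type
    return spans


def count_chunks_sketch(true_seqs, pred_seqs):
    true_spans = extract_spans(true_seqs)
    pred_spans = extract_spans(pred_seqs)
    # Chunk-level counters, keyed by chunk type.
    correct_chunks = Counter(t for t, _, _ in true_spans & pred_spans)
    true_chunks = Counter(t for t, _, _ in true_spans)
    pred_chunks = Counter(t for t, _, _ in pred_spans)
    # Tag-level counters, keyed by the full tag string.
    correct_counts = Counter(
        t for t, p in zip(true_seqs, pred_seqs) if t == p
    )
    true_counts = Counter(true_seqs)
    pred_counts = Counter(pred_seqs)
    return (correct_chunks, true_chunks, pred_chunks,
            correct_counts, true_counts, pred_counts)


counts = count_chunks_sketch(
    ["B-PER", "I-PER", "O", "B-LOC"],
    ["B-PER", "I-PER", "O", "O"],
)
```

In this example the LOC chunk is missed by the prediction, so it appears in true_chunks but not in correct_chunks or pred_chunks.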

get_result(correct_chunks, true_chunks, pred_chunks, correct_counts, true_counts, pred_counts, verbose=True)#

Returns overall precision, recall and FB1 (default values are 0.0).

If verbose is True, prints the overall performance as well as the performance per chunk type; otherwise, only returns the overall precision, recall and F1 scores.

Parameters:
  • correct_chunks (dict) – A dict (counter) where key = chunk types, value = number of correctly identified chunks per type.

  • true_chunks (dict) – A dict, number of true chunks per type.

  • pred_chunks (dict) – A dict, number of identified chunks per type.

  • correct_counts (dict) – A dict, number of correctly identified tags per type.

  • true_counts (dict) – A dict, number of true tags per type.

  • pred_counts (dict) – A dict, number of identified tags per type.

  • verbose (bool, optional) – If True, print overall performance, as well as performance per chunk type. Defaults to True.
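The way per-type counters could be folded into overall scores can be sketched as follows. This is an assumption-laden approximation rather than the library's code: get_result_sketch and _scores are hypothetical names, the tag-level counters are omitted for brevity, and the printed layout is invented for illustration.

```python
def _scores(tp, p, t):
    """Precision, recall and FB1 as percentages (0.0 when undefined)."""
    precision = tp / p if p else 0.0
    recall = tp / t if t else 0.0
    total = precision + recall
    fb1 = 2 * precision * recall / total if total else 0.0
    return 100 * precision, 100 * recall, 100 * fb1


def get_result_sketch(correct_chunks, true_chunks, pred_chunks, verbose=True):
    # Overall scores come from summing the per-type chunk counts.
    overall = _scores(
        sum(correct_chunks.values()),
        sum(pred_chunks.values()),
        sum(true_chunks.values()),
    )
    if verbose:
        print("overall: precision %.2f, recall %.2f, FB1 %.2f" % overall)
        for chunk_type in sorted(true_chunks):
            per_type = _scores(
                correct_chunks.get(chunk_type, 0),
                pred_chunks.get(chunk_type, 0),
                true_chunks[chunk_type],
            )
            print("%s: precision %.2f, recall %.2f, FB1 %.2f"
                  % ((chunk_type,) + per_type))
    return overall


result = get_result_sketch(
    {"PER": 1}, {"PER": 1, "LOC": 1}, {"PER": 1}, verbose=False
)
```

Here one of two gold chunks is found and the single prediction is correct, so the sketch yields full precision and half recall.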

is_chunk_end(prev_tag, tag)#

Checks if the previous chunk ended between the previous and current word.

e.g. (B-PER, I-PER) -> False
     (B-LOC, O) -> True

Note: contradicting tags, e.g. (B-PER, I-LOC), are treated as (B-PER, B-LOC).

Parameters:
  • prev_tag (str) – The previous chunk tag.

  • tag (str) – The current chunk tag.
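The examples and the note above can be reproduced with a small standalone sketch. It assumes conlleval-style IOBES rules; is_chunk_end_sketch and _split are hypothetical names, and the library's own implementation may differ in detail.

```python
def _split(chunk_tag):
    """Split a tag such as "B-PER" into ("B", "PER"); "O" gives ("O", None)."""
    if chunk_tag == "O":
        return "O", None
    prefix, chunk_type = chunk_tag.split("-", maxsplit=1)
    return prefix, chunk_type


def is_chunk_end_sketch(prev_tag, tag):
    prev_prefix, prev_type = _split(prev_tag)
    prefix, chunk_type = _split(tag)
    if prev_prefix == "O":
        return False  # no chunk was open, so nothing can end
    if prefix == "O":
        return True   # an open chunk is followed by a non-chunk word
    if prev_type != chunk_type:
        return True   # type changed, e.g. (B-PER, I-LOC)
    # Same type: the chunk ends if a new one begins or the old one closed.
    return prefix in ("B", "S") or prev_prefix in ("E", "S")
```

Checking the type before the prefix is what makes (B-PER, I-LOC) behave like (B-PER, B-LOC) in this sketch.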

is_chunk_start(prev_tag, tag)#

Checks if a new chunk started between the previous and current word.

Parameters:
  • prev_tag (str) – The previous chunk tag.

  • tag (str) – The current chunk tag.
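A matching sketch for the start check, again assuming conlleval-style IOBES rules. The names is_chunk_start_sketch and _split are hypothetical, and the behaviour shown is an approximation of what the description implies rather than the library's code.

```python
def _split(chunk_tag):
    """Split a tag such as "B-PER" into ("B", "PER"); "O" gives ("O", None)."""
    if chunk_tag == "O":
        return "O", None
    prefix, chunk_type = chunk_tag.split("-", maxsplit=1)
    return prefix, chunk_type


def is_chunk_start_sketch(prev_tag, tag):
    prev_prefix, prev_type = _split(prev_tag)
    prefix, chunk_type = _split(tag)
    if prefix == "O":
        return False  # the current word is outside any chunk
    if prev_prefix == "O":
        return True   # first chunk word after a non-chunk word
    if prev_type != chunk_type:
        return True   # type changed, so a new chunk begins
    # Same type: a new chunk starts on B/S, or after the previous one closed.
    return prefix in ("B", "S") or prev_prefix in ("E", "S")
```

For example, (O, B-PER) and (B-PER, B-PER) both start a chunk under these rules, while (B-PER, I-PER) does not.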

split_tag(chunk_tag)#

Splits chunk tag into IOBES prefix and chunk_type.

e.g. B-PER -> (B, PER)
     O -> (O, None)

Parameters:
  • chunk_tag (str) – The chunk tag.
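The mapping in the examples above can be approximated in a few lines. This is a minimal sketch with a hypothetical name, split_tag_sketch; it assumes the prefix and type are separated by the first hyphen and returns a tuple, which may not match the library's exact return type.

```python
def split_tag_sketch(chunk_tag):
    """Split an IOBES tag into (prefix, chunk_type)."""
    if chunk_tag == "O":
        return ("O", None)
    # Split on the first hyphen only, so types containing "-" stay intact.
    prefix, chunk_type = chunk_tag.split("-", maxsplit=1)
    return (prefix, chunk_type)


pair = split_tag_sketch("B-PER")
```

Splitting only once keeps a type such as GENE-PROTEIN in one piece.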