sparknlp_jsl.utils.training_log_parser_utils
Module Contents
Functions
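| aggregate_entities(metrics) | |
| calc_metrics(tp, p, t, percent=True) | Computes overall precision, recall and FB1. |
| count_chunks(true_seqs, pred_seqs) | Counts the number of true positives, false positives and false negatives. |
| get_result(correct_chunks, true_chunks, pred_chunks, correct_counts, true_counts, pred_counts, verbose=True) | Returns overall precision, recall and FB1 (default values are 0.0). |
| is_chunk_end(prev_tag, tag) | Checks if the previous chunk ended between the previous and current word. |
| is_chunk_start(prev_tag, tag) | Checks if a new chunk started between the previous and current word. |
| split_tag(chunk_tag) | Splits chunk tag into IOBES prefix and chunk_type. |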
- aggregate_entities(metrics)
- calc_metrics(tp, p, t, percent=True)
Computes overall precision, recall and FB1. Each metric defaults to 0.0 when it is undefined (zero denominator).
If percent is True, each value is returned as 100 * the original decimal value.
- Parameters:
tp (int) – The number of true positives.
p (int) – The number of predicted positives.
t (int) – The number of actual positives, i.e. items present in the true (gold) annotation.
percent (bool, optional) – If True, return 100 * original decimal value. Defaults to True.
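The arithmetic described for calc_metrics can be sketched in a few lines. This is a standalone illustration, not the library's source: the name `calc_metrics_sketch` is hypothetical, and the zero-denominator handling is an assumption based on the documented 0.0 defaults.

```python
def calc_metrics_sketch(tp, p, t, percent=True):
    """Hypothetical re-implementation, for illustration only."""
    # Precision: share of predicted positives that are correct.
    precision = tp / p if p else 0.0
    # Recall: share of actual (gold) positives that were found.
    recall = tp / t if t else 0.0
    # FB1: harmonic mean of precision and recall.
    denom = precision + recall
    fb1 = 2 * precision * recall / denom if denom else 0.0
    scale = 100 if percent else 1
    return scale * precision, scale * recall, scale * fb1


print(calc_metrics_sketch(3, 4, 4))                 # (75.0, 75.0, 75.0)
print(calc_metrics_sketch(3, 4, 4, percent=False))  # (0.75, 0.75, 0.75)
print(calc_metrics_sketch(0, 0, 0))                 # (0.0, 0.0, 0.0)
```

Note that p and t are totals (all predicted items and all gold items), so false positives are p - tp and false negatives are t - tp.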
- count_chunks(true_seqs, pred_seqs)
Counts correct, true and predicted chunks and tags, from which the number of true positives, false positives and false negatives can be derived.
- Parameters:
true_seqs (list) – A list of true tags.
pred_seqs (list) – A list of predicted tags.
- Returns:
- A tuple (correct_chunks, true_chunks, pred_chunks, correct_counts, true_counts, pred_counts), where:
correct_chunks: a dict (counter) where key = chunk type, value = number of correctly identified chunks of that type
true_chunks: a dict, number of true chunks per type
pred_chunks: a dict, number of identified chunks per type
correct_counts, true_counts, pred_counts: similar to the above, but for tags
- Return type:
tuple
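One way to produce counters of this shape is to extract chunk spans from each tag sequence and compare the two span sets. The sketch below takes that span-based approach, which is not necessarily how the library does it. The names `extract_spans` and `count_chunks_sketch` are hypothetical, and the code assumes IOB2 tags only (B-, I-, O), without the E and S prefixes.

```python
from collections import Counter


def extract_spans(tags):
    """Return a set of (chunk_type, start, end) spans; end is exclusive."""
    spans = set()
    start, current = None, None
    # A trailing "O" sentinel closes a chunk that runs to the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        prefix, _, chunk_type = tag.partition("-")
        # Close the open chunk on O, on B-, or when the type changes.
        if current is not None and (prefix != "I" or chunk_type != current):
            spans.add((current, start, i))
            start, current = None, None
        # Open a chunk on B-, or on an I- tag with no chunk open.
        if prefix in ("B", "I") and current is None:
            start, current = i, chunk_type
    return spans


def count_chunks_sketch(true_seqs, pred_seqs):
    """Hypothetical span-based counterpart of count_chunks."""
    true_spans = extract_spans(true_seqs)
    pred_spans = extract_spans(pred_seqs)
    # Chunk-level counters, keyed by chunk type.
    correct_chunks = Counter(t for t, _, _ in true_spans & pred_spans)
    true_chunks = Counter(t for t, _, _ in true_spans)
    pred_chunks = Counter(t for t, _, _ in pred_spans)
    # Tag-level counters, keyed by the full tag string.
    correct_counts = Counter(t for t, p in zip(true_seqs, pred_seqs) if t == p)
    true_counts = Counter(true_seqs)
    pred_counts = Counter(pred_seqs)
    return (correct_chunks, true_chunks, pred_chunks,
            correct_counts, true_counts, pred_counts)


true_tags = ["B-PER", "I-PER", "O", "B-LOC"]
pred_tags = ["B-PER", "I-PER", "O", "B-ORG"]
counts = count_chunks_sketch(true_tags, pred_tags)
print(sorted(counts[0].items()))  # [('PER', 1)]
print(sorted(counts[1].items()))  # [('LOC', 1), ('PER', 1)]
print(sorted(counts[2].items()))  # [('ORG', 1), ('PER', 1)]
```

A chunk counts as correct only when its type and both boundaries match, so the mislabelled final token contributes one missed LOC chunk and one spurious ORG chunk.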
- get_result(correct_chunks, true_chunks, pred_chunks, correct_counts, true_counts, pred_counts, verbose=True)
Returns overall precision, recall and FB1 (default values are 0.0).
If verbose is True, prints the overall performance as well as the performance per chunk type; otherwise, simply returns the overall precision, recall and FB1 scores.
- Parameters:
correct_chunks (dict) – A dict (counter) where key = chunk types, value = number of correctly identified chunks per type.
true_chunks (dict) – A dict, number of true chunks per type.
pred_chunks (dict) – A dict, number of identified chunks per type.
correct_counts (dict) – A dict, number of correctly identified tags per type.
true_counts (dict) – A dict, number of true tags per type.
pred_counts (dict) – A dict, number of identified tags per type.
verbose (bool, optional) – If True, print overall performance, as well as performance per chunk type. Defaults to True.
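To illustrate how per-type chunk counters can be turned into overall and per-type scores, here is a simplified sketch. It is not the library's implementation: `prf` and `get_result_sketch` are hypothetical names, the tag-level counters are left out for brevity, and returning percentages is an assumption.

```python
def prf(tp, p, t):
    """Precision, recall and F1 as percentages; 0.0 when undefined."""
    precision = tp / p if p else 0.0
    recall = tp / t if t else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return 100 * precision, 100 * recall, 100 * f1


def get_result_sketch(correct_chunks, true_chunks, pred_chunks, verbose=True):
    """Hypothetical, chunk-level-only counterpart of get_result."""
    # Overall scores use the totals summed over all chunk types.
    overall = prf(sum(correct_chunks.values()),
                  sum(pred_chunks.values()),
                  sum(true_chunks.values()))
    if verbose:
        print("overall: precision %6.2f; recall %6.2f; FB1 %6.2f" % overall)
        # Report every type seen in either the gold or predicted chunks.
        for chunk_type in sorted(set(true_chunks) | set(pred_chunks)):
            scores = prf(correct_chunks.get(chunk_type, 0),
                         pred_chunks.get(chunk_type, 0),
                         true_chunks.get(chunk_type, 0))
            print("%10s: precision %6.2f; recall %6.2f; FB1 %6.2f"
                  % ((chunk_type,) + scores))
    return overall


result = get_result_sketch(
    {"PER": 1},               # correctly identified chunks per type
    {"PER": 1, "LOC": 1},     # true chunks per type
    {"PER": 1, "ORG": 1},     # predicted chunks per type
    verbose=False,
)
print(result)  # (50.0, 50.0, 50.0)
```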
- is_chunk_end(prev_tag, tag)
Checks if the previous chunk ended between the previous and current word.
e.g.
(B-PER, I-PER) -> False
(B-LOC, O) -> True
Note: contradicting tags, e.g. (B-PER, I-LOC), are treated as (B-PER, B-LOC).
- Parameters:
prev_tag (str) – The previous chunk tag.
tag (str) – The current chunk tag.
- is_chunk_start(prev_tag, tag)
Checks if a new chunk started between the previous and current word.
- Parameters:
prev_tag (str) – The previous chunk tag.
tag (str) – The current chunk tag.
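The two boundary checks can be sketched together, since they differ only in which side's O tag is decisive. The code below follows the usual CoNLL evaluation convention and is an assumption about the behaviour, not the library's source; `_split`, `is_chunk_end_sketch` and `is_chunk_start_sketch` are hypothetical names.

```python
def _split(tag):
    """Split 'B-PER' into ('B', 'PER'); 'O' becomes ('O', None)."""
    if tag == "O":
        return ("O", None)
    return tuple(tag.split("-", maxsplit=1))


def is_chunk_end_sketch(prev_tag, tag):
    prev_prefix, prev_type = _split(prev_tag)
    prefix, chunk_type = _split(tag)
    if prev_prefix == "O":
        return False          # no chunk was open, so nothing can end
    if prefix == "O":
        return True           # an open chunk is closed by O
    if prev_type != chunk_type:
        return True           # type change ends the chunk, e.g. (B-PER, I-LOC)
    return prefix in ("B", "S") or prev_prefix in ("E", "S")


def is_chunk_start_sketch(prev_tag, tag):
    prev_prefix, prev_type = _split(prev_tag)
    prefix, chunk_type = _split(tag)
    if prefix == "O":
        return False          # O never starts a chunk
    if prev_prefix == "O":
        return True           # any non-O tag after O starts a chunk
    if prev_type != chunk_type:
        return True           # type change starts a new chunk
    return prefix in ("B", "S") or prev_prefix in ("E", "S")


print(is_chunk_end_sketch("B-PER", "I-PER"))    # False
print(is_chunk_end_sketch("B-LOC", "O"))        # True
print(is_chunk_end_sketch("B-PER", "I-LOC"))    # True
print(is_chunk_start_sketch("O", "B-PER"))      # True
print(is_chunk_start_sketch("B-PER", "I-PER"))  # False
print(is_chunk_start_sketch("B-PER", "I-LOC"))  # True
```

The (B-PER, I-LOC) case returns True for both checks, which matches the note that contradicting tags are handled as if the second tag were B-LOC.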
- split_tag(chunk_tag)
Splits chunk tag into IOBES prefix and chunk_type.
e.g.
B-PER -> (B, PER)
O -> (O, None)
- Parameters:
chunk_tag (str) – The chunk tag.
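A minimal sketch of the split described above, written as a standalone function rather than taken from the library. The name `split_tag_sketch` is hypothetical, and splitting only on the first hyphen (so that chunk types containing hyphens stay intact) is an assumption.

```python
def split_tag_sketch(chunk_tag):
    """Hypothetical illustration: 'B-PER' -> ('B', 'PER'), 'O' -> ('O', None)."""
    if chunk_tag == "O":
        return ("O", None)
    # Split on the first hyphen only; the rest is the chunk type.
    prefix, chunk_type = chunk_tag.split("-", maxsplit=1)
    return (prefix, chunk_type)


print(split_tag_sketch("B-PER"))          # ('B', 'PER')
print(split_tag_sketch("O"))              # ('O', None)
print(split_tag_sketch("I-DNA-binding"))  # ('I', 'DNA-binding')
```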