Relation Extraction between Tests, Results, and Dates

Description

Relation extraction between lab test names, their findings, measurements, results, and date.

Predicted Entities

is_finding_of, is_result_of, is_date_of, O.

Open in Colab Download

How to use

Use as part of an nlp pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, PerceptronModel, DependencyParserModel, WordEmbeddingsModel, NerDLModel, NerConverter, RelationExtractionModel

ner_tagger = sparknlp.annotators.NerDLModel()\
    .pretrained('jsl_ner_wip_clinical','en','clinical/models')\
    .setInputCols("sentences", "tokens", "embeddings")\
    .setOutputCol("ner_tags") 

re_model = RelationExtractionModel()\
    .pretrained("re_test_result_date", "en", 'clinical/models')\
    .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])\
    .setOutputCol("relations")\
    .setMaxSyntacticDistance(4)\ #default: 0
    .setPredictionThreshold(0.9)\ #default: 0.5
    .setRelationPairs(["external_body_part_or_region-test"]) # Possible relation pairs. Default: All Relations.

nlp_pipeline = Pipeline(stages=[ documenter, sentencer,tokenizer, words_embedder, pos_tagger,  clinical_ner_tagger,ner_chunker, dependency_parser,re_model])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate(''''He was advised chest X-ray or CT scan after checking his SpO2 which was <= 93%''')

Results

| index | relations    | entity1      | chunk1              | entity2      |  chunk2 |
|-------|--------------|--------------|---------------------|--------------|---------|
| 0     | O            | TEST         | chest X-ray         | MEASUREMENTS |  93%    | 
| 1     | O            | TEST         | CT scan             | MEASUREMENTS |  93%    |
| 2     | is_result_of | TEST         | SpO2                | MEASUREMENTS |  93%    |

Model Information

Model Name: re_test_result_date
Type: re
Compatibility: Spark NLP for Healthcare 2.7.4+
License: Licensed
Edition: Official
Input Labels: [embeddings, pos_tags, train_ner_chunks, dependencies]
Output Labels: [relations]
Language: en

Data Source

Trained on internal data.

Benchmarking

| relation        | prec |
|-----------------|------|
| O               | 0.77 |
| is_finding_of   | 0.80 |
| is_result_of    | 0.96 |
| is_date_of      | 0.94 |