Relation extraction between dates and clinical entities

Description

Extracts relations between date entities and other clinical entities. Label 1 indicates that the date entity is related to the other clinical entity; label 0 indicates that it is not.

Predicted Entities

0, 1


How to use

Use as part of an NLP pipeline with the following stages: DocumentAssembler, SentenceDetector, Tokenizer, PerceptronModel, DependencyParserModel, WordEmbeddingsModel, NerDLModel, NerConverter, RelationExtractionModel.

ner_tagger = sparknlp.annotators.NerDLModel().pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")\
  .setInputCols("sentences", "tokens", "embeddings")\
  .setOutputCol("ner_tags")

# Parentheses allow the chained calls to carry inline comments.
re_model = (RelationExtractionModel()
  .pretrained("re_date", "en", "clinical/models")
  .setInputCols(["embeddings", "pos_tags", "ner_chunks", "dependencies"])
  .setOutputCol("relations")
  .setMaxSyntacticDistance(3)  # default: 0
  .setPredictionThreshold(0.9)  # default: 0.5
  .setRelationPairs(["test-date", "symptom-date"]))  # Possible relation pairs. Default: all relations.

nlp_pipeline = Pipeline(stages=[documenter, sentencer,tokenizer, words_embedder, pos_tagger, ner_tagger, ner_chunker, dependency_parser,re_model])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate('''This 73 y/o patient had CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94.''')
...
val ner_tagger = NerDLModel.pretrained("jsl_ner_wip_greedy_clinical", "en", "clinical/models")
      .setInputCols("sentences", "tokens", "embeddings")
      .setOutputCol("ner_tags")

val re_model = RelationExtractionModel.pretrained("re_date", "en", "clinical/models")
        .setInputCols(Array("embeddings", "pos_tags", "ner_chunks", "dependencies"))
        .setOutputCol("relations")
        .setMaxSyntacticDistance(3) // default: 0
        .setPredictionThreshold(0.9) // default: 0.5
        .setRelationPairs(Array("test-date", "symptom-date")) // Possible relation pairs. Default: all relations.

val nlpPipeline = new Pipeline().setStages(Array(documenter, sentencer, tokenizer, words_embedder, pos_tagger, ner_tagger, ner_chunker, dependency_parser, re_model))

val text = """This 73 y/o patient had CT on 1/12/95, with progressive memory and cognitive decline since 8/11/94."""
val data = Seq(text).toDF("text") // requires import spark.implicits._

val model = nlpPipeline.fit(data)
val result = model.transform(data)

val annotations = new LightPipeline(model).fullAnnotate(text)

Results

|   | relations | entity1 | entity1_begin | entity1_end | chunk1                                   | entity2 | entity2_begin | entity2_end | chunk2  | confidence |
|---|-----------|---------|---------------|-------------|------------------------------------------|---------|-------------|-------------|---------|------------|
| 0 | 1         | Test    | 24            | 25          | CT                                       | Date    | 31          | 37          | 1/12/95 | 1.0        |
| 1 | 1         | Symptom | 45            | 84          | progressive memory and cognitive decline | Date    | 92          | 98          | 8/11/94 | 1.0        |
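
A table like the one above can be assembled by flattening the relation annotations that fullAnnotate returns. The sketch below is a minimal illustration, not the library's own helper. It assumes each relation annotation exposes a `result` label and a `metadata` dictionary keyed by the column names of the table (with the start offset of the second entity assumed to be `entity2_begin`). The `Annotation` tuple and the `relations_to_rows` function are hypothetical stand-ins so that the example runs without a Spark session.

```python
from collections import namedtuple

# Hypothetical stand-in for a relation annotation; assumed to mirror the
# `result` and `metadata` fields of the objects returned by fullAnnotate.
Annotation = namedtuple("Annotation", ["result", "metadata"])

# Metadata keys assumed to match the columns of the Results table.
COLUMNS = ["entity1", "entity1_begin", "entity1_end", "chunk1",
           "entity2", "entity2_begin", "entity2_end", "chunk2", "confidence"]


def relations_to_rows(relations, keep_label=None):
    """Flatten relation annotations into dicts, optionally keeping one label."""
    rows = []
    for rel in relations:
        if keep_label is not None and rel.result != keep_label:
            continue
        row = {"relations": rel.result}
        row.update({col: rel.metadata.get(col) for col in COLUMNS})
        rows.append(row)
    return rows


# Sample values copied from the Results table.
sample = [
    Annotation("1", {"entity1": "Test", "entity1_begin": "24",
                     "entity1_end": "25", "chunk1": "CT",
                     "entity2": "Date", "entity2_begin": "31",
                     "entity2_end": "37", "chunk2": "1/12/95",
                     "confidence": "1.0"}),
    Annotation("1", {"entity1": "Symptom", "entity1_begin": "45",
                     "entity1_end": "84",
                     "chunk1": "progressive memory and cognitive decline",
                     "entity2": "Date", "entity2_begin": "92",
                     "entity2_end": "98", "chunk2": "8/11/94",
                     "confidence": "1.0"}),
]

# Keep only pairs labelled 1 (related).
rows = relations_to_rows(sample, keep_label="1")
for row in rows:
    print(row["chunk1"], "->", row["chunk2"], row["confidence"])
```

With the real pipeline, the input list would presumably come from the "relations" entry of the first element returned by fullAnnotate; verify the metadata keys against the installed library version before relying on them.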

Model Information

Model Name: re_date_clinical
Type: re
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [embeddings, pos_tags, train_ner_chunks, dependencies]
Output Labels: [relations]
Language: en
Dependencies: embeddings_clinical

Data Source

Trained on data gathered and manually annotated by John Snow Labs.

Benchmarking

| relation | recall | precision | f1   |
|----------|--------|-----------|------|
| 0        | 0.74   | 0.71      | 0.72 |
| 1        | 0.94   | 0.95      | 0.94 |
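
The f1 column is consistent with the harmonic mean of precision and recall, f1 = 2 * precision * recall / (precision + recall). The short check below recomputes it from the table values; the `f1_score` helper is introduced here for illustration only and is not part of the library.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) pairs copied from the benchmarking table.
benchmark = {"0": (0.74, 0.71), "1": (0.94, 0.95)}

for label, (recall, precision) in benchmark.items():
    print(label, round(f1_score(precision, recall), 2))
# prints:
# 0 0.72
# 1 0.94
```

Both recomputed values agree with the reported figures after rounding to two decimals.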