Detect Anatomical References (biobert)

Description

Detect anatomical sites and references in medical text using a pretrained NER model built on BioBERT (biobert_pubmed_base_cased) embeddings.

Predicted Entities

tissue_structure, Organism_substance, Developing_anatomical_structure, Cell, Cellular_component, Immaterial_anatomical_entity, Organ, Pathological_formation, Organism_subdivision, Anatomical_system, Tissue


How to use

from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetector, Tokenizer, BertEmbeddings, NerConverter
from sparknlp_jsl.annotator import MedicalNerModel
from pyspark.ml import Pipeline

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
         
sentence_detector = SentenceDetector()\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

embeddings_clinical = BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")

clinical_ner = MedicalNerModel.pretrained("ner_anatomy_biobert", "en", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")
    
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])

model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))

results = model.transform(spark.createDataFrame([["EXAMPLE_TEXT"]]).toDF("text"))

import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.annotators.ner.MedicalNerModel
import org.apache.spark.ml.Pipeline

val document_assembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
         
val sentence_detector = new SentenceDetector()
    .setInputCols("document")
    .setOutputCol("sentence")

val tokenizer = new Tokenizer()
    .setInputCols("sentence")
    .setOutputCol("token")

val embeddings_clinical = BertEmbeddings.pretrained("biobert_pubmed_base_cased")
    .setInputCols(Array("sentence", "token"))
    .setOutputCol("embeddings")

val ner = MedicalNerModel.pretrained("ner_anatomy_biobert", "en", "clinical/models")
    .setInputCols(Array("sentence", "token", "embeddings"))
    .setOutputCol("ner")

val ner_converter = new NerConverter()
    .setInputCols(Array("sentence", "token", "ner"))
    .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))

val data = Seq("EXAMPLE_TEXT").toDF("text")

val result = pipeline.fit(data).transform(data)

import nlu
nlu.load("en.med_ner.anatomy.biobert").predict("""Put your text here.""")
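
To inspect the chunks extracted by the Python pipeline, the annotations in the ner_chunk column can be exploded into (chunk, label) pairs. A minimal sketch, assuming the results DataFrame from the Python example above (the entity metadata key holds the predicted label):

import pyspark.sql.functions as F

# Explode each chunk annotation and pull out its text and predicted label.
results.select(F.explode("ner_chunk").alias("chunk")) \
    .select(F.col("chunk.result").alias("chunk"),
            F.col("chunk.metadata")["entity"].alias("ner_label")) \
    .show(truncate=False)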

Model Information

Model Name: ner_anatomy_biobert
Compatibility: Healthcare NLP 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en
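
For quick, single-string inference without building a Spark DataFrame, the fitted pipeline can be wrapped in a LightPipeline. A minimal sketch, assuming the fitted model from the Python example above; the sample sentence is illustrative only:

from sparknlp.base import LightPipeline

# `model` is the fitted PipelineModel from the Python example above.
light_model = LightPipeline(model)
# annotate() returns a dict keyed by output column; token and ner lists align.
annotations = light_model.annotate("The biopsy showed infiltration of the submucosa and muscularis propria.")
print(list(zip(annotations["token"], annotations["ner"])))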

Benchmarking

+-------------------------------+-----+----+----+-----+---------+------+------+
|                         entity|   tp|  fp|  fn|total|precision|recall|    f1|
+-------------------------------+-----+----+----+-----+---------+------+------+
|                          Organ| 53.0|17.0|12.0| 65.0|   0.7571|0.8154|0.7852|
|         Pathological_formation| 83.0|23.0|14.0| 97.0|    0.783|0.8557|0.8177|
|             Organism_substance| 42.0| 1.0|14.0| 56.0|   0.9767|  0.75|0.8485|
|               tissue_structure|131.0|28.0|49.0|180.0|   0.8239|0.7278|0.7729|
|             Cellular_component| 17.0| 0.0|20.0| 37.0|      1.0|0.4595|0.6296|
|                         Tissue| 27.0| 4.0|16.0| 43.0|    0.871|0.6279|0.7297|
|              Anatomical_system| 15.0| 3.0| 8.0| 23.0|   0.8333|0.6522|0.7317|
|Developing_anatomical_structure|  2.0| 1.0| 3.0|  5.0|   0.6667|   0.4|   0.5|
|   Immaterial_anatomical_entity|  7.0| 2.0| 6.0| 13.0|   0.7778|0.5385|0.6364|
|                           Cell|180.0| 6.0|15.0|195.0|   0.9677|0.9231|0.9449|
|           Organism_subdivision| 11.0| 5.0|10.0| 21.0|   0.6875|0.5238|0.5946|
+-------------------------------+-----+----+----+-----+---------+------+------+

+-----+------------------+
|  avg|                f1|
+-----+------------------+
|macro|0.7264701979913192|
|micro|0.8108878300337679|
+-----+------------------+
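
The two summary figures can be reproduced from the per-entity table: macro matches the unweighted mean of the per-entity F1 scores, and micro matches the support-weighted mean, with weights taken from the total column. A quick check in plain Python:

# Per-entity F1 scores and supports, copied row by row from the table above.
f1 = [0.7852, 0.8177, 0.8485, 0.7729, 0.6296, 0.7297,
      0.7317, 0.5, 0.6364, 0.9449, 0.5946]
total = [65, 97, 56, 180, 37, 43, 23, 5, 13, 195, 21]

macro = sum(f1) / len(f1)                                   # unweighted mean
micro = sum(f * t for f, t in zip(f1, total)) / sum(total)  # support-weighted mean
print(macro)  # ~0.7265
print(micro)  # ~0.8109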