Detect Anatomical References (biobert)

Description

Detect anatomical sites and references in medical text using a pretrained NER model.

Predicted Entities

tissue_structure, Organism_substance, Developing_anatomical_structure, Cell, Cellular_component, Immaterial_anatomical_entity, Organ, Pathological_formation, Organism_subdivision, Anatomical_system, Tissue


How to use


...
embeddings_clinical = BertEmbeddings.pretrained("biobert_pubmed_base_cased") \
  .setInputCols(["sentence", "token"]) \
  .setOutputCol("embeddings")
clinical_ner = MedicalNerModel.pretrained("ner_anatomy_biobert", "en", "clinical/models") \
  .setInputCols(["sentence", "token", "embeddings"]) \
  .setOutputCol("ner")
...
nlpPipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings_clinical, clinical_ner, ner_converter])
model = nlpPipeline.fit(spark.createDataFrame([[""]]).toDF("text"))
results = model.transform(spark.createDataFrame([["EXAMPLE_TEXT"]]).toDF("text"))

...
val embeddings_clinical = BertEmbeddings.pretrained("biobert_pubmed_base_cased")
  .setInputCols(Array("sentence", "token"))
  .setOutputCol("embeddings")
val ner = MedicalNerModel.pretrained("ner_anatomy_biobert", "en", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
...
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings_clinical, ner, ner_converter))
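// fit() and transform() both take a DataFrame with a "text" column (toDF needs import spark.implicits._)
val data = Seq("EXAMPLE_TEXT").toDF("text")
val result = pipeline.fit(data).transform(data)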
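
The pipelines above end with a ner_converter stage whose definition is elided. As a rough illustration of what such a stage does conceptually, the sketch below groups token-level tags into entity chunks in plain Python. It assumes the model emits IOB-style tags (B-Organ, I-Organ, O). The function name iob_to_chunks and the sample sentence with its hand-written tags are hypothetical, and this is not Spark NLP's own implementation.

```python
from typing import List, Optional, Tuple


def iob_to_chunks(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk_text, entity_label) pairs.

    Hypothetical helper for illustration only; assumes B-/I-/O tagging.
    """
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk; flush any open one first.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label extends the open chunk.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open chunk:
            # close the open chunk and skip this token.
            if current_tokens:
                chunks.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None

    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


# Hand-written sample tags (not actual model output).
tokens = ["The", "left", "ventricle", "wall", "contains", "cardiac", "muscle", "cells", "."]
tags = ["O", "B-Organ", "I-Organ", "O", "O", "B-Cell", "I-Cell", "I-Cell", "O"]

print(iob_to_chunks(tokens, tags))
# [('left ventricle', 'Organ'), ('cardiac muscle cells', 'Cell')]
```

In this sketch a stray I- tag that does not follow a matching B- or I- tag is dropped rather than promoted to a new chunk, which is one of several reasonable conventions.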

Model Information

Model Name: ner_anatomy_biobert
Compatibility: Spark NLP for Healthcare 3.0.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: en

Benchmarking

+-------------------------------+-----+----+----+-----+---------+------+------+
|                         entity|   tp|  fp|  fn|total|precision|recall|    f1|
+-------------------------------+-----+----+----+-----+---------+------+------+
|                          Organ| 53.0|17.0|12.0| 65.0|   0.7571|0.8154|0.7852|
|         Pathological_formation| 83.0|23.0|14.0| 97.0|    0.783|0.8557|0.8177|
|             Organism_substance| 42.0| 1.0|14.0| 56.0|   0.9767|  0.75|0.8485|
|               tissue_structure|131.0|28.0|49.0|180.0|   0.8239|0.7278|0.7729|
|             Cellular_component| 17.0| 0.0|20.0| 37.0|      1.0|0.4595|0.6296|
|                         Tissue| 27.0| 4.0|16.0| 43.0|    0.871|0.6279|0.7297|
|              Anatomical_system| 15.0| 3.0| 8.0| 23.0|   0.8333|0.6522|0.7317|
|Developing_anatomical_structure|  2.0| 1.0| 3.0|  5.0|   0.6667|   0.4|   0.5|
|   Immaterial_anatomical_entity|  7.0| 2.0| 6.0| 13.0|   0.7778|0.5385|0.6364|
|                           Cell|180.0| 6.0|15.0|195.0|   0.9677|0.9231|0.9449|
|           Organism_subdivision| 11.0| 5.0|10.0| 21.0|   0.6875|0.5238|0.5946|
+-------------------------------+-----+----+----+-----+---------+------+------+

+------------------+
|             macro|
+------------------+
|0.7264701979913192|
+------------------+

+------------------+
|             micro|
+------------------+
|0.8108878300337679|
+------------------+
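
The summary figures can be checked against the per-entity rows. The sketch below recomputes precision, recall, and F1 from the tp, fp, and fn counts copied from the table, assuming the total column is the support (tp + fn). Under that assumption the macro value is the unweighted mean of the per-entity F1 scores. The value reported as micro appears to match the support-weighted mean of per-entity F1, not the F1 obtained by pooling counts across entities, which is printed last for comparison. All variable and function names here are illustrative.

```python
# (entity, tp, fp, fn) copied from the benchmark table above.
ROWS = [
    ("Organ", 53, 17, 12),
    ("Pathological_formation", 83, 23, 14),
    ("Organism_substance", 42, 1, 14),
    ("tissue_structure", 131, 28, 49),
    ("Cellular_component", 17, 0, 20),
    ("Tissue", 27, 4, 16),
    ("Anatomical_system", 15, 3, 8),
    ("Developing_anatomical_structure", 2, 1, 3),
    ("Immaterial_anatomical_entity", 7, 2, 6),
    ("Cell", 180, 6, 15),
    ("Organism_subdivision", 11, 5, 10),
]


def prf(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


f1_scores = [prf(tp, fp, fn)[2] for _, tp, fp, fn in ROWS]
supports = [tp + fn for _, tp, _, fn in ROWS]  # assumed meaning of "total"

# Unweighted mean of per-entity F1.
macro_f1 = sum(f1_scores) / len(f1_scores)

# Mean of per-entity F1 weighted by support.
weighted_f1 = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)

# F1 from counts pooled over all entities.
pooled_f1 = prf(
    sum(tp for _, tp, _, _ in ROWS),
    sum(fp for _, _, fp, _ in ROWS),
    sum(fn for _, _, _, fn in ROWS),
)[2]

print(round(macro_f1, 4))     # 0.7265
print(round(weighted_f1, 4))  # 0.8109
print(round(pooled_f1, 4))    # 0.8155
```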