Description
Pretrained deep learning named entity recognition (NER) model for clinical terms in Japanese. The Spark NLP deep learning model used here (MedicalNerModel) is inspired by a former state-of-the-art NER model: Chiu & Nichols, "Named Entity Recognition with Bidirectional LSTM-CNNs".
Predicted Entities
PROBLEM, TEST, TREATMENT
How to use
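Python
# Assumes an active SparkSession named spark, started with a valid Healthcare NLP license.
from pyspark.ml import Pipeline
from pyspark.sql import functions as F
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, BertEmbeddings
from sparknlp_jsl.annotator import MedicalNerModel, NerConverterInternal

# Wrap the raw "text" column into document annotations
document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")
# Multilingual ("xx") deep learning sentence detector
sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")
# Split each sentence into tokens
tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")
# Japanese BERT word embeddings used as input features by the NER model
clinical_embeddings = BertEmbeddings.pretrained("bert_embeddings_bert_large_japanese", "ja")\
    .setInputCols(["document", "token"])\
    .setOutputCol("embeddings")
# Pretrained Japanese clinical NER model (PROBLEM, TEST, TREATMENT)
ner_model = MedicalNerModel.pretrained("ner_clinical", "ja", "clinical/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")
# Merge token-level NER tags into entity chunks
ner_converter = NerConverterInternal()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    clinical_embeddings,
    ner_model,
    ner_converter
])

sample_df = spark.createDataFrame([["""中等度肺高血圧 、 PA圧 48/24、 1+僧帽弁逆流 、 重度大動脈弁狭窄 、 LVEDP 19、 駆出率 43%。 クロトリマゾール 、1錠 p.o . q.i.d .;"""]]).toDF("text")
result = pipeline.fit(sample_df).transform(sample_df)

# Display each chunk with its character offsets and entity label
result.select(F.explode("ner_chunk").alias("c"))\
    .select(F.col("c.result").alias("chunk"),
            F.col("c.begin").alias("begin"),
            F.col("c.end").alias("end"),
            F.col("c.metadata").getItem("entity").alias("ner_label"))\
    .show(truncate=False)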
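Scala
// Assumes an active SparkSession named spark, started with a valid Healthcare NLP license.
import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
// Healthcare NLP package path assumed here; verify it against your installed version
import com.johnsnowlabs.nlp.annotators.ner.{MedicalNerModel, NerConverterInternal}
import spark.implicits._  // enables .toDS on a local Seq

// Wrap the raw "text" column into document annotations
val document_assembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
// Multilingual ("xx") deep learning sentence detector
val sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
  .setInputCols("document")
  .setOutputCol("sentence")
// Split each sentence into tokens
val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")
// Japanese BERT word embeddings used as input features by the NER model
val clinical_embeddings = BertEmbeddings.pretrained("bert_embeddings_bert_large_japanese", "ja")
  .setInputCols(Array("document", "token"))
  .setOutputCol("embeddings")
// Pretrained Japanese clinical NER model (PROBLEM, TEST, TREATMENT)
val ner_model = MedicalNerModel.pretrained("ner_clinical", "ja", "clinical/models")
  .setInputCols(Array("sentence", "token", "embeddings"))
  .setOutputCol("ner")
// Merge token-level NER tags into entity chunks
val ner_converter = new NerConverterInternal()
  .setInputCols(Array("sentence", "token", "ner"))
  .setOutputCol("ner_chunk")

val pipeline = new Pipeline().setStages(Array(
  document_assembler,
  sentence_detector,
  tokenizer,
  clinical_embeddings,
  ner_model,
  ner_converter))

val sample_data = Seq("""中等度肺高血圧 、 PA圧 48/24、 1+僧帽弁逆流 、 重度大動脈弁狭窄 、 LVEDP 19、 駆出率 43%。 クロトリマゾール 、1錠 p.o . q.i.d .;""").toDS.toDF("text")
val result = pipeline.fit(sample_data).transform(sample_data)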
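The ner_converter stage groups the token-level tags emitted by ner_model into the entity chunks listed under Results. The following plain-Python sketch illustrates that kind of grouping under stated assumptions: tags follow the IOB2 scheme (B-LABEL, I-LABEL, O), tokens are (text, begin, end) tuples with inclusive end offsets, and the tokenization and tags for the start of the sample text are hypothetical, not actual model output. The function name iob_to_chunks is illustrative; this is not the NerConverterInternal implementation.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, int, int]       # (token text, begin, end), end inclusive
Chunk = Tuple[str, int, int, str]  # (chunk text, begin, end, label)


def iob_to_chunks(text: str, tokens: List[Token], tags: List[str]) -> List[Chunk]:
    """Merge IOB2 tags (B-X, I-X, O) into (chunk, begin, end, label) tuples.

    Simplified illustration; not the NerConverterInternal implementation.
    """
    chunks: List[Chunk] = []
    label: Optional[str] = None
    start = end = 0

    def close_open_chunk() -> None:
        # Emit the chunk currently being built, if any
        if label is not None:
            chunks.append((text[start:end + 1], start, end, label))

    for (_, tok_begin, tok_end), tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            end = tok_end  # extend the open chunk of the same label
            continue
        close_open_chunk()
        if tag == "O":
            label = None
        else:
            # B-X, or an I-X that does not continue an open X chunk, starts a new chunk
            label, start, end = tag[2:], tok_begin, tok_end
    close_open_chunk()
    return chunks


# Hypothetical tokenization and tags for the start of the sample text
text = "中等度肺高血圧 、 PA圧 48/24、 1+僧帽弁逆流"
tokens = [("中等度肺高血圧", 0, 6), ("、", 8, 8), ("PA圧", 10, 12),
          ("48/24、", 14, 19), ("1+", 21, 22), ("僧帽弁逆流", 23, 27)]
tags = ["B-PROBLEM", "O", "B-TEST", "O", "B-PROBLEM", "I-PROBLEM"]

for chunk in iob_to_chunks(text, tokens, tags):
    print(chunk)
```

Here an I- tag that does not continue an open chunk of the same label starts a new chunk instead of being dropped; converters differ on this point, so treat it as one reasonable choice rather than the library's behavior.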
Results
+----------------+-----+---+---------+
|chunk           |begin|end|ner_label|
+----------------+-----+---+---------+
|中等度肺高血圧  |0    |6  |PROBLEM  |
|PA圧            |10   |12 |TEST     |
|1+僧帽弁逆流    |21   |27 |PROBLEM  |
|重度大動脈弁狭窄|31   |38 |PROBLEM  |
|LVEDP           |42   |46 |TEST     |
|駆出率          |52   |54 |TEST     |
|クロトリマゾール|61   |68 |TREATMENT|
+----------------+-----+---+---------+
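In this output, begin and end are character offsets into the input string, and end is inclusive: PA圧 occupies offsets 10 to 12. A small pure-Python check that slicing the sample text with end + 1 recovers every chunk in the table (chunk_text is a helper defined here for illustration):

```python
# The sample text passed to the pipeline, and the rows of the Results table
SAMPLE_TEXT = "中等度肺高血圧 、 PA圧 48/24、 1+僧帽弁逆流 、 重度大動脈弁狭窄 、 LVEDP 19、 駆出率 43%。 クロトリマゾール 、1錠 p.o . q.i.d .;"

RESULTS = [
    ("中等度肺高血圧", 0, 6, "PROBLEM"),
    ("PA圧", 10, 12, "TEST"),
    ("1+僧帽弁逆流", 21, 27, "PROBLEM"),
    ("重度大動脈弁狭窄", 31, 38, "PROBLEM"),
    ("LVEDP", 42, 46, "TEST"),
    ("駆出率", 52, 54, "TEST"),
    ("クロトリマゾール", 61, 68, "TREATMENT"),
]


def chunk_text(text: str, begin: int, end: int) -> str:
    """Return the substring covered by an annotation whose end offset is inclusive."""
    return text[begin:end + 1]


for chunk, begin, end, label in RESULTS:
    recovered = chunk_text(SAMPLE_TEXT, begin, end)
    assert recovered == chunk, (chunk, recovered)
    print(f"{label:<9} [{begin}, {end}] {recovered}")
```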
Model Information
Model Name: ner_clinical
Compatibility: Healthcare NLP 5.1.0+
License: Licensed
Edition: Official
Input Labels: [sentence, token, embeddings]
Output Labels: [ner]
Language: ja
Size: 4.3 MB
Benchmarking
label        precision  recall  f1-score  support
TEST              0.90    0.90      0.90      105
PROBLEM           0.86    0.90      0.89      134
TREATMENT         0.71    0.61      0.66       36
micro-avg         0.86    0.86      0.86      275
macro-avg         0.83    0.80      0.81      275
weighted-avg      0.86    0.86      0.86      275
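Assuming the averaged rows use the standard classification-report definitions (as in scikit-learn), macro-avg is the unweighted mean of the three per-label values and weighted-avg weights each label by its support; micro-avg pools raw counts that the table does not list, so it is not recomputed here. The sketch below recomputes the other figures from the two-decimal per-label values. With these inputs it reproduces the weighted-avg row, while the remaining recomputed values differ from the reported ones by at most 0.01 (for example, an F1 of 0.88 from PROBLEM's precision and recall versus 0.89 reported).

```python
# Per-label rows copied from the Benchmarking table: (precision, recall, f1, support)
ROWS = {
    "TEST": (0.90, 0.90, 0.90, 105),
    "PROBLEM": (0.86, 0.90, 0.89, 134),
    "TREATMENT": (0.71, 0.61, 0.66, 36),
}
PRECISION, RECALL, F1, SUPPORT = range(4)  # tuple positions in each row


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def macro_avg(col: int) -> float:
    """Unweighted mean of one column over all labels."""
    return sum(row[col] for row in ROWS.values()) / len(ROWS)


def weighted_avg(col: int) -> float:
    """Mean of one column over all labels, weighted by each label's support."""
    total = sum(row[SUPPORT] for row in ROWS.values())
    return sum(row[col] * row[SUPPORT] for row in ROWS.values()) / total


for name, (p, r, f, _) in ROWS.items():
    print(f"{name:<9} F1 from P and R: {f1_score(p, r):.2f} (reported {f:.2f})")
for title, avg in (("macro", macro_avg), ("weighted", weighted_avg)):
    print(f"{title:<9} P={avg(PRECISION):.2f} R={avg(RECALL):.2f} F1={avg(F1):.2f}")
```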