Description
This model classifies the gender of the patient in the clinical document.
Predicted Entities
`Female`, `Male`, `Unknown`.
How to use
To classify your text, use this model in an NLP pipeline with the following stages: DocumentAssembler, Tokenizer, BertEmbeddings (biobert_pubmed_base_cased), SentenceEmbeddings, ClassifierDLModel.
...
# Token-level BioBERT embeddings
biobert_embeddings = BertEmbeddings.pretrained('biobert_pubmed_base_cased') \
    .setInputCols(["document", "token"]) \
    .setOutputCol("bert_embeddings")

# Average the token embeddings into one vector per document
sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "bert_embeddings"]) \
    .setOutputCol("sentence_bert_embeddings") \
    .setPoolingStrategy("AVERAGE")

gender_classifier = ClassifierDLModel.pretrained('classifierdl_gender_biobert', 'en', 'clinical/models') \
    .setInputCols(["document", "sentence_bert_embeddings"]) \
    .setOutputCol("gender")

# document_assembler and tokenizer come from the elided setup above
nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, biobert_embeddings, sentence_embeddings, gender_classifier])

# Fit on an empty DataFrame; every stage is pretrained, so no training happens here
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

annotations = light_pipeline.fullAnnotate("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")
// Scala
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")

val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")

// Token-level BioBERT embeddings
val biobert_embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased")
    .setInputCols(Array("document", "token"))
    .setOutputCol("bert_embeddings")

// Average the token embeddings into one vector per document
val sentence_embeddings = new SentenceEmbeddings()
    .setInputCols(Array("document", "bert_embeddings"))
    .setOutputCol("sentence_bert_embeddings")
    .setPoolingStrategy("AVERAGE")

val genderClassifier = ClassifierDLModel.pretrained("classifierdl_gender_biobert", "en", "clinical/models")
    .setInputCols(Array("document", "sentence_bert_embeddings"))
    .setOutputCol("gender")

val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, biobert_embeddings, sentence_embeddings, genderClassifier))

// Needed for .toDS() on a local Seq
import spark.implicits._
val data = Seq("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""").toDS().toDF("text")

val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.classify.gender.biobert").predict("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")
Results
Female
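To read the predicted label out of the `annotations` object produced by `fullAnnotate` in the Python example, a sketch along these lines may help. It assumes the usual Spark NLP result shape: a list with one dict per input text, mapping each output column to a list of annotation objects that expose a `result` attribute. The helper name `predicted_labels` and the `SimpleNamespace` stand-in are hypothetical, used only so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def predicted_labels(annotations, output_col="gender"):
    """Return, for each input text, the list of labels found in output_col."""
    return [[ann.result for ann in doc.get(output_col, [])] for doc in annotations]


# Hypothetical stand-in for the structure returned by light_pipeline.fullAnnotate(...)
fake_annotations = [{"gender": [SimpleNamespace(result="Female", metadata={})]}]

labels = predicted_labels(fake_annotations)
print(labels)  # [['Female']]
```

With the real pipeline, the call would presumably be `predicted_labels(annotations)`, where `"gender"` matches the classifier's output column.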
Model Information
| Model Name: | classifierdl_gender_biobert |
| Type: | ClassifierDLModel |
| Compatibility: | Healthcare NLP 2.6.5+ |
| Edition: | Official |
| License: | Licensed |
| Input Labels: | [sentence_embeddings] |
| Output Labels: | [class] |
| Language: | [en] |
| Case sensitive: | True |
Data Source
This model is trained on more than four thousand internally annotated clinical documents (radiology reports, pathology reports, clinical visits, etc.).
Benchmarking
| label | precision | recall | f1-score | support |
|---|---|---|---|---|
| Female | 0.9224 | 0.8954 | 0.9087 | 239 |
| Male | 0.7895 | 0.8468 | 0.8171 | 124 |
| Unknown | 0.8077 | 0.7778 | 0.7925 | 54 |
| accuracy | | | 0.8657 | 417 |
| macro-avg | 0.8399 | 0.8400 | 0.8394 | 417 |
| weighted-avg | 0.8680 | 0.8657 | 0.8664 | 417 |
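As a sanity check on how the summary rows relate to the per-class rows, the following sketch recomputes F1, the macro and weighted averages, and accuracy from the rounded precision, recall and support values reported above. It is an illustration under that assumption, not the evaluation script used for the model, and results can differ from the table in the last decimal place because the inputs are already rounded.

```python
# Per-class rows copied from the benchmark table: (precision, recall, support)
rows = {
    "Female": (0.9224, 0.8954, 239),
    "Male": (0.7895, 0.8468, 124),
    "Unknown": (0.8077, 0.7778, 54),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(support for _, _, support in rows.values())
per_class_f1 = {label: f1(p, r) for label, (p, r, _) in rows.items()}

# Macro average: every class counts equally
macro_f1 = sum(per_class_f1.values()) / len(rows)
# Weighted average: every class counts in proportion to its support
weighted_f1 = sum(per_class_f1[label] * s for label, (_, _, s) in rows.items()) / total
# Support-weighted recall is the share of correctly labeled documents, i.e. accuracy
accuracy = sum(r * s for _, r, s in rows.values()) / total

for label, value in per_class_f1.items():
    print(f"{label:<8} f1={value:.4f}")
print(f"macro f1={macro_f1:.4f}  weighted f1={weighted_f1:.4f}  accuracy={accuracy:.4f}")
```

The gap between the macro and weighted F1 reflects the class imbalance: `Female` accounts for 239 of the 417 test documents and is also the best-scoring class.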