Description
This model classifies the gender of the patient in the clinical document.
Predicted Entities
`Female`, `Male`, `Unknown`.
How to use
To classify your text, use this model as the last stage of an NLP pipeline with the following stages: DocumentAssembler, Tokenizer, BertEmbeddings (biobert_pubmed_base_cased), SentenceEmbeddings (average pooling), and ClassifierDLModel.
...
biobert_embeddings = BertEmbeddings.pretrained('biobert_pubmed_base_cased', 'en') \
    .setInputCols(["document","token"])\
    .setOutputCol("bert_embeddings")
sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "bert_embeddings"]) \
    .setOutputCol("sentence_bert_embeddings") \
    .setPoolingStrategy("AVERAGE")
gender_classifier = ClassifierDLModel.pretrained('classifierdl_gender_biobert', 'en', 'clinical/models') \
    .setInputCols(["document", "sentence_bert_embeddings"]) \
    .setOutputCol("gender")
nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, biobert_embeddings, sentence_embeddings, gender_classifier])
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))
annotations = light_pipeline.fullAnnotate("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._
val documentAssembler = new DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
val tokenizer = new Tokenizer()
    .setInputCols("document")
    .setOutputCol("token")
val biobertEmbeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased", "en")
    .setInputCols(Array("document", "token"))
    .setOutputCol("bert_embeddings")
val sentenceEmbeddings = new SentenceEmbeddings()
    .setInputCols(Array("document", "bert_embeddings"))
    .setOutputCol("sentence_bert_embeddings")
    .setPoolingStrategy("AVERAGE")
val genderClassifier = ClassifierDLModel.pretrained("classifierdl_gender_biobert", "en", "clinical/models")
    .setInputCols(Array("document", "sentence_bert_embeddings"))
    .setOutputCol("gender")
val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, biobertEmbeddings, sentenceEmbeddings, genderClassifier))
val data = Seq("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""").toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.classify.gender.biobert").predict("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")
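In the pipelines above, the SentenceEmbeddings stage is configured with setPoolingStrategy("AVERAGE"), which collapses the per-token BioBERT vectors into a single vector that ClassifierDLModel then classifies. The following is a minimal pure-Python sketch of that pooling step, assuming AVERAGE means a plain element-wise mean over token vectors. It is an illustration, not Spark NLP's implementation; the helper name `average_pool` and the toy 3-dimensional vectors are hypothetical (real BioBERT base vectors are 768-dimensional).

```python
from typing import List


def average_pool(token_vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized token vectors (hypothetical helper)."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    n = len(token_vectors)
    # Average each coordinate across all tokens.
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


# Toy "token embeddings" for a three-token text (illustrative values only).
tokens = [
    [1.0, 2.0, 3.0],
    [3.0, 0.0, -1.0],
    [2.0, 4.0, 1.0],
]
sentence_vector = average_pool(tokens)
print(sentence_vector)  # [2.0, 2.0, 1.0]
```

Averaging keeps the document vector at the same dimension as the token vectors regardless of text length, which is what lets a fixed-input classifier consume documents of any size.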
Results
Female
Model Information
| Model Name: | classifierdl_gender_biobert | 
| Type: | ClassifierDLModel | 
| Compatibility: | Healthcare NLP 2.6.5+ |
| Edition: | Official | 
| License: | Licensed | 
| Input Labels: | [sentence_embeddings] | 
| Output Labels: | [class] | 
| Language: | [en] | 
| Case sensitive: | True | 
Data Source
This model was trained on more than four thousand internally annotated clinical documents (radiology reports, pathology reports, clinical visit notes, etc.).
Benchmarking
label          precision    recall    f1-score   support
Female          0.9224      0.8954    0.9087       239
Male            0.7895      0.8468    0.8171       124
Unknown         0.8077      0.7778    0.7925        54
accuracy                              0.8657       417
macro-avg       0.8399      0.8400    0.8394       417
weighted-avg    0.8680      0.8657    0.8664       417
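The summary rows follow from the per-class rows. As a consistency check, the sketch below recomputes them in plain Python, assuming the standard definitions: F1 is the harmonic mean of precision and recall, the macro average is an unweighted mean over classes, and the weighted average weights each class by its support. The variable names are illustrative only.

```python
# Per-class (precision, recall, support), copied from the benchmarking table.
report = {
    "Female": (0.9224, 0.8954, 239),
    "Male": (0.7895, 0.8468, 124),
    "Unknown": (0.8077, 0.7778, 54),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(support for _, _, support in report.values())

# Macro average: unweighted mean over the three classes.
macro_f1 = sum(f1(p, r) for p, r, _ in report.values()) / len(report)

# Weighted average: each class weighted by its support.
weighted_f1 = sum(f1(p, r) * s for p, r, s in report.values()) / total

# For single-label classification, accuracy equals support-weighted recall,
# since recall * support is the number of correctly classified documents per class.
accuracy = sum(r * s for _, r, s in report.values()) / total

for label, (p, r, _) in report.items():
    print(f"{label:<8} F1={f1(p, r):.4f}")
print(f"total={total} macro F1={macro_f1:.4f} "
      f"weighted F1={weighted_f1:.4f} accuracy={accuracy:.4f}")
```

Run as written, this prints per-class F1 scores of 0.9087, 0.8171 and 0.7925, a total support of 417, and summary values of 0.8394 (macro F1), 0.8664 (weighted F1) and 0.8657 (accuracy), matching the table to four decimal places.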