Classifier for Genders - SBERT

Description

This model classifies the gender of the patient in the clinical document.

Predicted Entities

Female, Male, Unknown.


How to use

To classify your text, use this model as part of an NLP pipeline with the following stages, in order: DocumentAssembler, BertSentenceEmbeddings (sbiobert_base_cased_mli), ClassifierDLModel.

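# Python
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import BertSentenceEmbeddings, ClassifierDLModel

# Assumes `spark` is an active Spark session with Spark NLP and a Healthcare NLP license loaded.
document_assembler = DocumentAssembler()\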
    .setInputCol("text")\
    .setOutputCol("document")

sbert_embedder = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")\
    .setMaxSentenceLength(512)

gender_classifier = ClassifierDLModel.pretrained("classifierdl_gender_sbert", "en", "clinical/models") \
    .setInputCols(["document", "sentence_embeddings"]) \
    .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[document_assembler, sbert_embedder, gender_classifier])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))

annotations = light_pipeline.fullAnnotate("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")

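// Scala
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.classifier.dl.ClassifierDLModel
import com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._  // needed for .toDS()

val document_assembler = new DocumentAssembler()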
    .setInputCol("text")
    .setOutputCol("document")

val sentence_embeddings = BertSentenceEmbeddings.pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
    .setInputCols("document")
    .setOutputCol("sentence_embeddings")
    .setMaxSentenceLength(512)

val gender_classifier = ClassifierDLModel.pretrained("classifierdl_gender_sbert", "en", "clinical/models")
    .setInputCols(Array("document", "sentence_embeddings"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_embeddings, gender_classifier))

val data = Seq("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""").toDS().toDF("text")

val result = pipeline.fit(data).transform(data)

# Python, using the NLU library
import nlu
nlu.load("en.classify.gender.sbert").predict("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")

Results

Female
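
One way to read the predicted label out of the list returned by `fullAnnotate` is sketched below. It assumes each element of that list is a dict keyed by output column name, whose values are annotation objects exposing a `result` attribute. `FakeAnnotation` and `extract_labels` are hypothetical names introduced here so the sketch runs without a Spark session; they are not part of the library.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark NLP annotation object (assumption:
# the real object exposes the predicted label as `.result`).
FakeAnnotation = namedtuple("FakeAnnotation", ["result"])


def extract_labels(annotations, column="class"):
    """Return the list of predicted labels for each annotated document."""
    return [[ann.result for ann in doc.get(column, [])] for doc in annotations]


# Shape mimicking the output of light_pipeline.fullAnnotate(...) for one text.
annotations = [{"class": [FakeAnnotation("Female")]}]
labels = extract_labels(annotations)
print(labels)
```

With the real pipeline, the same helper would be called on the `annotations` variable from the Python example, provided the output column is still named `class`.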

Model Information

Model Name: classifierdl_gender_sbert
Type: ClassifierDLModel
Compatibility: Healthcare NLP 2.6.5+
Edition: Official
License: Licensed
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: [en]
Case sensitive: True

Data Source

This model is trained on more than four thousand clinical documents (radiology reports, pathology reports, clinical visits, etc.), annotated internally.

Benchmarking

label           precision    recall    f1-score   support
Female           0.9224      0.8954    0.9087       239
Male             0.7895      0.8468    0.8171       124
Unknown          0.8077      0.7778    0.7925        54
accuracy          -           -        0.8657       417
macro-avg        0.8399      0.8400    0.8394       417
weighted-avg     0.8680      0.8657    0.8664       417
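
The macro-avg and weighted-avg rows can be recomputed from the per-class rows, which is a quick way to check the table. The sketch below assumes the usual definitions: macro-avg is the unweighted mean over classes, and weighted-avg is the mean weighted by each class's support. The variable and function names are illustrative only.

```python
# Per-class rows copied from the table above: (precision, recall, f1, support)
rows = {
    "Female": (0.9224, 0.8954, 0.9087, 239),
    "Male": (0.7895, 0.8468, 0.8171, 124),
    "Unknown": (0.8077, 0.7778, 0.7925, 54),
}

total_support = sum(r[3] for r in rows.values())


def macro(i):
    """Unweighted mean of metric column i across classes."""
    return sum(r[i] for r in rows.values()) / len(rows)


def weighted(i):
    """Support-weighted mean of metric column i across classes."""
    return sum(r[i] * r[3] for r in rows.values()) / total_support


macro_avg = tuple(round(macro(i), 4) for i in range(3))
weighted_avg = tuple(round(weighted(i), 4) for i in range(3))

print("support:", total_support)
print("macro-avg:", macro_avg)
print("weighted-avg:", weighted_avg)
```

The recomputed values match the macro-avg and weighted-avg rows to four decimal places. The support-weighted recall also equals the reported accuracy (0.8657), as expected for single-label classification.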