Classifier for Genders - SBERT

Description

This model classifies the gender of the patient described in a clinical document.

Classified Labels

Female, Male, Unknown.


How to use

To classify your text, use this model as part of an NLP pipeline with the following stages: DocumentAssembler, BertSentenceEmbeddings (sbiobert_base_cased_mli), ClassifierDLModel.

...
sbert_embedder = BertSentenceEmbeddings\
     .pretrained("sbiobert_base_cased_mli", 'en', 'clinical/models')\
     .setInputCols(["document"])\
     .setOutputCol("sentence_embeddings")\
     .setMaxSentenceLength(512)
gender_classifier = ClassifierDLModel\
     .pretrained("classifierdl_gender_sbert", "en", "clinical/models")\
     .setInputCols(["document", "sentence_embeddings"])\
     .setOutputCol("class")
nlp_pipeline = Pipeline(stages=[document_assembler, sbert_embedder, gender_classifier])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))
annotations = light_pipeline.fullAnnotate("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")

...
val sentence_embeddings = BertSentenceEmbeddings
     .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
     .setInputCols(Array("document"))
     .setOutputCol("sentence_embeddings")
     .setMaxSentenceLength(512)
val gender_classifier = ClassifierDLModel.pretrained("classifierdl_gender_sbert", "en", "clinical/models")
               .setInputCols(Array("document", "sentence_embeddings"))
               .setOutputCol("class")
val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_embeddings, gender_classifier))

// toDS requires `import spark.implicits._` to be in scope
val data = Seq("social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)

Results

Female
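
The predicted label is carried in the `result` field of the annotations stored under the classifier's output column (`class` in the pipeline above). The following is a minimal sketch of reading it from a `fullAnnotate`-style return value. The helper `predicted_labels` and the stand-in `FakeAnnotation` type are hypothetical names introduced only so the example runs without a Spark session; with a real pipeline you would pass in the `annotations` list directly.

```python
from collections import namedtuple

# Stand-in for a Spark NLP annotation (hypothetical, for illustration only).
# Real annotations returned by fullAnnotate also expose a `result` attribute.
FakeAnnotation = namedtuple("FakeAnnotation", ["result"])


def predicted_labels(annotations, output_col="class"):
    """Collect predicted label strings from a fullAnnotate-style result.

    `annotations` is assumed to be a list with one dict per input text,
    mapping output column names to lists of annotation objects.
    """
    return [ann.result for doc in annotations for ann in doc.get(output_col, [])]


# Hypothetical value shaped like the output of light_pipeline.fullAnnotate(text)
example = [{"class": [FakeAnnotation(result="Female")]}]
print(predicted_labels(example))
```

Returning a list rather than a single string keeps the helper usable when several texts are annotated in one call.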

Model Information

Model Name: classifierdl_gender_sbert
Type: ClassifierDLModel
Compatibility: Spark NLP for Healthcare 2.6.5+
Edition: Official
License: Licensed
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: [en]
Case sensitive: True

Data Source

This model was trained on more than four thousand internally annotated clinical documents (radiology reports, pathology reports, clinical visits, etc.).

Benchmarking

              precision    recall  f1-score   support

      Female     0.9224    0.8954    0.9087       239
        Male     0.7895    0.8468    0.8171       124
     Unknown     0.8077    0.7778    0.7925        54

    accuracy                         0.8657       417
   macro avg     0.8399    0.8400    0.8394       417
weighted avg     0.8680    0.8657    0.8664       417
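
The summary rows follow from the per-class rows: the macro average is the unweighted mean over the three classes, and the weighted average weights each class by its support. As a sanity check, the sketch below recomputes both from the values in the table. The variable and function names are illustrative, and because the inputs are already rounded to four decimals, the results are only expected to agree with the table to within rounding.

```python
# Per-class rows copied from the benchmark table:
# label -> (precision, recall, f1-score, support)
rows = {
    "Female":  (0.9224, 0.8954, 0.9087, 239),
    "Male":    (0.7895, 0.8468, 0.8171, 124),
    "Unknown": (0.8077, 0.7778, 0.7925, 54),
}

total_support = sum(r[3] for r in rows.values())


def macro_avg(metric_index):
    """Unweighted mean of one metric across classes."""
    return sum(r[metric_index] for r in rows.values()) / len(rows)


def weighted_avg(metric_index):
    """Mean of one metric across classes, weighted by class support."""
    return sum(r[metric_index] * r[3] for r in rows.values()) / total_support


macro = [round(macro_avg(i), 4) for i in range(3)]
weighted = [round(weighted_avg(i), 4) for i in range(3)]

print("support     ", total_support)
print("macro avg   ", macro)
print("weighted avg", weighted)
```

The support-weighted recall is the same quantity as overall accuracy, which is why both appear as 0.8657 in the table. The gap between the Female scores and the Male and Unknown scores is also easier to interpret with the supports in view: Female accounts for 239 of the 417 test documents.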