Classifier for Genders - SBERT

Description

This model classifies the gender of the patient described in a clinical document, based on contextual cues in the text.

Predicted Entities

Female, Male, Unknown

How to use

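# Python
# Assumes an active Spark NLP for Healthcare session available as `spark`.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import BertSentenceEmbeddings, ClassifierDLModel

document_assembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")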

sbert_embedder = BertSentenceEmbeddings\
    .pretrained("sbiobert_base_cased_mli", 'en', 'clinical/models')\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")

gender_classifier = ClassifierDLModel.pretrained('classifierdl_gender_sbert', 'en', 'clinical/models') \
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[document_assembler, sbert_embedder, gender_classifier])

light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF("text")))

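annotations = light_pipeline.fullAnnotate("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")

// Scala
// Assumes an active Spark session available as `spark`.
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._ // needed for .toDF on a Seq

val document_assembler = new DocumentAssembler()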
	.setInputCol("text")
	.setOutputCol("document")
	
val sbert_embedder = BertSentenceEmbeddings
	.pretrained("sbiobert_base_cased_mli","en","clinical/models")
	.setInputCols(Array("document"))
	.setOutputCol("sentence_embeddings")
	
val gender_classifier = ClassifierDLModel.pretrained("classifierdl_gender_sbert","en","clinical/models")
	.setInputCols(Array("sentence_embeddings"))
	.setOutputCol("class")
	
val nlp_pipeline = new Pipeline().setStages(Array(
	document_assembler,
	sbert_embedder,
	gender_classifier))

val data = Seq("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""").toDF("text")

val result = nlp_pipeline.fit(data).transform(data)

# Python (NLU)
import nlu
nlu.load("en.classify.gender.sbert").predict("""social history: shows that  does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")

Results

Female
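
The label above can be read out of the `annotations` object from the Python example. The following is a minimal sketch, not the official way to consume the output. It assumes `fullAnnotate` returns a list with one dict per input text, where each output column name (here `class`) maps to a list of annotation objects exposing `.result` and `.metadata`. The helper name `extract_predictions` is hypothetical, and the stand-in data (including the metadata keys and scores) is invented so the snippet runs without Spark.

```python
from types import SimpleNamespace


def extract_predictions(annotations, output_col="class"):
    """Return one (label, metadata) pair per annotation in `output_col`.

    Assumption: `annotations` has the shape produced by
    LightPipeline.fullAnnotate, i.e. a list of dicts mapping output
    column names to lists of objects with `.result` and `.metadata`.
    """
    predictions = []
    for doc in annotations:
        for ann in doc.get(output_col, []):
            predictions.append((ann.result, dict(ann.metadata)))
    return predictions


# Stand-in for a real fullAnnotate result so the sketch runs without Spark.
# The metadata keys and values below are hypothetical, for illustration only.
fake_annotations = [
    {
        "class": [
            SimpleNamespace(
                result="Female",
                metadata={"Female": "0.98", "Male": "0.01", "Unknown": "0.01"},
            )
        ]
    }
]

for label, metadata in extract_predictions(fake_annotations):
    print(label, metadata)
```

With a real pipeline, the call would be `extract_predictions(annotations)`. If the output column was renamed in `setOutputCol`, pass that name as `output_col`.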

Model Information

Model Name: classifierdl_gender_sbert
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Dependencies: sbiobert_base_cased_mli

Data Source

This model is trained on more than four thousand clinical documents (radiology reports, pathology reports, clinical visits, etc.), annotated internally.

Benchmarking

label           precision    recall  f1-score   support

Female             0.9390    0.9747    0.9565       237
Male               0.9561    0.8720    0.9121       125
Unknown            0.8491    0.8824    0.8654        51

accuracy                               0.9322       413
macro avg          0.9147    0.9097    0.9113       413
weighted avg       0.9331    0.9322    0.9318       413
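
To make the summary rows easier to interpret, here is a small sketch that recomputes them from the per-class rows. It uses only the numbers reported in the table. The names `report`, `f1`, `macro_avg` and `weighted_avg` are illustrative, not part of any Spark NLP API. The macro average is the unweighted mean over the three classes, and the weighted average weights each class by its support.

```python
# Per-class rows copied from the benchmark table:
# label -> (precision, recall, f1-score, support)
report = {
    "Female": (0.9390, 0.9747, 0.9565, 237),
    "Male": (0.9561, 0.8720, 0.9121, 125),
    "Unknown": (0.8491, 0.8824, 0.8654, 51),
}

SUPPORT = 3  # index of the support column in each row


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(rows, idx):
    """Unweighted mean of column `idx` across classes."""
    return sum(row[idx] for row in rows.values()) / len(rows)


def weighted_avg(rows, idx):
    """Mean of column `idx` weighted by each class's support."""
    total = sum(row[SUPPORT] for row in rows.values())
    return sum(row[idx] * row[SUPPORT] for row in rows.values()) / total


for label, (p, r, reported_f1, n) in report.items():
    print(f"{label:<8} f1 from P/R = {f1(p, r):.4f} (reported {reported_f1:.4f}, n={n})")

for name, idx in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:<10} macro = {macro_avg(report, idx):.4f}  weighted = {weighted_avg(report, idx):.4f}")
```

The support-weighted recall is the same quantity as overall accuracy, which is why those two figures match in the table. The gap between the macro and weighted rows reflects the class imbalance: Female accounts for 237 of the 413 test documents.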