Description
This model classifies the gender of the patient in the clinical document.
Classified Labels
Female, Male, Unknown.
How to use
To classify your text, use this model as part of an NLP pipeline with the following stages: DocumentAssembler, BertSentenceEmbeddings (sbiobert_base_cased_mli), ClassifierDLModel.
...
sbert_embedder = BertSentenceEmbeddings\
    .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")\
    .setInputCols(["document"])\
    .setOutputCol("sentence_embeddings")\
    .setMaxSentenceLength(512)

gender_classifier = ClassifierDLModel\
    .pretrained("classifierdl_gender_sbert", "en", "clinical/models")\
    .setInputCols(["document", "sentence_embeddings"])\
    .setOutputCol("class")

nlp_pipeline = Pipeline(stages=[document_assembler, sbert_embedder, gender_classifier])

# Fit on an empty DataFrame: no stage in this pipeline needs training data.
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))

annotations = light_pipeline.fullAnnotate("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")
...
val sentence_embeddings = BertSentenceEmbeddings
    .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
    .setInputCols(Array("document"))
    .setOutputCol("sentence_embeddings")
    .setMaxSentenceLength(512)

val gender_classifier = ClassifierDLModel
    .pretrained("classifierdl_gender_sbert", "en", "clinical/models")
    .setInputCols(Array("document", "sentence_embeddings"))
    .setOutputCol("class")

val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_embeddings, gender_classifier))

// toDS requires `import spark.implicits._` to be in scope.
val data = Seq("social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.").toDS.toDF("text")

val result = pipeline.fit(data).transform(data)
Results
Female
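The label shown above lives in the `class` output column. As a minimal sketch of how it might be read back in Python, the helper below assumes that `fullAnnotate` returns a list with one dictionary per input text, keyed by output column, whose values are lists of annotation objects exposing a `result` attribute. The names `predicted_labels` and `fake_annotations` are hypothetical, and the stand-in object only mimics that assumed shape so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def predicted_labels(annotations, output_col="class"):
    """Return one list of label strings per annotated document."""
    return [[ann.result for ann in doc[output_col]] for doc in annotations]


# Stand-in for the value returned by light_pipeline.fullAnnotate(...).
# In a real run the list entries would be Spark NLP annotation objects;
# here SimpleNamespace only imitates the assumed `result` attribute.
fake_annotations = [{"class": [SimpleNamespace(result="Female", metadata={})]}]

labels = predicted_labels(fake_annotations)
print(labels)
```

With a real pipeline, the same helper would be called as `predicted_labels(annotations)` on the output of the Python example above.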
Model Information
Model Name: | classifierdl_gender_sbert |
Type: | ClassifierDLModel |
Compatibility: | Spark NLP for Healthcare 2.6.5+ |
Edition: | Official |
License: | Licensed |
Input Labels: | [sentence_embeddings] |
Output Labels: | [class] |
Language: | [en] |
Case sensitive: | True |
Data Source
This model is trained on more than four thousand clinical documents (radiology reports, pathology reports, clinical visits, etc.), annotated internally.
Benchmarking
              precision    recall  f1-score   support
      Female     0.9224    0.8954    0.9087       239
        Male     0.7895    0.8468    0.8171       124
     Unknown     0.8077    0.7778    0.7925        54
    accuracy                         0.8657       417
   macro avg     0.8399    0.8400    0.8394       417
weighted avg     0.8680    0.8657    0.8664       417
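The summary rows of the benchmark can be rederived from the per-class rows. The sketch below is not the evaluation code used for the model; it simply recomputes the F1 scores and the macro and weighted averages from the published precision, recall, and support values, under the usual definitions (F1 as the harmonic mean of precision and recall, macro average as the unweighted mean over classes, weighted average as the support-weighted mean). Because the inputs are already rounded to four decimals, the results should match the table only to within rounding.

```python
# Per-class (precision, recall, support) copied from the benchmark table.
report = {
    "Female": (0.9224, 0.8954, 239),
    "Male": (0.7895, 0.8468, 124),
    "Unknown": (0.8077, 0.7778, 54),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total = sum(n for _, _, n in report.values())
f1_scores = {label: f1(p, r) for label, (p, r, _) in report.items()}

# Macro average: every class counts equally.
macro_precision = sum(p for p, _, _ in report.values()) / len(report)
macro_recall = sum(r for _, r, _ in report.values()) / len(report)
macro_f1 = sum(f1_scores.values()) / len(report)

# Weighted average: every class counts in proportion to its support.
weighted_precision = sum(p * n for p, _, n in report.values()) / total
weighted_recall = sum(r * n for _, r, n in report.values()) / total
weighted_f1 = sum(f1_scores[label] * n for label, (_, _, n) in report.items()) / total

for name, value in [
    ("macro precision", macro_precision),
    ("macro recall", macro_recall),
    ("macro f1", macro_f1),
    ("weighted precision", weighted_precision),
    ("weighted recall", weighted_recall),
    ("weighted f1", weighted_f1),
]:
    print(f"{name:>18}: {value:.4f}")
```

Support-weighted recall is the same quantity as overall accuracy, since recall times support gives the number of correctly classified documents in each class. This is why the accuracy row and the weighted-average recall both read 0.8657.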