Description
This model classifies the gender of the patient described in a clinical document, based on contextual cues in the text.
Predicted Entities
Female, Male, Unknown
How to use
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer, BertEmbeddings, SentenceEmbeddings, ClassifierDLModel

# Assumes an active Spark NLP for Healthcare session is available as `spark`.
document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
# Token-level BioBERT embeddings.
biobert_embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("bert_embeddings")
# Average the token embeddings into a single vector per document.
sentence_embeddings = SentenceEmbeddings() \
    .setInputCols(["document", "bert_embeddings"]) \
    .setOutputCol("sentence_bert_embeddings") \
    .setPoolingStrategy("AVERAGE")
gender_classifier = ClassifierDLModel.pretrained("classifierdl_gender_biobert", "en", "clinical/models") \
    .setInputCols(["document", "sentence_bert_embeddings"]) \
    .setOutputCol("gender")

nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, biobert_embeddings, sentence_embeddings, gender_classifier])
# Fitting on an empty DataFrame only materializes the pipeline; the embeddings and classifier are already pretrained.
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))
annotations = light_pipeline.fullAnnotate("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")
Results
Female
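The label above can be read out of the value returned by `fullAnnotate`, which is a list with one dictionary per input text, keyed by output column. The sketch below is a minimal illustration and runs without Spark: `FakeAnnotation` is a hypothetical stand-in that models only the `result` and `metadata` attributes of a Spark NLP annotation, and `predicted_gender` is a helper name introduced here, not part of the library.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark NLP annotation object; only the two
# attributes used below (`result`, `metadata`) are modelled.
FakeAnnotation = namedtuple("FakeAnnotation", ["result", "metadata"])


def predicted_gender(annotations, output_col="gender"):
    """Return the predicted label for the first annotated text, or None."""
    if not annotations:
        return None
    column = annotations[0].get(output_col, [])
    return column[0].result if column else None


# Mocked value shaped like the return of fullAnnotate for a single text.
mock_annotations = [{"gender": [FakeAnnotation("Female", {})]}]
print(predicted_gender(mock_annotations))
```

With the real pipeline, the same helper would be called as `predicted_gender(annotations)`, assuming the classifier's output column is named `gender` as in the usage code.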
Model Information
| Model Name: | classifierdl_gender_biobert |
| Compatibility: | Spark NLP 2.7.1+ |
| License: | Licensed |
| Edition: | Official |
| Input Labels: | [sentence_embeddings] |
| Output Labels: | [class] |
| Language: | en |
| Dependencies: | biobert_pubmed_base_cased |
Data Source
This model is trained on more than four thousand clinical documents (radiology reports, pathology reports, clinical visits, etc.), annotated internally.
Benchmarking
              precision    recall  f1-score   support
      Female     0.9020    0.9364    0.9189       236
        Male     0.8761    0.7857    0.8285       126
     Unknown     0.7091    0.7647    0.7358        51
    accuracy                         0.8692       413
   macro avg     0.8291    0.8290    0.8277       413
weighted avg     0.8703    0.8692    0.8687       413
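For readers who want to check how the summary rows relate to the per-class rows, the following sketch recomputes the macro and weighted averages from the reported values. It assumes the usual definitions (macro average is the unweighted mean over classes; weighted average weights each class by its support); the names `report`, `macro_avg`, `weighted_avg`, and `f1_from` are introduced here for illustration. Because the inputs are already rounded to four decimals, recomputed figures can differ from the table in the last digit.

```python
# Per-class values copied from the benchmark table.
report = {
    "Female":  {"precision": 0.9020, "recall": 0.9364, "f1": 0.9189, "support": 236},
    "Male":    {"precision": 0.8761, "recall": 0.7857, "f1": 0.8285, "support": 126},
    "Unknown": {"precision": 0.7091, "recall": 0.7647, "f1": 0.7358, "support": 51},
}


def macro_avg(metric):
    """Unweighted mean of a metric over the classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted_avg(metric):
    """Mean of a metric over the classes, weighted by class support."""
    total = sum(row["support"] for row in report.values())
    return sum(row[metric] * row["support"] for row in report.values()) / total


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_avg(metric), 4), round(weighted_avg(metric), 4))
```

Under these definitions the support-weighted recall coincides with overall accuracy, which is why both appear as 0.8692 in the table.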