Description
This model classifies the gender of the patient in the clinical document using context.
Predicted Entities
Female
, Male
, Unknown
Live Demo Open in Colab Copy S3 URI
How to use
document_assembler = DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
tokenizer = Tokenizer()\
.setInputCols(['document'])\
.setOutputCol('token')
biobert_embeddings = BertEmbeddings().pretrained("biobert_pubmed_base_cased") \
.setInputCols(["document", "token"])\
.setOutputCol("bert_embeddings")
sentence_embeddings = SentenceEmbeddings() \
.setInputCols(["document", "bert_embeddings"]) \
.setOutputCol("sentence_bert_embeddings") \
.setPoolingStrategy("AVERAGE")
genderClassifier = ClassifierDLModel.pretrained("classifierdl_gender_biobert", "en", "clinical/models") \
.setInputCols(["document", "sentence_bert_embeddings"]) \
.setOutputCol("gender")
nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, biobert_embeddings, sentence_embeddings, genderClassifier])
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))
annotations = light_pipeline.fullAnnotate("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")
val document_assembler = new DocumentAssembler()
.setInputCol("text")
.setOutputCol("document")
tokenizer = Tokenizer()
.setInputCols("document")
.setOutputCol("token")
val biobert_embeddings = BertEmbeddings().pretrained("biobert_pubmed_base_cased")
.setInputCols(Array("document","token"))
.setOutputCol("bert_embeddings")
val sentence_embeddings = new SentenceEmbeddings()
.setInputCols(Array("document", "bert_embeddings"))
.setOutputCol("sentence_bert_embeddings")
.setPoolingStrategy("AVERAGE")
val genderClassifier = ClassifierDLModel.pretrained("classifierdl_gender_biobert", "en", "clinical/models")
.setInputCols(Array("document", "sentence_bert_embeddings"))
.setOutputCol("gender")
val nlp_pipeline = new Pipeline().setStages(Array(document_assembler, tokenizer, biobert_embeddings, sentence_embeddings, genderClassifier))
val data = Seq("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""").toDS.toDF("text")
val result = pipeline.fit(data).transform(data)
import nlu
nlu.load("en.classify.gender.biobert").predict("""social history: shows that does not smoke cigarettes or drink alcohol, lives in a nursing home. family history: shows a family history of breast cancer.""")
Results
Female
Model Information
Model Name: | classifierdl_gender_biobert |
Compatibility: | Spark NLP 2.7.1+ |
License: | Licensed |
Edition: | Official |
Input Labels: | [sentence_embeddings] |
Output Labels: | [class] |
Language: | en |
Dependencies: | biobert_pubmed_base_cased |
Data Source
This model is trained on more than four thousands clinical documents (radiology reports, pathology reports, clinical visits etc.), annotated internally.
Benchmarking
label precision recall f1-score support
Female 0.9020 0.9364 0.9189 236
Male 0.8761 0.7857 0.8285 126
Unknown 0.7091 0.7647 0.7358 51
accuracy - - 0.8692 413
macro-avg 0.8291 0.8290 0.8277 413
weighted-avg 0.8703 0.8692 0.8687 413