Description
Classify medical text according to the PICO framework.
Predicted Entities
CONCLUSIONS, DESIGN_SETTING, INTERVENTION, PARTICIPANTS, FINDINGS, MEASUREMENTS, AIMS.
How to use
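# Assumes an active SparkSession `spark` with licensed Spark NLP for Healthcare loaded.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer, BertEmbeddings, SentenceEmbeddings, ClassifierDLModel

document_assembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
tokenizer = Tokenizer().setInputCols(["document"]).setOutputCol("token")
# Token-level BioBERT embeddings
embeddings = BertEmbeddings.pretrained("biobert_pubmed_base_cased")\
    .setInputCols(["document", "token"])\
    .setOutputCol("word_embeddings")
# Average the token vectors into one vector per document
sentence_embeddings = SentenceEmbeddings()\
    .setInputCols(["document", "word_embeddings"])\
    .setOutputCol("sentence_embeddings")\
    .setPoolingStrategy("AVERAGE")
# The classifier consumes only sentence embeddings, matching the model's Input Labels
classifier = ClassifierDLModel.pretrained("classifierdl_pico_biobert", "en", "clinical/models")\
    .setInputCols(["sentence_embeddings"])\
    .setOutputCol("class")
nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings, sentence_embeddings, classifier])
# All stages are pretrained, so fitting on an empty DataFrame just assembles the pipeline
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")))
annotations = light_pipeline.fullAnnotate(["""A total of 10 adult daily smokers who reported at least one stressful event and coping episode and provided post-quit data.""", """When carbamazepine is withdrawn from the combination therapy, aripiprazole dose should then be reduced."""])
# Print the predicted label for each input text
for annotation in annotations:
    print(annotation["class"][0].result)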
# Alternatively, load the same model through the NLU library:
import nlu
nlu.load("en.classify.pico").predict("""A total of 10 adult daily smokers who reported at least one stressful event and coping episode and provided post-quit data.""")
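The `SentenceEmbeddings` stage in the pipeline is configured with `setPoolingStrategy("AVERAGE")`, which collapses the per-token BioBERT vectors into a single vector by averaging them dimension by dimension. The sketch below illustrates only that averaging step in plain Python; the helper `average_pool` is hypothetical (not part of Spark NLP), and the toy 3-dimensional vectors stand in for the 768-dimensional vectors that a BERT-base model such as BioBERT produces.

```python
from typing import List


def average_pool(token_vectors: List[List[float]]) -> List[float]:
    """Hypothetical helper: mean-pool token embeddings into one vector.

    Each output dimension is the arithmetic mean of that dimension
    across all token vectors, which is what AVERAGE pooling computes.
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same dimension")
    n = len(token_vectors)
    return [sum(v[i] for v in token_vectors) / n for i in range(dim)]


# Toy token vectors (assumed values, not real BioBERT output)
tokens = [[1.0, 2.0, 0.0], [3.0, 0.0, 0.0], [2.0, 4.0, 3.0]]
sentence_vector = average_pool(tokens)
print(sentence_vector)  # [2.0, 2.0, 1.0]
```

The resulting fixed-length vector is what the classifier receives, regardless of how many tokens the input text contains.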
Results
| sentences                                            | class        |
|------------------------------------------------------|--------------|
| A total of 10 adult daily smokers who reported at... | PARTICIPANTS |
| When carbamazepine is withdrawn from the combinat... | CONCLUSIONS  |
Model Information
Model Name: classifierdl_pico_biobert
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Dependencies: biobert_pubmed_base_cased
Data Source
Trained on a custom dataset derived from a PICO classification dataset.
Benchmarking
class           precision  recall  f1-score  support
AIMS            0.9229     0.9186  0.9207    7815
CONCLUSIONS     0.8556     0.8401  0.8478    8837
DESIGN_SETTING  0.8556     0.7494  0.7990    11551
FINDINGS        0.8949     0.9342  0.9142    18827
INTERVENTION    0.6866     0.7508  0.7173    4920
MEASUREMENTS    0.7564     0.8664  0.8077    6505
PARTICIPANTS    0.8483     0.7559  0.7994    5539
accuracy                           0.8495    63994
macro avg       0.8315     0.8308  0.8294    63994
weighted avg    0.8517     0.8495  0.8491    63994
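The macro avg and weighted avg rows are derived from the per-class rows: the macro average weights every class equally, while the weighted average weights each class by its support. The Python sketch below recomputes both from the numbers in the benchmarking table; the helper names `macro_avg`, `weighted_avg`, and `f1` are illustrative, not part of any library. Because every example carries exactly one label, the support-weighted recall equals overall accuracy.

```python
from typing import Dict, Tuple

# Per-class rows copied from the benchmarking table:
# label -> (precision, recall, f1-score, support)
REPORT: Dict[str, Tuple[float, float, float, int]] = {
    "AIMS": (0.9229, 0.9186, 0.9207, 7815),
    "CONCLUSIONS": (0.8556, 0.8401, 0.8478, 8837),
    "DESIGN_SETTING": (0.8556, 0.7494, 0.7990, 11551),
    "FINDINGS": (0.8949, 0.9342, 0.9142, 18827),
    "INTERVENTION": (0.6866, 0.7508, 0.7173, 4920),
    "MEASUREMENTS": (0.7564, 0.8664, 0.8077, 6505),
    "PARTICIPANTS": (0.8483, 0.7559, 0.7994, 5539),
}


def macro_avg(col: int) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[col] for row in REPORT.values()) / len(REPORT)


def weighted_avg(col: int) -> float:
    """Mean over classes weighted by support (number of test examples)."""
    total = sum(row[3] for row in REPORT.values())
    return sum(row[col] * row[3] for row in REPORT.values()) / total


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print("support:", sum(row[3] for row in REPORT.values()))
for name, col in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    print(f"{name:10s} macro={macro_avg(col):.4f} weighted={weighted_avg(col):.4f}")

# Per-class F1 is recoverable from that class's precision and recall
p, r, _, _ = REPORT["AIMS"]
print("AIMS f1 recomputed:", round(f1(p, r), 4))
```

Running it should reproduce the reported averages to four decimal places, for example a macro f1-score of 0.8294 and a weighted f1-score of 0.8491.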