PICO Classifier

Description

Classify medical text according to the PICO framework.

Predicted Entities

CONCLUSIONS, DESIGN_SETTING, INTERVENTION, PARTICIPANTS, FINDINGS, MEASUREMENTS, AIMS.


How to use

from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer, BertEmbeddings, SentenceEmbeddings, ClassifierDLModel

# Assumes an active Spark NLP session with access to the licensed models is available as `spark`.
document_assembler = DocumentAssembler().setInputCol('text').setOutputCol('document')

tokenizer = Tokenizer().setInputCols(['document']).setOutputCol('token')

embeddings = BertEmbeddings.pretrained('biobert_pubmed_base_cased')\
    .setInputCols(['document', 'token'])\
    .setOutputCol('word_embeddings')

# Average the token embeddings into a single vector per document.
sentence_embeddings = SentenceEmbeddings()\
    .setInputCols(['document', 'word_embeddings'])\
    .setOutputCol('sentence_embeddings')\
    .setPoolingStrategy('AVERAGE')

# The classifier consumes only the sentence embeddings (see Input Labels under Model Information).
classifier = ClassifierDLModel.pretrained('classifierdl_pico_biobert', 'en', 'clinical/models')\
    .setInputCols(['sentence_embeddings'])\
    .setOutputCol('class')

nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings, sentence_embeddings, classifier])

# Fit on an empty DataFrame: every stage is pretrained, so fitting only assembles the pipeline model.
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF('text')))

annotations = light_pipeline.fullAnnotate([
    """A total of 10 adult daily smokers who reported at least one stressful event and coping episode and provided post-quit data.""",
    """When carbamazepine is withdrawn from the combination therapy, aripiprazole dose should then be reduced.""",
])

# Alternatively, with the NLU library:
import nlu
nlu.load("en.classify.pico").predict("""A total of 10 adult daily smokers who reported at least one stressful event and coping episode and provided post-quit data.""")

Results

| sentences                                            | class        |
|------------------------------------------------------|--------------|
| A total of 10 adult daily smokers who reported at... | PARTICIPANTS |
| When carbamazepine is withdrawn from the combinat... | CONCLUSIONS  |
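
A table like the one above can be built from the list returned by fullAnnotate. The sketch below assumes each returned item is a dict mapping an output column name to a list of annotation objects exposing a `.result` attribute. The helper name `extract_labels` and the `SimpleNamespace` stand-ins are hypothetical and are used only so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def extract_labels(annotations, text_col="document", class_col="class", width=49):
    """Return (truncated text, predicted label) pairs from fullAnnotate-style output."""
    rows = []
    for item in annotations:
        text = item[text_col][0].result    # original input text
        label = item[class_col][0].result  # predicted class label
        if len(text) > width:
            text = text[:width] + "..."
        rows.append((text, label))
    return rows


# Stand-in objects that mimic the assumed structure of fullAnnotate output.
fake_annotations = [
    {
        "document": [SimpleNamespace(result=(
            "A total of 10 adult daily smokers who reported at least one "
            "stressful event and coping episode and provided post-quit data."))],
        "class": [SimpleNamespace(result="PARTICIPANTS")],
    },
    {
        "document": [SimpleNamespace(result=(
            "When carbamazepine is withdrawn from the combination therapy, "
            "aripiprazole dose should then be reduced."))],
        "class": [SimpleNamespace(result="CONCLUSIONS")],
    },
]

rows = extract_labels(fake_annotations)
for text, label in rows:
    print(f"{text:<52} | {label}")
```

With a real pipeline, the `annotations` variable from the How to use section would be passed in place of `fake_annotations`, provided the output columns keep the names `document` and `class`.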

Model Information

Model Name: classifierdl_pico_biobert
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Dependencies: biobert_pubmed_base_cased

Data Source

Trained on a custom dataset derived from a PICO classification dataset.

Benchmarking

                precision    recall  f1-score   support

          AIMS     0.9229    0.9186    0.9207      7815
   CONCLUSIONS     0.8556    0.8401    0.8478      8837
DESIGN_SETTING     0.8556    0.7494    0.7990     11551
      FINDINGS     0.8949    0.9342    0.9142     18827
  INTERVENTION     0.6866    0.7508    0.7173      4920
  MEASUREMENTS     0.7564    0.8664    0.8077      6505
  PARTICIPANTS     0.8483    0.7559    0.7994      5539

      accuracy                         0.8495     63994
     macro avg     0.8315    0.8308    0.8294     63994
  weighted avg     0.8517    0.8495    0.8491     63994
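
The summary rows follow from the per-class rows. As a minimal sketch, the snippet below copies precision, recall and support from the report, recomputes F1 as the harmonic mean of precision and recall, and then derives the macro average (unweighted mean over classes) and the weighted average (mean weighted by support). The variable names are illustrative, and small differences in the fourth decimal are expected because the inputs are already rounded.

```python
# (precision, recall, support) per class, copied from the report above.
report = {
    "AIMS":           (0.9229, 0.9186, 7815),
    "CONCLUSIONS":    (0.8556, 0.8401, 8837),
    "DESIGN_SETTING": (0.8556, 0.7494, 11551),
    "FINDINGS":       (0.8949, 0.9342, 18827),
    "INTERVENTION":   (0.6866, 0.7508, 4920),
    "MEASUREMENTS":   (0.7564, 0.8664, 6505),
    "PARTICIPANTS":   (0.8483, 0.7559, 5539),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


total_support = sum(s for _, _, s in report.values())

# Macro average: every class counts equally.
macro_f1 = sum(f1(p, r) for p, r, _ in report.values()) / len(report)

# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(f1(p, r) * s for p, r, s in report.values()) / total_support

# For single-label classification, support-weighted recall equals accuracy.
weighted_recall = sum(r * s for _, r, s in report.values()) / total_support

print(f"total support:   {total_support}")
print(f"macro F1:        {macro_f1:.4f}")
print(f"weighted F1:     {weighted_f1:.4f}")
print(f"weighted recall: {weighted_recall:.4f}")
```

The gap between the macro and weighted F1 reflects class imbalance: FINDINGS has the largest support and a high F1, while INTERVENTION has the smallest support and the lowest F1.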