PICO Classifier

Description

Classifies medical text according to the PICO (Population, Intervention, Comparison, Outcome) framework.

Predicted Entities

CONCLUSIONS, DESIGN_SETTING, INTERVENTION, PARTICIPANTS, FINDINGS, MEASUREMENTS, AIMS.

How to use

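# Assumes an active Spark session `spark` with access to the licensed 'clinical/models' repository.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import Tokenizer, BertEmbeddings, SentenceEmbeddings, ClassifierDLModel

# Wrap the raw text column in a document annotation
document_assembler = DocumentAssembler().setInputCol('text').setOutputCol('document')

tokenizer = Tokenizer().setInputCols(['document']).setOutputCol('token')

# Token-level BioBERT embeddings (the dependency listed under Model Information)
embeddings = BertEmbeddings.pretrained('biobert_pubmed_base_cased')\
    .setInputCols(['document', 'token'])\
    .setOutputCol('word_embeddings')

# Average the token embeddings into one vector per document
sentence_embeddings = SentenceEmbeddings()\
    .setInputCols(['document', 'word_embeddings'])\
    .setOutputCol('sentence_embeddings')\
    .setPoolingStrategy('AVERAGE')

# The classifier consumes only the sentence embeddings
classifier = ClassifierDLModel.pretrained('classifierdl_pico_biobert', 'en', 'clinical/models')\
    .setInputCols(['sentence_embeddings'])\
    .setOutputCol('class')

nlp_pipeline = Pipeline(stages=[document_assembler, tokenizer, embeddings, sentence_embeddings, classifier])

# Every stage is pretrained, so fitting on an empty DataFrame only assembles the pipeline
light_pipeline = LightPipeline(nlp_pipeline.fit(spark.createDataFrame([['']]).toDF('text')))

annotations = light_pipeline.fullAnnotate(["""A total of 10 adult daily smokers who reported at least one stressful event and coping episode and provided post-quit data.""", """When carbamazepine is withdrawn from the combination therapy, aripiprazole dose should then be reduced."""])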

Results

|                                            sentences | class        |
|------------------------------------------------------|--------------|
| A total of 10 adult daily smokers who reported at... | PARTICIPANTS |
| When carbamazepine is withdrawn from the combinat... | CONCLUSIONS  |
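
A table like the one above can be built from the `fullAnnotate` output, which is a list with one dictionary per input text, keyed by output column and holding annotation objects that expose a `result` field. The sketch below is one possible way to do it: the helper name `summarize` and the truncation width are illustrative assumptions, and the demo uses stand-in objects rather than real Spark NLP annotations so it runs without a Spark session.

```python
from types import SimpleNamespace


def summarize(annotations, text_col="document", class_col="class", width=49):
    """Return (snippet, label) pairs from fullAnnotate-style output.

    `summarize` is a hypothetical helper, not part of Spark NLP.
    Each item in `annotations` is expected to map a column name to a
    list of objects with a `.result` attribute.
    """
    rows = []
    for ann in annotations:
        text = ann[text_col][0].result    # original input text
        label = ann[class_col][0].result  # predicted class label
        snippet = text if len(text) <= width else text[:width] + "..."
        rows.append((snippet, label))
    return rows


# Stand-in data that mimics the shape of the real output (assumption).
demo = [
    {
        "document": [SimpleNamespace(result="A total of 10 adult daily smokers.")],
        "class": [SimpleNamespace(result="PARTICIPANTS")],
    },
]

for snippet, label in summarize(demo):
    print(f"{snippet} | {label}")
```

With a fitted pipeline, `summarize(annotations)` would be called on the list returned by `light_pipeline.fullAnnotate(...)` instead of on `demo`.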

Model Information

Model Name: classifierdl_pico_biobert
Compatibility: Spark NLP 2.7.1+
License: Licensed
Edition: Official
Input Labels: [sentence_embeddings]
Output Labels: [class]
Language: en
Dependencies: biobert_pubmed_base_cased

Data Source

Trained on a custom dataset derived from a PICO classification dataset.

Benchmarking

                precision    recall  f1-score   support

          AIMS     0.9229    0.9186    0.9207      7815
   CONCLUSIONS     0.8556    0.8401    0.8478      8837
DESIGN_SETTING     0.8556    0.7494    0.7990     11551
      FINDINGS     0.8949    0.9342    0.9142     18827
  INTERVENTION     0.6866    0.7508    0.7173      4920
  MEASUREMENTS     0.7564    0.8664    0.8077      6505
  PARTICIPANTS     0.8483    0.7559    0.7994      5539

      accuracy                         0.8495     63994
     macro avg     0.8315    0.8308    0.8294     63994
  weighted avg     0.8517    0.8495    0.8491     63994
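
As a reading aid for the averages in the report, the sketch below recomputes the macro and weighted F1 scores from the per-class rows. The macro average is the plain mean of the class F1 scores, while the weighted average scales each class by its support. The numbers are copied from the table; the variable names are illustrative.

```python
# (f1-score, support) per class, copied from the benchmarking table
report = {
    "AIMS": (0.9207, 7815),
    "CONCLUSIONS": (0.8478, 8837),
    "DESIGN_SETTING": (0.7990, 11551),
    "FINDINGS": (0.9142, 18827),
    "INTERVENTION": (0.7173, 4920),
    "MEASUREMENTS": (0.8077, 6505),
    "PARTICIPANTS": (0.7994, 5539),
}

total_support = sum(support for _, support in report.values())

# Macro average: every class counts equally
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: each class counts in proportion to its support
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(total_support, round(macro_f1, 4), round(weighted_f1, 4))
```

Rounded to four decimals, both values agree with the `macro avg` and `weighted avg` rows. The weighted score is the higher of the two because the largest class, FINDINGS, is also one of the best-scoring.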