Explain Clinical Document - Biomarker

Description

This specialized biomarker pipeline can;

  • extract biomarker entities,

  • classify sentences whether they contain biomarker entities or not,

  • establish relations between the extracted biomarker and biomarker results from the clinical documents.

In this pipeline, two NER, one text matcher, one sentence classifier, and one relation extraction model were employed to accomplish the designated tasks.

  • Clinical Entity Labels: Biomarker, Biomarker_Result

  • Relation Extraction Labels: is_finding_of

  • Classification Model Labels: 1, 0

Copy S3 URI

How to use


from sparknlp.pretrained import PretrainedPipeline

biomarker_pipeline = PretrainedPipeline("explain_clinical_doc_biomarker", "en", "clinical/models")

result = biomarker_pipeline.fullAnnotate("""In the bone- marrow (BM) aspiration, blasts accounted for 88.1% of ANCs, which were positive for CD9 and CD10 on flow cytometry. Measurements of serum tumor markers showed elevated level of Cyfra21-1: 4.77 ng/mL, NSE: 19.60 ng/mL, and SCCA: 2.58 ng/mL. Immunohistochemical staining showed positive staining for CK5/6, P40, and negative staining for TTF-1 and weakly positive staining for ALK.""")


import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline

val biomarker_pipeline = PretrainedPipeline("explain_clinical_doc_biomarker", "en", "clinical/models")

val result = biomarker_pipeline.fullAnnotate("""In the bone- marrow (BM) aspiration, blasts accounted for 88.1% of ANCs, which were positive for CD9 and CD10 on flow cytometry. Measurements of serum tumor markers showed elevated level of Cyfra21-1: 4.77 ng/mL, NSE: 19.60 ng/mL, and SCCA: 2.58 ng/mL. Immunohistochemical staining showed positive staining for CK5/6, P40, and negative staining for TTF-1 and weakly positive staining for ALK.""")

Results

# NER Result
+------------------------+-----+---+----------------+
|chunk                   |begin|end|label           |
+------------------------+-----+---+----------------+
|positive                |84   |91 |Biomarker_Result|
|CD9                     |97   |99 |Biomarker       |
|CD10                    |105  |108|Biomarker       |
|tumor markers           |151  |163|Biomarker       |
|elevated level          |172  |185|Biomarker_Result|
|Cyfra21-1               |190  |198|Biomarker       |
|4.77 ng/mL              |201  |210|Biomarker_Result|
|NSE                     |213  |215|Biomarker       |
|19.60 ng/mL             |218  |228|Biomarker_Result|
|SCCA                    |235  |238|Biomarker       |
|2.58 ng/mL              |241  |250|Biomarker_Result|
|positive                |289  |296|Biomarker_Result|
|CK5/6                   |311  |315|Biomarker       |
|P40                     |318  |320|Biomarker       |
|negative                |327  |334|Biomarker_Result|
|TTF-1                   |349  |353|Biomarker       |
|weakly positive staining|359  |382|Biomarker_Result|
|ALK                     |388  |390|Biomarker       |
+------------------------+-----+---+----------------+

# RE Result
|    |   sentence |   entity1_begin |   entity1_end | chunk1                   | entity1          |   entity2_begin |   entity2_end | chunk2         | entity2          | relation      |   confidence |
|---:|-----------:|----------------:|--------------:|:-------------------------|:-----------------|----------------:|--------------:|:---------------|:-----------------|:--------------|-------------:|
|  0 |          0 |              84 |            91 | positive                 | Biomarker_Result |              97 |            99 | CD9            | Biomarker        | is_finding_of |     0.993281 |
|  1 |          0 |              84 |            91 | positive                 | Biomarker_Result |             105 |           108 | CD10           | Biomarker        | is_finding_of |     0.998891 |
|  2 |          1 |             151 |           163 | tumor markers            | Biomarker        |             172 |           185 | elevated level | Biomarker_Result | is_finding_of |     0.900508 |
|  6 |          1 |             172 |           185 | elevated level           | Biomarker_Result |             190 |           198 | Cyfra21-1      | Biomarker        | is_finding_of |     0.995038 |
|  9 |          1 |             190 |           198 | Cyfra21-1                | Biomarker        |             201 |           210 | 4.77 ng/mL     | Biomarker_Result | is_finding_of |     0.981873 |
| 10 |          1 |             190 |           198 | Cyfra21-1                | Biomarker        |             218 |           228 | 19.60 ng/mL    | Biomarker_Result | is_finding_of |     0.541739 |
| 14 |          1 |             213 |           215 | NSE                      | Biomarker        |             218 |           228 | 19.60 ng/mL    | Biomarker_Result | is_finding_of |     0.988173 |
| 17 |          1 |             235 |           238 | SCCA                     | Biomarker        |             241 |           250 | 2.58 ng/mL     | Biomarker_Result | is_finding_of |     0.995757 |
| 18 |          2 |             289 |           296 | positive                 | Biomarker_Result |             311 |           315 | CK5/6          | Biomarker        | is_finding_of |     0.866368 |
| 19 |          2 |             289 |           296 | positive                 | Biomarker_Result |             318 |           320 | P40            | Biomarker        | is_finding_of |     0.895999 |
| 26 |          2 |             327 |           334 | negative                 | Biomarker_Result |             349 |           353 | TTF-1          | Biomarker        | is_finding_of |     0.994164 |
| 29 |          2 |             359 |           382 | weakly positive staining | Biomarker_Result |             388 |           390 | ALK            | Biomarker        | is_finding_of |     0.988431 |

# Classification Result
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------+-----+
|sentence_ID|                                                                                                                                   sentence|class|
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------+-----+
|          0|           In the bone- marrow (BM) aspiration, blasts accounted for 88.1% of ANCs, which were positive for CD9 and CD10 on flow cytometry.|    1|
|          1|                Measurements of serum tumor markers showed elevated level of Cyfra21-1: 4.77 ng/mL, NSE: 19.60 ng/mL, and SCCA: 2.58 ng/mL.|    1|
|          2|Immunohistochemical staining showed positive staining for CK5/6, P40, and negative staining for TTF-1 and weakly positive staining for ALK.|    1|
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------+-----+

Model Information

Model Name: explain_clinical_doc_biomarker
Type: pipeline
Compatibility: Healthcare NLP 5.3.0+
License: Licensed
Edition: Official
Language: en
Size: 2.2 GB

Included Models

  • DocumentAssembler
  • SentenceDetectorDLModel
  • TokenizerModel
  • WordEmbeddingsModel
  • MedicalBertForSequenceClassification
  • TextMatcherInternalModel
  • MedicalNerModel
  • NerConverterInternalModel
  • MedicalNerModel
  • NerConverterInternalModel
  • ChunkMergeModel
  • PerceptronModel
  • DependencyParserModel
  • RelationExtractionModel