Biomarker Text Matcher


Extracts biomarker entities using the rule-based TextMatcherInternal annotator.
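
To make "rule-based" concrete, here is a minimal pure-Python sketch of dictionary matching. It is not the TextMatcherInternal implementation: the phrase list, the `match_phrases` helper, and the word-boundary rule are all assumptions chosen for illustration. The sketch scans a text for known phrases, ignoring case, and reports zero-based character offsets with an inclusive end.

```python
import re

# Hypothetical phrase list; the real model ships its own dictionary.
PHRASES = ["CD20", "CD34", "HLA-DR", "neuron-specific enolase", "NSE"]

def match_phrases(text, phrases=PHRASES, label="Biomarker"):
    """Return (chunk, begin, end, label) tuples; `end` is inclusive."""
    # Try longer phrases first so they win over any shorter entry they contain.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    # The lookarounds keep a phrase from matching inside a longer word.
    pattern = re.compile(r"(?<!\w)(?:" + alternation + r")(?!\w)", re.IGNORECASE)
    return [(m.group(0), m.start(), m.end() - 1, label)
            for m in pattern.finditer(text)]

for row in match_phrases("Positive for cd20 and HLA-DR; NSE: 19.60 ng/mL."):
    print(row)
```

A lookup of this kind needs no training data, which fits the small model size reported on this page, but it can only find phrases that are already in its list.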

Predicted Entities

Biomarker



How to use

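from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Tokenizer
from sparknlp_jsl.annotator import TextMatcherInternalModel

# Assumes an active Spark session named `spark` with Healthcare NLP loaded.
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = Tokenizer()\
    .setInputCols(["document"])\
    .setOutputCol("token")

text_matcher = TextMatcherInternalModel.pretrained("biomarker_matcher","en","clinical/models") \
    .setInputCols(["document", "token"])\
    .setOutputCol("matched_text")

matcher_pipeline = Pipeline().setStages([documentAssembler, tokenizer, text_matcher])

data = spark.createDataFrame([["In the bone- marrow (BM) aspiration, blasts accounted for 88.1% of ANCs, which were positive for CD20, CD34, CD38, CD58, CD66c, CD123, HLA-DR, cCD79a, and TdT on flow cytometry. Measurements of serum tumor markers showed elevated level of cytokeratin 19 fragment (Cyfra21-1: 4.77 ng/mL), neuron-specific enolase (NSE: 19.60 ng/mL), and squamous cell carcinoma antigen (SCCA: 2.58 ng/mL)."]]).toDF("text")

matcher_model = matcher_pipeline.fit(data)
result = matcher_model.transform(data)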
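
// Scala: same pipeline; assumes the Spark NLP imports and spark.implicits._ are in scope.
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")
val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")
val text_matcher = TextMatcherInternalModel.pretrained("biomarker_matcher","en","clinical/models")
  .setInputCols(Array("document", "token"))
  .setOutputCol("matched_text")
val matcher_pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, text_matcher))
val data = Seq("In the bone- marrow (BM) aspiration, blasts accounted for 88.1% of ANCs, which were positive for CD20, CD34, CD38, CD58, CD66c, CD123, HLA-DR, cCD79a, and TdT on flow cytometry. Measurements of serum tumor markers showed elevated level of cytokeratin 19 fragment (Cyfra21-1: 4.77 ng/mL), neuron-specific enolase (NSE: 19.60 ng/mL), and squamous cell carcinoma antigen (SCCA: 2.58 ng/mL).").toDF("text")
val matcher_model = matcher_pipeline.fit(data)
val result = matcher_model.transform(data)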

Results


+-------------------------------+-----+---+---------+
|                          chunk|begin|end|    label|
+-------------------------------+-----+---+---------+
|                           CD20|   97|100|Biomarker|
|                           CD34|  103|106|Biomarker|
|                           CD38|  109|112|Biomarker|
|                           CD58|  115|118|Biomarker|
|                          CD66c|  121|125|Biomarker|
|                          CD123|  128|132|Biomarker|
|                         HLA-DR|  135|140|Biomarker|
|                         cCD79a|  143|148|Biomarker|
|                            TdT|  155|157|Biomarker|
|        cytokeratin 19 fragment|  239|261|Biomarker|
|                      Cyfra21-1|  264|272|Biomarker|
|        neuron-specific enolase|  288|310|Biomarker|
|                            NSE|  313|315|Biomarker|
|squamous cell carcinoma antigen|  336|366|Biomarker|
|                           SCCA|  369|372|Biomarker|
+-------------------------------+-----+---+---------+
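
The begin and end columns appear to be zero-based character offsets into the input text, with end inclusive: the CD20 row spans 97 to 100, which is four characters. The short Python check below uses a prefix of the example text and three rows copied from the table above. `chunk_at` is a hypothetical helper name, not part of the library.

```python
# Prefix of the example input text, long enough to cover the rows checked below.
text = ("In the bone- marrow (BM) aspiration, blasts accounted for 88.1% of ANCs, "
        "which were positive for CD20, CD34, CD38, CD58, CD66c, CD123, HLA-DR, "
        "cCD79a, and TdT on flow cytometry.")

# (chunk, begin, end) rows copied from the results table.
rows = [("CD20", 97, 100), ("HLA-DR", 135, 140), ("TdT", 155, 157)]

def chunk_at(text, begin, end):
    """Slice a chunk out of `text`; `end` is inclusive, so add 1 for Python slicing."""
    return text[begin:end + 1]

for chunk, begin, end in rows:
    assert chunk_at(text, begin, end) == chunk
```

Because the offsets index the raw input string, any preprocessing that changes the text before matching (for example, removing the space in "bone- marrow") would shift them.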

Model Information

Model Name: biomarker_matcher
Compatibility: Healthcare NLP 5.3.0+
License: Licensed
Edition: Official
Input Labels: [document, token]
Output Labels: [matched_text]
Language: en
Size: 26.2 KB
Case sensitive: false