Spark NLP for Healthcare Release Notes 2.7.4

 

2.7.4

We are glad to announce that Spark NLP for Healthcare 2.7.4 has been released!

Highlights:

  • Introducing a new annotator to extract chunks with NER tags using regex-like patterns: NerChunker.
  • Introducing two new annotators to filter chunks: ChunkFilterer and AssertionFilterer.
  • Ability to change the entity type in NerConverterInternal without using ChunkMerger (setReplaceDict).
  • In the DeIdentification model, the ability to randomly mix faker and static look-up lists in Obfuscation mode.
  • New De-Identification NER model, augmented with synthetic datasets to detect uppercased name entities.
  • Bug fixes & general improvements.

1. NerChunker:

Similar to what we used to do in POSChunker with POS tags, we can now also extract phrases that fit a known pattern using NER tags. NerChunker is handy for extracting entity groups together with their neighboring tokens when no pretrained NER model addresses the case. Let's say we want to extract clinical findings and body parts together as a single chunk, even if there are some unwanted tokens between them.

How to use:

ner_model = NerDLModel.pretrained("ner_radiology", "en", "clinical/models")\
    .setInputCols("sentence","token","embeddings")\
    .setOutputCol("ner")

ner_chunker = NerChunker()\
    .setInputCols(["sentence","ner"])\
    .setOutputCol("ner_chunk")\
    .setRegexParsers(["<IMAGINGFINDINGS>*<BODYPART>"])

text = 'She has cystic cyst on her kidney.'

>> ner tags: [(cystic, B-IMAGINGFINDINGS), (cyst, I-IMAGINGFINDINGS), (kidney, B-BODYPART)]
>> ner_chunk: ['cystic cyst on her kidney']
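
To make the idea of matching a pattern over NER tags concrete, here is a minimal plain-Python sketch. It is not the NerChunker implementation: the function name chunk_by_tag_pattern and its pattern semantics are assumptions made for illustration. In this toy matcher each <TAG> stands for exactly one token, quantifiers apply to the preceding tag, and untagged tokens are matched explicitly with <O>.

```python
import re

def chunk_by_tag_pattern(tagged_tokens, pattern):
    """Toy tag-pattern chunker (hypothetical helper, not the Spark NLP API).

    tagged_tokens: list of (token, IOB tag) pairs.
    pattern: tag pattern such as "<A>+<O>*<B>"; each <TAG> matches one token.
    """
    # Drop the B-/I- prefixes so only the entity label remains.
    labels = [tag.split("-", 1)[1] if "-" in tag else tag
              for _, tag in tagged_tokens]
    # Encode the sentence as one string of labels, e.g. "<O><O><BODYPART>".
    encoded = "".join(f"<{label}>" for label in labels)
    # Wrap every <TAG> in a group so that quantifiers cover the whole tag.
    regex = re.sub(r"<([^<>]+)>", r"(?:<\1>)", pattern)

    chunks = []
    for match in re.finditer(regex, encoded):
        # Convert character offsets back to token indices.
        start = encoded[:match.start()].count("<")
        end = encoded[:match.end()].count("<")
        if end > start:
            chunks.append(" ".join(tok for tok, _ in tagged_tokens[start:end]))
    return chunks

tagged = [
    ("She", "O"), ("has", "O"),
    ("cystic", "B-IMAGINGFINDINGS"), ("cyst", "I-IMAGINGFINDINGS"),
    ("on", "O"), ("her", "O"),
    ("kidney", "B-BODYPART"), (".", "O"),
]

print(chunk_by_tag_pattern(tagged, "<IMAGINGFINDINGS>+<O>*<BODYPART>"))
# ['cystic cyst on her kidney']
```

The pattern string used here follows the toy semantics above and is not necessarily what setRegexParsers expects.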

2. ChunkFilterer:

ChunkFilterer lets you filter named entities by a condition or a predefined look-up list, so that only the matching entities are fed to other annotators such as Assertion Status or Entity Resolvers. It supports two criteria: isin and regex.

How to use:

ner_model = NerDLModel.pretrained("ner_clinical", "en", "clinical/models")\
      .setInputCols("sentence","token","embeddings")\
      .setOutputCol("ner")

ner_converter = NerConverter() \
      .setInputCols(["sentence", "token", "ner"]) \
      .setOutputCol("ner_chunk")

chunk_filterer = ChunkFilterer()\
      .setInputCols("sentence","ner_chunk")\
      .setOutputCol("chunk_filtered")\
      .setCriteria("isin") \
      .setWhiteList(['severe fever','sore throat'])

text = 'Patient with severe fever, sore throat, stomach pain, and a headache.'

>> ner_chunk: ['severe fever','sore throat','stomach pain','headache']
>> chunk_filtered: ['severe fever','sore throat']
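
The two criteria can be pictured with a small plain-Python sketch. This is only an illustration of the filtering logic, not the annotator's code: filter_chunks is a hypothetical helper, and treating "regex" as a search anywhere in the chunk (rather than a full match) is an assumption.

```python
import re

def filter_chunks(chunks, criteria, allowed):
    """Keep only the chunks that satisfy the given criteria.

    Hypothetical helper for illustration; not part of Spark NLP.
    criteria "isin":  allowed is a list of exact chunk strings to keep.
    criteria "regex": allowed is a list of regex patterns; a chunk is kept
                      if any pattern is found in it (assumed semantics).
    """
    if criteria == "isin":
        allowed_set = set(allowed)
        return [c for c in chunks if c in allowed_set]
    if criteria == "regex":
        compiled = [re.compile(p) for p in allowed]
        return [c for c in chunks if any(p.search(c) for p in compiled)]
    raise ValueError(f"unknown criteria: {criteria!r}")

ner_chunks = ["severe fever", "sore throat", "stomach pain", "headache"]

print(filter_chunks(ner_chunks, "isin", ["severe fever", "sore throat"]))
# ['severe fever', 'sore throat']
print(filter_chunks(ner_chunks, "regex", [r"pain$"]))
# ['stomach pain']
```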

3. AssertionFilterer:

AssertionFilterer lets you filter named entities by a list of acceptable assertion statuses. It is handy when you want to whitelist statuses such as present or conditional and keep absent conditions from leaving your pipeline.

How to use:

clinical_assertion = AssertionDLModel.pretrained("assertion_dl", "en", "clinical/models") \
  .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
  .setOutputCol("assertion")

assertion_filterer = AssertionFilterer()\
  .setInputCols("sentence","ner_chunk","assertion")\
  .setOutputCol("assertion_filtered")\
  .setWhiteList(["present"])


text = 'Patient with severe fever and sore throat, but no stomach pain.'

>> ner_chunk: ['severe fever','sore throat','stomach pain']
>> assertion_filtered: ['severe fever','sore throat']
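
As a rough picture of the whitelist step, the sketch below pairs each chunk with an assertion status and keeps only the whitelisted ones. The helper filter_by_assertion is hypothetical, and the statuses are assigned by hand for this example rather than produced by a model.

```python
def filter_by_assertion(chunks_with_status, white_list):
    """Keep chunks whose assertion status is in the whitelist.

    Hypothetical helper for illustration; not part of Spark NLP.
    chunks_with_status: list of (chunk text, assertion status) pairs.
    """
    allowed = set(white_list)
    return [chunk for chunk, status in chunks_with_status if status in allowed]

# Hand-assigned statuses for the example sentence (assumed, not model output).
asserted = [
    ("severe fever", "present"),
    ("sore throat", "present"),
    ("stomach pain", "absent"),
]

print(filter_by_assertion(asserted, ["present"]))
# ['severe fever', 'sore throat']
```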
