package ner

Type Members

  1. class IOBTagger extends AnnotatorModel[IOBTagger] with CheckLicense with HasSimpleAnnotate[IOBTagger]

    Merges token tags and NER labels from chunks in the specified format. For example, the output columns of NerConverter and Tokenizer can be used as its inputs.

    Example

    Define the pipeline stages that run NER and convert its output into chunks.

    val data = Seq(("A 63-year-old man presents to the hospital ...")).toDF("text")
    val docAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
    val sentenceDetector = new SentenceDetector().setInputCols("document").setOutputCol("sentence")
    val tokenizer = new Tokenizer().setInputCols("sentence").setOutputCol("token")
    val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models").setInputCols("sentence", "token").setOutputCol("embs")
    val nerModel = MedicalNerModel.pretrained("ner_jsl", "en", "clinical/models").setInputCols("sentence", "token", "embs").setOutputCol("ner")
    val nerConverter = new NerConverter().setInputCols("sentence", "token", "ner").setOutputCol("ner_chunk")

    Define the IOB tagger, which needs tokens and chunks as input. Show results.

    val iobTagger = new IOBTagger().setInputCols("token", "ner_chunk").setOutputCol("ner_label")
    val pipeline = new Pipeline().setStages(Array(docAssembler, sentenceDetector, tokenizer, embeddings, nerModel, nerConverter, iobTagger))
    val result = pipeline.fit(data).transform(data)
    
    result.selectExpr("explode(ner_label) as a")
      .selectExpr("a.begin","a.end","a.result as chunk","a.metadata.word as word")
      .where("chunk!='O'").show(5, false)
    
    +-----+---+-----------+-----------+
    |begin|end|chunk      |word       |
    +-----+---+-----------+-----------+
    |5    |15 |B-Age      |63-year-old|
    |17   |19 |B-Gender   |man        |
    |64   |72 |B-Modifier |recurrent  |
    |98   |107|B-Diagnosis|cellulitis |
    |110  |119|B-Diagnosis|pneumonias |
    +-----+---+-----------+-----------+

    See also

    Tokenizer

    MedicalNerModel

    NerConverter
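
    The tagging step can be pictured with a small, dependency-free Scala sketch. It is a hypothetical illustration, not the IOBTagger implementation: the `Token` and `Chunk` case classes and the `tag` function are invented for this example, and offsets are assumed to be inclusive, as in the tables above. A token receives `B-` if it starts a chunk, `I-` if it lies inside one, and `O` otherwise.

    ```scala
    // Hypothetical sketch of chunk-to-IOB2 tagging; not the IOBTagger implementation.
    object IobTaggingSketch {
      // begin/end are character offsets; end is inclusive, as in Spark NLP annotations
      final case class Token(begin: Int, end: Int, text: String)
      final case class Chunk(begin: Int, end: Int, entity: String)

      /** Tags each token B-/I- if a chunk covers it, O otherwise. */
      def tag(tokens: Seq[Token], chunks: Seq[Chunk]): Seq[String] =
        tokens.map { t =>
          // the first chunk whose span fully contains the token, if any
          chunks.find(c => c.begin <= t.begin && t.end <= c.end) match {
            case Some(c) if t.begin == c.begin => s"B-${c.entity}"
            case Some(c)                       => s"I-${c.entity}"
            case None                          => "O"
          }
        }
    }
    ```

    For `She has cystic cyst on her kidney.` with the chunks `cystic cyst` (ImagingFindings) and `kidney` (BodyPart), this yields `O O B-ImagingFindings I-ImagingFindings O O B-BodyPart O`. Overlapping chunks are not handled; the sketch simply takes the first covering chunk.
    
    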

  2. class MedicalNerApproach extends AnnotatorApproach[MedicalNerModel] with NerApproach[MedicalNerApproach] with Logging with ParamsAndFeaturesWritable with CheckLicense
  3. class MedicalNerModel extends AnnotatorModel[MedicalNerModel] with HasBatchedAnnotate[MedicalNerModel] with WriteTensorflowModel with HasStorageRef with ParamsAndFeaturesWritable with CheckLicense
  4. case class NamedEntityConfidence(start: Int, end: Int, entity: String, text: String, sentenceId: String, confidence: Option[Float]) extends Product with Serializable
  5. class NerChunker extends AnnotatorModel[NerChunker] with HasSimpleAnnotate[NerChunker]

    Extracts phrases that fit a known pattern of NER tags. This is useful for grouping entities with neighboring tokens when no pretrained NER model addresses the use case directly. A regular expression must be provided to extract the tokens between entities.

    Example

    Define the pipeline stages for NER.

    val data = Seq("She has cystic cyst on her kidney.").toDF("text")
    
    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val sentenceDetector = new SentenceDetector()
      .setInputCols("document")
      .setOutputCol("sentence")
      .setUseAbbreviations(false)
    
    val tokenizer = new Tokenizer()
      .setInputCols(Array("sentence"))
      .setOutputCol("token")
    
    val embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
      .setInputCols("sentence","token")
      .setOutputCol("embeddings")
      .setCaseSensitive(false)
    
    val ner = MedicalNerModel.pretrained("ner_radiology", "en", "clinical/models")
      .setInputCols("sentence","token","embeddings")
      .setOutputCol("ner")
      .setIncludeConfidence(true)

    Define the NerChunker to combine the tagged tokens into chunks.

    val chunker = new NerChunker()
      .setInputCols(Array("sentence","ner"))
      .setOutputCol("ner_chunk")
      .setRegexParsers(Array("<ImagingFindings>.*<BodyPart>"))
    
    val pipeline = new Pipeline().setStages(Array(
      documentAssembler,
      sentenceDetector,
      tokenizer,
      embeddings,
      ner,
      chunker
    ))
    
    val result = pipeline.fit(data).transform(data)

    Show results:

    result.selectExpr("explode(arrays_zip(ner.metadata, ner.result))")
      .selectExpr("col['0'].word as word", "col['1'] as ner").show(truncate = false)
    +------+-----------------+
    |word  |ner              |
    +------+-----------------+
    |She   |O                |
    |has   |O                |
    |cystic|B-ImagingFindings|
    |cyst  |I-ImagingFindings|
    |on    |O                |
    |her   |O                |
    |kidney|B-BodyPart       |
    |.     |O                |
    +------+-----------------+
    
    result.select("ner_chunk.result").show(truncate=false)
    +---------------------------+
    |result                     |
    +---------------------------+
    |[cystic cyst on her kidney]|
    +---------------------------+
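
    One plausible way to match a regex against NER tags is to render each token's label as `<Label>` (dropping the `B-`/`I-` prefix), join the labels into a single string, run the regex on it, and map each match back to the tokens it covers. The sketch below is a hypothetical illustration of that idea, not NerChunker's internals; `TagRegexChunkerSketch` and `chunks` are invented names, and joining words with spaces stands in for slicing the original text.

    ```scala
    import scala.util.matching.Regex

    // Hypothetical sketch: match a regex against a string of <Label> tags, one per token.
    object TagRegexChunkerSketch {
      // "B-BodyPart" -> "BodyPart", "I-BodyPart" -> "BodyPart", "O" -> "O"
      private def label(tag: String): String =
        if (tag.startsWith("B-") || tag.startsWith("I-")) tag.drop(2) else tag

      /** Returns the words covered by each non-overlapping match of `pattern`. */
      def chunks(words: Seq[String], tags: Seq[String], pattern: String): Seq[String] = {
        val pieces = tags.map(t => s"<${label(t)}>")
        // starts(i) is where token i's tag begins in the joined string; the last entry is its length
        val starts = pieces.scanLeft(0)(_ + _.length)
        new Regex(pattern).findAllMatchIn(pieces.mkString).toList.map { m =>
          val from = starts.indexWhere(_ >= m.start)    // first token inside the match
          val until = starts.lastIndexWhere(_ <= m.end) // one past the last token inside it
          words.slice(from, until).mkString(" ")
        }
      }
    }
    ```

    With the words and tags from the first table above, the pattern `<ImagingFindings>.*<BodyPart>` yields `cystic cyst on her kidney`, because the greedy `.*` spans the intervening `<O>` tags.
    
    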
  6. class NerConverterInternal extends AnnotatorModel[NerConverterInternal] with HasSimpleAnnotate[NerConverterInternal]

    Converts an IOB or IOB2 representation of NER to a user-friendly one by associating the tokens of recognized entities with their label. Chunks with no associated entity (tagged "O") are filtered out. See also Inside–outside–beginning (tagging) for more information.

    Example

    The output of a MedicalNerModel follows the Annotator schema and looks like this after the transformation.

    result.selectExpr("explode(ner_result)").show(5, false)
    +--------------------------------------------------------------------------+
    |col                                                                       |
    +--------------------------------------------------------------------------+
    |{named_entity, 3, 3, O, {word -> A, confidence -> 0.994}, []}             |
    |{named_entity, 5, 15, B-Age, {word -> 63-year-old, confidence -> 1.0}, []}|
    |{named_entity, 17, 19, B-Gender, {word -> man, confidence -> 0.9858}, []} |
    |{named_entity, 21, 28, O, {word -> presents, confidence -> 0.9952}, []}   |
    |{named_entity, 30, 31, O, {word -> to, confidence -> 0.7063}, []}         |
    +--------------------------------------------------------------------------+

    After the converter is used:

    result.selectExpr("explode(ner_converter_result)").show(5, false)
    +-----------------------------------------------------------------------------------+
    |col                                                                                |
    +-----------------------------------------------------------------------------------+
    |{chunk, 5, 15, 63-year-old, {entity -> Age, sentence -> 0, chunk -> 0}, []}        |
    |{chunk, 17, 19, man, {entity -> Gender, sentence -> 0, chunk -> 1}, []}            |
    |{chunk, 64, 72, recurrent, {entity -> Modifier, sentence -> 0, chunk -> 2}, []}    |
    |{chunk, 98, 107, cellulitis, {entity -> Diagnosis, sentence -> 0, chunk -> 3}, []} |
    |{chunk, 110, 119, pneumonias, {entity -> Diagnosis, sentence -> 0, chunk -> 4}, []}|
    +-----------------------------------------------------------------------------------+

    See also

    MedicalNerModel
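
    The grouping logic behind an IOB2-to-chunk conversion can be sketched in plain Scala. This is a hypothetical illustration, not NerConverterInternal's code: the case classes and `convert` are invented names, offsets are inclusive as in the tables above, and an `I-` tag that does not continue a chunk of the same entity is assumed to start a new chunk.

    ```scala
    import scala.collection.mutable.ListBuffer

    // Hypothetical sketch of grouping IOB2-tagged tokens into chunks; not the library's code.
    object IobToChunksSketch {
      final case class Token(begin: Int, end: Int) // inclusive character offsets into the text
      final case class Chunk(begin: Int, end: Int, text: String, entity: String)

      def convert(text: String, tokens: Seq[Token], tags: Seq[String]): Seq[Chunk] = {
        val out = ListBuffer.empty[Chunk]
        var open: Option[(Int, Int, String)] = None // (begin, end, entity) of the chunk in progress

        // Emits the chunk in progress, if any; tokens tagged "O" never reach `out`
        def close(): Unit = {
          open.foreach { case (b, e, ent) => out += Chunk(b, e, text.substring(b, e + 1), ent) }
          open = None
        }

        tokens.zip(tags).foreach { case (tok, tag) =>
          if (tag == "O") close()
          else {
            val (prefix, entity) = (tag.take(2), tag.drop(2))
            open match {
              // an I- tag of the same entity extends the chunk in progress
              case Some((b, _, ent)) if prefix == "I-" && ent == entity =>
                open = Some((b, tok.end, ent))
              // a B- tag, or an I- tag that continues nothing, starts a new chunk
              case _ =>
                close()
                open = Some((tok.begin, tok.end, entity))
            }
          }
        }
        close()
        out.toList
      }
    }
    ```

    For `She has cystic cyst on her kidney.` tagged `O O B-ImagingFindings I-ImagingFindings O O B-BodyPart O`, this returns the chunks `cystic cyst` (8 to 18, ImagingFindings) and `kidney` (27 to 32, BodyPart).
    
    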

  7. trait ReadablePretrainedMedicalNer extends ParamsAndFeaturesReadable[MedicalNerModel] with HasPretrained[MedicalNerModel]
  8. trait ReadsMedicalNerGraph extends ParamsAndFeaturesReadable[MedicalNerModel] with ReadTensorflowModel

Value Members

  1. object IOBTagger extends ParamsAndFeaturesReadable[IOBTagger] with Serializable
  2. object MedicalNerApproach extends DefaultParamsReadable[MedicalNerApproach] with WithGraphResolver with Serializable
  3. object MedicalNerModel extends ReadablePretrainedMedicalNer with ReadsMedicalNerGraph with Serializable
  4. object NerChunker extends DefaultParamsReadable[Chunker] with Serializable
  5. object NerConverterInternal extends ParamsAndFeaturesReadable[NerConverterInternal] with Serializable
  6. object NerTaggedInternal
  7. object NerTagsEncodingInternal

    Works with different NER tag representations. Supports IOB and IOB2; see https://en.wikipedia.org/wiki/Inside%E2%80%93outside%E2%80%93beginning_(tagging)
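
    To illustrate the difference between the two schemes, the sketch below converts IOB (IOB1) tags, where `B-` appears only when a chunk immediately follows another chunk of the same type, into IOB2, where every chunk starts with `B-`. It is a hypothetical helper written for this example, not part of NerTagsEncodingInternal's API.

    ```scala
    // Hypothetical helper: rewrite IOB (IOB1) tags as IOB2 tags.
    object IobEncodingSketch {
      // Entity type of a tag, or None for "O"
      private def entity(tag: String): Option[String] =
        if (tag == "O") None else Some(tag.drop(2))

      def iobToIob2(tags: Seq[String]): Seq[String] = {
        val v = tags.toVector
        v.indices.map { i =>
          val prev = if (i == 0) None else entity(v(i - 1))
          // An I- tag that does not continue a chunk of the same type begins a new chunk
          if (v(i).startsWith("I-") && prev != entity(v(i))) "B-" + v(i).drop(2)
          else v(i)
        }
      }
    }
    ```

    For example, the IOB sequence `I-PER I-PER O I-LOC B-LOC I-ORG` becomes `B-PER I-PER O B-LOC B-LOC B-ORG`.
    
    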
