package resolution

Type Members

  1. case class BigFoundData(distance: Double, probability: Double, code: String, trained: Array[String], normalized: String) extends Product with Serializable
  2. class ChunkEntityResolverApproach extends AnnotatorApproach[ChunkEntityResolverModel] with ResolverParams with HasCaseSensitiveProperties with Licensed

    Contains all the parameters and methods to train a ChunkEntityResolverModel. It transforms a dataset with two input annotations of types TOKEN and WORD_EMBEDDINGS, coming from e.g. the ChunkTokenizer and ChunkEmbeddings annotators, and returns the normalized entity for a particular trained ontology / curated dataset (e.g. ICD-10, RxNorm, SNOMED etc.).

    To use pretrained models please use ChunkEntityResolverModel and see the Models Hub for available models.

    Example

    Training a SNOMED model

    Define a pre-processing pipeline for the training data. It needs to consist of columns for the normalized training data and their labels.

    val document = new DocumentAssembler()
      .setInputCol("normalized_text")
      .setOutputCol("document")
    
    val chunk = new Doc2Chunk()
      .setInputCols("document")
      .setOutputCol("chunk")
    
    val token = new Tokenizer()
      .setInputCols("document")
      .setOutputCol("token")
    
    val embeddings = WordEmbeddingsModel.pretrained("embeddings_healthcare_100d", "en", "clinical/models")
      .setInputCols("document", "token")
      .setOutputCol("embeddings")
    
    val chunkEmb = new ChunkEmbeddings()
          .setInputCols("chunk", "embeddings")
          .setOutputCol("chunk_embeddings")
    
    val snomedTrainingPipeline = new Pipeline().setStages(Array(
      document,
      chunk,
      token,
      embeddings,
      chunkEmb
    ))
    
    val snomedTrainingModel = snomedTrainingPipeline.fit(data)
    
    val snomedData = snomedTrainingModel.transform(data).cache()

    Then the Resolver can be trained with

    val snomedExtractor = new ChunkEntityResolverApproach()
      .setInputCols("token", "chunk_embeddings")
      .setOutputCol("recognized")
      .setNeighbours(1000)
      .setAlternatives(25)
      .setNormalizedCol("normalized_text")
      .setLabelCol("label")
      .setEnableWmd(true).setEnableTfidf(true).setEnableJaccard(true)
      .setEnableSorensenDice(true).setEnableJaroWinkler(true).setEnableLevenshtein(true)
      .setDistanceWeights(Array(1, 2, 2, 1, 1, 1))
      .setAllDistancesMetadata(true)
      .setPoolingStrategy("MAX")
      .setThreshold(1e32)
    val model = snomedExtractor.fit(snomedData)
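
    The fitted ChunkEntityResolverModel can then be applied with a regular transform call (a minimal sketch reusing the training frame; any DataFrame with matching token and chunk_embeddings columns would work):

    // resolve each chunk to its nearest trained SNOMED entry
    val resolved = model.transform(snomedData)
    resolved.select("recognized.result").show(5, false)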
    See also

    ChunkEntityResolverModel

    SentenceEntityResolverApproach for sentence level embeddings

  3. class ChunkEntityResolverModel extends AnnotatorModel[ChunkEntityResolverModel] with ResolverParams with HasStorageModel with HasEmbeddingsProperties with HasCaseSensitiveProperties with Licensed with HasSimpleAnnotate[ChunkEntityResolverModel]

    Contains all the parameters to transform a dataset with two input annotations of types TOKEN and WORD_EMBEDDINGS, coming from the ChunkTokenizer and ChunkEmbeddings annotators, and return the normalized entity for a particular trained ontology / curated dataset (e.g. ICD-10, RxNorm, SNOMED etc.).

    For available pretrained models please see the Models Hub.

    Example

    Using pretrained models for SNOMED

    First, the prior steps of the pipeline are defined. Outputs of types TOKEN and WORD_EMBEDDINGS are needed.

    val data = Seq(("A 63-year-old man presents to the hospital ...")).toDF("text")
    val docAssembler = new DocumentAssembler().setInputCol("text").setOutputCol("document")
    val sentenceDetector = new SentenceDetector().setInputCols("document").setOutputCol("sentence")
    val tokenizer = new Tokenizer().setInputCols("sentence").setOutputCol("token")
    val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
      .setInputCols("sentence", "token")
      .setOutputCol("word_embeddings")
    val icdo_ner = MedicalNerModel.pretrained("ner_bionlp", "en", "clinical/models")
      .setInputCols("sentence", "token", "word_embeddings")
      .setOutputCol("icdo_ner")
    val icdo_chunk = new NerConverter().setInputCols("sentence","token","icdo_ner").setOutputCol("icdo_chunk").setWhiteList("Cancer")
    val icdo_chunk_embeddings = new ChunkEmbeddings()
      .setInputCols("icdo_chunk", "word_embeddings")
      .setOutputCol("icdo_chunk_embeddings")
    val icdo_chunk_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_icdo_clinical", "en", "clinical/models")
      .setInputCols("token","icdo_chunk_embeddings")
      .setOutputCol("tm_icdo_code")
    val clinical_ner = MedicalNerModel.pretrained("ner_clinical", "en", "clinical/models")
      .setInputCols("sentence", "token", "word_embeddings")
      .setOutputCol("ner")
    val ner_converter = new NerConverter()
      .setInputCols("sentence", "token", "ner")
      .setOutputCol("ner_chunk")
    val ner_chunk_tokenizer = new ChunkTokenizer()
      .setInputCols("ner_chunk")
      .setOutputCol("ner_token")
    val ner_chunk_embeddings = new ChunkEmbeddings()
      .setInputCols("ner_chunk", "word_embeddings")
      .setOutputCol("ner_chunk_embeddings")

    Definition of the SNOMED Resolution

    val ner_snomed_resolver = ChunkEntityResolverModel.pretrained("chunkresolve_snomed_findings_clinical", "en", "clinical/models")
      .setInputCols("ner_token", "ner_chunk_embeddings")
      .setOutputCol("snomed_result")
    val pipelineFull = new Pipeline().setStages(Array(
        docAssembler,
        sentenceDetector,
        tokenizer,
        word_embeddings,
    
        clinical_ner,
        ner_converter,
        ner_chunk_embeddings,
        ner_chunk_tokenizer,
        ner_snomed_resolver,
    
        icdo_ner,
        icdo_chunk,
        icdo_chunk_embeddings,
        icdo_chunk_resolver
    ))
    val pipelineModelFull = pipelineFull.fit(data)
    val result = pipelineModelFull.transform(data).cache()

    Show results

    result.selectExpr("explode(snomed_result)")
      .selectExpr(
        "col.metadata.target_text",
        "col.metadata.resolved_text",
        "col.metadata.confidence",
        "col.metadata.all_k_results",
        "col.metadata.all_k_resolutions")
      .filter($"confidence" > 0.2).show(5)
    +--------------------+--------------------+----------+--------------------+--------------------+
    |         target_text|       resolved_text|confidence|       all_k_results|   all_k_resolutions|
    +--------------------+--------------------+----------+--------------------+--------------------+
    |hypercholesterolemia|Hypercholesterolemia|    0.2524|13644009:::267432...|Hypercholesterole...|
    |                 CBC|             Neocyte|    0.4980|259680000:::11573...|Neocyte:::Blood g...|
    |                CD38|       Hypoviscosity|    0.2560|47872005:::370970...|Hypoviscosity:::E...|
    |           platelets| Increased platelets|    0.5267|6631009:::2596800...|Increased platele...|
    |                CD38|       Hypoviscosity|    0.2560|47872005:::370970...|Hypoviscosity:::E...|
    +--------------------+--------------------+----------+--------------------+--------------------+
    See also

    ChunkEntityResolverApproach on how to train your own model

    SentenceEntityResolverModel for sentence level embeddings

  4. case class DistanceResult(distance: Double, weightedDistance: Double) extends Product with Serializable

    Class that contains distance in both representations: weighted and non-weighted, for using later in DistancePooling
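
    For illustration, a minimal hypothetical construction (the concrete values and the 2.0 weighting factor are made up):

    // a raw metric distance of 0.42, scaled by a hypothetical distance weight of 2.0
    val dr = DistanceResult(distance = 0.42, weightedDistance = 0.84)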

  5. class JDataReader extends AnyRef
  6. case class JTreeComponent(embeddings: Array[Float], data: JTreeData) extends Product with Serializable
  7. case class JTreeData(code: String, trained: Array[String], normalized: String) extends Product with Serializable
  8. class JTreeReader extends StorageReader[JTreeComponent]
  9. class JTreeWriter extends StorageBatchWriter[JTreeComponent]
  10. trait ReadablePretrainedBigChunkEntityResolver extends StorageReadable[BigChunkEntityResolverModel] with HasPretrained[BigChunkEntityResolverModel] with EvalEntityResolver
  11. trait ReadablePretrainedChunkEntityResolver extends ParamsAndFeaturesReadable[ChunkEntityResolverModel] with HasPretrained[ChunkEntityResolverModel] with EvalEntityResolver
  12. trait ReadablePretrainedSentenceEntityResolver extends ParamsAndFeaturesReadable[SentenceEntityResolverModel] with HasPretrained[SentenceEntityResolverModel] with EvalEntityResolver
  13. class SentenceEntityResolverApproach extends AnnotatorApproach[SentenceEntityResolverModel] with SentenceResolverParams with HasCaseSensitiveProperties with Licensed

    Contains all the parameters and methods to train a SentenceEntityResolverModel. The model transforms a dataset with input annotation type SENTENCE_EMBEDDINGS, coming from e.g. BertSentenceEmbeddings, and returns the normalized entity for a particular trained ontology / curated dataset (e.g. ICD-10, RxNorm, SNOMED etc.).

    To use pretrained models please use SentenceEntityResolverModel and see the Models Hub for available models.

    Example

    Training a SNOMED resolution model using BERT sentence embeddings

    Define a pre-processing pipeline for the training data. It needs to consist of columns for the normalized training data and their labels.

    val documentAssembler = new DocumentAssembler()
      .setInputCol("normalized_text")
      .setOutputCol("document")
    val bertEmbeddings = BertSentenceEmbeddings.pretrained("sent_biobert_pubmed_base_cased")
      .setInputCols("document")
      .setOutputCol("bert_embeddings")
    val snomedTrainingPipeline = new Pipeline().setStages(Array(
      documentAssembler,
      bertEmbeddings
    ))
    val snomedTrainingModel = snomedTrainingPipeline.fit(data)
    val snomedData = snomedTrainingModel.transform(data).cache()

    Then the Resolver can be trained with

    val bertExtractor = new SentenceEntityResolverApproach()
      .setNeighbours(25)
      .setThreshold(1000)
      .setInputCols("bert_embeddings")
      .setNormalizedCol("normalized_text")
      .setLabelCol("label")
      .setOutputCol("snomed_code")
      .setDistanceFunction("EUCLIDIAN")
      .setCaseSensitive(false)
    
    val snomedModel = bertExtractor.fit(snomedData)
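
    Like other Spark ML models, the trained resolver can be persisted and reloaded (a minimal sketch; the path is a placeholder):

    // save the fitted model to disk and load it back
    snomedModel.write.overwrite().save("/tmp/snomed_resolver")
    val loadedModel = SentenceEntityResolverModel.load("/tmp/snomed_resolver")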
    See also

    SentenceEntityResolverModel

  14. class SentenceEntityResolverModel extends AnnotatorModel[SentenceEntityResolverModel] with SentenceResolverParams with HasStorageModel with HasEmbeddingsProperties with HasCaseSensitiveProperties with HasSimpleAnnotate[SentenceEntityResolverModel] with Licensed

    The model transforms a dataset with input annotation type SENTENCE_EMBEDDINGS, coming from e.g. BertSentenceEmbeddings, and returns the normalized entity for a particular trained ontology / curated dataset (e.g. ICD-10, RxNorm, SNOMED etc.).

    For available pretrained models please see the Models Hub.

    Example

    Resolving CPT

    First define pipeline stages to extract entities

    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    val sentenceDetector = SentenceDetectorDLModel.pretrained()
      .setInputCols("document")
      .setOutputCol("sentence")
    val tokenizer = new Tokenizer()
      .setInputCols("sentence")
      .setOutputCol("token")
    val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
      .setInputCols("sentence", "token")
      .setOutputCol("embeddings")
    val clinical_ner = MedicalNerModel.pretrained("jsl_ner_wip_clinical", "en", "clinical/models")
      .setInputCols("sentence", "token", "embeddings")
      .setOutputCol("ner")
    val ner_converter = new NerConverter()
      .setInputCols("sentence", "token", "ner")
      .setOutputCol("ner_chunk")
      .setWhiteList("Test","Procedure")
    val c2doc = new Chunk2Doc()
      .setInputCols("ner_chunk")
      .setOutputCol("ner_chunk_doc")
    val sbert_embedder = BertSentenceEmbeddings
      .pretrained("sbiobert_base_cased_mli","en","clinical/models")
      .setInputCols("ner_chunk_doc")
      .setOutputCol("sbert_embeddings")

    Then the resolver is defined on the extracted entities and sentence embeddings

    val cpt_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_cpt_procedures_augmented","en", "clinical/models")
      .setInputCols("ner_chunk", "sbert_embeddings")
      .setOutputCol("cpt_code")
      .setDistanceFunction("EUCLIDEAN")
    val sbert_pipeline_cpt = new Pipeline().setStages(Array(
      documentAssembler,
      sentenceDetector,
      tokenizer,
      word_embeddings,
      clinical_ner,
      ner_converter,
      c2doc,
      sbert_embedder,
      cpt_resolver))
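
    The pipeline is then fit and applied to the input data, mirroring the earlier examples (data is assumed to be a DataFrame with a "text" column):

    val sbert_outputs = sbert_pipeline_cpt.fit(data).transform(data)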

    Show results

    sbert_outputs
      .selectExpr("explode(arrays_zip(ner_chunk.result, ner_chunk.metadata, cpt_code.result, cpt_code.metadata, ner_chunk.begin, ner_chunk.end)) as cpt_code")
      .selectExpr(
        "cpt_code['0'] as chunk",
        "cpt_code['1'].entity as entity",
        "cpt_code['2'] as code",
        "cpt_code['3'].confidence as confidence",
        "cpt_code['3'].all_k_resolutions as all_k_resolutions",
        "cpt_code['3'].all_k_results as all_k_results"
      ).show(5)
    +--------------------+---------+-----+----------+--------------------+--------------------+
    |               chunk|   entity| code|confidence|   all_k_resolutions|       all_k_results|
    +--------------------+---------+-----+----------+--------------------+--------------------+
    |          heart cath|Procedure|93566|    0.1180|CCA - Cardiac cat...|93566:::62319:::9...|
    |selective coronar...|     Test|93460|    0.1000|Coronary angiogra...|93460:::93458:::9...|
    |common femoral an...|     Test|35884|    0.1808|Femoral artery by...|35884:::35883:::3...|
    |   StarClose closure|Procedure|33305|    0.1197|Heart closure:::H...|33305:::33300:::3...|
    |         stress test|     Test|93351|    0.2795|Cardiovascular st...|93351:::94621:::9...|
    +--------------------+---------+-----+----------+--------------------+--------------------+
    See also

    SentenceEntityResolverApproach for training a custom model

  15. case class TreeData(code: String, trained: Array[String], normalized: String) extends Product with Serializable
  16. class BigChunkEntityResolverApproach extends AnnotatorApproach[BigChunkEntityResolverModel] with HasStorage with HasStorageReader with Licensed

    This class is deprecated. Please use ChunkEntityResolverApproach instead.

    Annotations
    @deprecated
    Deprecated

    BigChunkEntityResolverApproach is deprecated and will not be supported in the future. Please use ChunkEntityResolverApproach instead.

  17. class BigChunkEntityResolverModel extends AnnotatorModel[BigChunkEntityResolverModel] with HasStorageModel with HasEmbeddingsProperties with Licensed with HasSimpleAnnotate[BigChunkEntityResolverModel]

    This class is deprecated. Please use ChunkEntityResolverModel instead.

    Annotations
    @deprecated
    Deprecated

    BigChunkEntityResolverModel is deprecated and will not be supported in the future. Please use ChunkEntityResolverModel instead.

Value Members

  1. object BigChunkEntityResolverModel extends ReadablePretrainedBigChunkEntityResolver with Serializable
  2. object ChunkEntityResolverModel extends ReadablePretrainedChunkEntityResolver with Serializable
  3. object ConfidenceFunction

    Helper object to use while setting the confidenceFunction parameter

  4. object DistanceFunction

    Helper object to use while setting the distanceFunction parameter

  5. object PoolingStrategy

    Helper object to use while setting the poolingStrategy parameter (see the usage sketch after this list)

  6. object SentenceEntityResolverModel extends ReadablePretrainedSentenceEntityResolver with Serializable
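
A short usage sketch for the three helper objects above. The string values shown are the ones used in this page's examples plus common Spark NLP options; any value not appearing in the examples above ("INVERSE", for instance) is an assumption to verify against the helper objects themselves:

    // distance and confidence settings on a sentence-level resolver
    val sentenceApproach = new SentenceEntityResolverApproach()
      .setDistanceFunction("EUCLIDEAN")   // valid options enumerated by DistanceFunction
      .setConfidenceFunction("INVERSE")   // valid options enumerated by ConfidenceFunction

    // pooling setting on a chunk-level resolver
    val chunkApproach = new ChunkEntityResolverApproach()
      .setPoolingStrategy("MAX")          // valid options enumerated by PoolingStrategy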
