package resolution
Type Members
- case class DistanceResult(distance: Double, weightedDistance: Double) extends Product with Serializable
Class that contains a distance in both representations, weighted and non-weighted, for later use in DistancePooling.
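A minimal construction sketch (the values are illustrative, not taken from a real run):

// A raw distance alongside its weighted counterpart, as consumed later by DistancePooling.
val dr = DistanceResult(distance = 0.42, weightedDistance = 0.21)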
- class JDataReader extends AnyRef
- case class JTreeComponent(embeddings: Array[Float], data: JTreeData) extends Product with Serializable
- case class JTreeData(code: String, trained: Array[String], normalized: String) extends Product with Serializable
- class JTreeReader extends StorageReader[JTreeComponent]
- class JTreeWriter extends StorageBatchWriter[JTreeComponent]
- trait ReadablePretrainedSentenceEntityResolver extends ParamsAndFeaturesReadable[SentenceEntityResolverModel] with HasPretrained[SentenceEntityResolverModel] with EvalEntityResolver
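This trait is what provides the pretrained loader on the SentenceEntityResolverModel companion object. A minimal sketch of pulling a pretrained resolver, using the same model name and arguments as the examples below:

// Download (or load from cache) a pretrained resolver and wire its columns.
val pretrainedResolver = SentenceEntityResolverModel
  .pretrained("sbiobertresolve_rxnorm_augmented", "en", "clinical/models")
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("resolve")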
- class Resolution2Chunk extends AnnotatorModel[Resolution2Chunk] with HasSimpleAnnotate[Resolution2Chunk]
This annotator converts 'Resolution' type annotations into 'CHUNK' type to create a new chunk-type column, compatible with annotators that use chunk type as input.
Example
Define a dataset

val testDS = Seq(
  "Has a past history of gastroenteritis and stomach pain, however patient shows no stomach pain now. " +
  "We don't care about gastroenteritis here, but we do care about heart failure. " +
  "Test for asma, no asma."
).toDF("text")
Define a pipeline
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("ner_chunk")

val sbert_embedder = BertSentenceEmbeddings
  .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
  .setInputCols(Array("ner_chunk"))
  .setOutputCol("sentence_embeddings")
  .setCaseSensitive(false)

val resolver = SentenceEntityResolverModel
  .pretrained("sbiobertresolve_rxnorm_augmented", "en", "clinical/models")
  .setInputCols(Array("sentence_embeddings"))
  .setOutputCol("resolve")

val resolver2chunk = new Resolution2Chunk()
  .setInputCols(Array("resolve"))
  .setOutputCol("chunk")
val pipeline = new Pipeline()
  .setStages(Array(documentAssembler, sbert_embedder, resolver, resolver2chunk))
  .fit(testDS)

val result = pipeline.transform(testDS)
result.selectExpr("chunk.result", "chunk.annotatorType").show(false)

+---------+-------------+
|result   |annotatorType|
+---------+-------------+
|[2550737]|[chunk]      |
+---------+-------------+
- class ResolverMerger extends AnnotatorModel[ResolverMerger] with HasSimpleAnnotate[ResolverMerger] with CheckLicense
- class SentenceEntityResolverApproach extends AnnotatorApproach[SentenceEntityResolverModel] with SentenceResolverParams with HasCaseSensitiveProperties with HandleExceptionParams with CheckLicense
Contains all the parameters and methods to train a SentenceEntityResolverModel. The model transforms a dataset with input annotation type SENTENCE_EMBEDDINGS, coming from e.g. BertSentenceEmbeddings, and returns the normalized entity for a particular trained ontology / curated dataset (e.g. ICD-10, RxNorm, SNOMED etc.).
To use pretrained models please use SentenceEntityResolverModel and see the Models Hub for available models.
Example
Training a SNOMED resolution model using BERT sentence embeddings
Define a pre-processing pipeline for the training data. It needs to consist of columns for the normalized training data and their labels.
val documentAssembler = new DocumentAssembler()
  .setInputCol("normalized_text")
  .setOutputCol("document")

// The embeddings are computed on the assembled document column.
val bertEmbeddings = BertSentenceEmbeddings.pretrained("sent_biobert_pubmed_base_cased")
  .setInputCols("document")
  .setOutputCol("bert_embeddings")

val snomedTrainingPipeline = new Pipeline().setStages(Array(
  documentAssembler,
  bertEmbeddings
))

val snomedTrainingModel = snomedTrainingPipeline.fit(data)
val snomedData = snomedTrainingModel.transform(data).cache()
Then the Resolver can be trained with
val bertExtractor = new SentenceEntityResolverApproach()
  .setNeighbours(25)
  .setThreshold(1000)
  .setInputCols("bert_embeddings")
  .setNormalizedCol("normalized_text")
  .setLabelCol("label")
  .setOutputCol("snomed_code")
  .setDistanceFunction("EUCLIDEAN")
  .setCaseSensitive(false)

val snomedModel = bertExtractor.fit(snomedData)
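Since the resulting model is a standard Spark ML model, it can be persisted and reloaded with the usual writer/reader API; a minimal sketch (the path is illustrative):

// Save the trained resolver to disk, then reload it via the companion object's reader.
snomedModel.write.overwrite().save("/models/snomed_resolver")
val reloadedModel = SentenceEntityResolverModel.load("/models/snomed_resolver")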
- See also
SentenceEntityResolverModel for the trained model
- class SentenceEntityResolverModel extends AnnotatorModel[SentenceEntityResolverModel] with SentenceResolverParams with HasStorageModel with HasEmbeddingsProperties with HasCaseSensitiveProperties with HasSimpleAnnotate[SentenceEntityResolverModel] with HandleExceptionParams with HasSafeAnnotate[SentenceEntityResolverModel] with CheckLicense
The model transforms a dataset with input annotation type SENTENCE_EMBEDDINGS, coming from e.g. BertSentenceEmbeddings, and returns the normalized entity for a particular trained ontology / curated dataset (e.g. ICD-10, RxNorm, SNOMED etc.).
To use pretrained models please see the Models Hub for available models.
Example
Resolving CPT
First define pipeline stages to extract entities
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentenceDetector = SentenceDetectorDLModel.pretrained()
  .setInputCols("document")
  .setOutputCol("sentence")

val tokenizer = new Tokenizer()
  .setInputCols("sentence")
  .setOutputCol("token")

val word_embeddings = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols("sentence", "token")
  .setOutputCol("embeddings")

val clinical_ner = MedicalNerModel.pretrained("jsl_ner_wip_clinical", "en", "clinical/models")
  .setInputCols("sentence", "token", "embeddings")
  .setOutputCol("ner")

val ner_converter = new NerConverter()
  .setInputCols("sentence", "token", "ner")
  .setOutputCol("ner_chunk")
  .setWhiteList("Test", "Procedure")

val c2doc = new Chunk2Doc()
  .setInputCols("ner_chunk")
  .setOutputCol("ner_chunk_doc")

val sbert_embedder = BertSentenceEmbeddings
  .pretrained("sbiobert_base_cased_mli", "en", "clinical/models")
  .setInputCols("ner_chunk_doc")
  .setOutputCol("sbert_embeddings")
Then the resolver is defined on the extracted entities and sentence embeddings
val cpt_resolver = SentenceEntityResolverModel.pretrained("sbiobertresolve_cpt_procedures_augmented", "en", "clinical/models")
  .setInputCols("sbert_embeddings")
  .setOutputCol("cpt_code")
  .setDistanceFunction("EUCLIDEAN")

val sbert_pipeline_cpt = new Pipeline().setStages(Array(
  documentAssembler,
  sentenceDetector,
  tokenizer,
  word_embeddings,
  clinical_ner,
  ner_converter,
  c2doc,
  sbert_embedder,
  cpt_resolver))
Show results
// sbert_outputs is the DataFrame obtained by fitting the pipeline above and transforming the input data.
sbert_outputs
  .selectExpr("explode(arrays_zip(ner_chunk.result, ner_chunk.metadata, cpt_code.result, cpt_code.metadata, ner_chunk.begin, ner_chunk.end)) as cpt_code")
  .selectExpr(
    "cpt_code['0'] as chunk",
    "cpt_code['1'].entity as entity",
    "cpt_code['2'] as code",
    "cpt_code['3'].confidence as confidence",
    "cpt_code['3'].all_k_resolutions as all_k_resolutions",
    "cpt_code['3'].all_k_results as all_k_codes"
  ).show(5)

+--------------------+---------+-----+----------+--------------------+--------------------+
|               chunk|   entity| code|confidence|   all_k_resolutions|         all_k_codes|
+--------------------+---------+-----+----------+--------------------+--------------------+
|          heart cath|Procedure|93566|    0.1180|CCA - Cardiac cat...|93566:::62319:::9...|
|selective coronar...|     Test|93460|    0.1000|Coronary angiogra...|93460:::93458:::9...|
|common femoral an...|     Test|35884|    0.1808|Femoral artery by...|35884:::35883:::3...|
|   StarClose closure|Procedure|33305|    0.1197|Heart closure:::H...|33305:::33300:::3...|
|         stress test|     Test|93351|    0.2795|Cardiovascular st...|93351:::94621:::9...|
+--------------------+---------+-----+----------+--------------------+--------------------+
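For quick inference on individual strings, the fitted pipeline can also be wrapped in Spark NLP's LightPipeline; a minimal sketch, assuming the pipeline above was fitted on a DataFrame with a "text" column:

import com.johnsnowlabs.nlp.LightPipeline

// Fit once, then annotate plain strings without building a DataFrame each time.
val fittedPipeline = sbert_pipeline_cpt.fit(data)
val lightPipeline = new LightPipeline(fittedPipeline)
val annotations = lightPipeline.annotate("He had a heart cath and a stress test last year.")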
- See also
SentenceEntityResolverApproach for training a custom model
- case class TreeData(code: String, trained: Array[String], normalized: String) extends Product with Serializable
Value Members
- object ConfidenceFunction
Helper object to use while setting the confidenceFunction parameter
- object DistanceFunction
Helper object to use while setting the distanceFunction parameter
- object PoolingStrategy
Helper object to use while setting the poolingStrategy parameter (a usage sketch for these three helpers follows this list)
- object Resolution2Chunk extends DefaultParamsReadable[Resolution2Chunk] with Serializable
- object ResolverMerger extends ParamsAndFeaturesReadable[ResolverMerger] with Serializable
- object SentenceEntityResolverModel extends ReadablePretrainedSentenceEntityResolver with Serializable
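A minimal configuration sketch for the parameters the three helper objects above target. Only setDistanceFunction("EUCLIDEAN") is confirmed by the examples on this page; the other setter names and values are assumptions based on the usual setX naming convention:

val approach = new SentenceEntityResolverApproach()
  .setInputCols("bert_embeddings")
  .setNormalizedCol("normalized_text")
  .setLabelCol("label")
  .setOutputCol("code")
  .setDistanceFunction("EUCLIDEAN")  // values bundled in DistanceFunction; confirmed above
  .setConfidenceFunction("INVERSE")  // assumed value; see ConfidenceFunction
  .setPoolingStrategy("AVERAGE")     // assumed value; see PoolingStrategy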