com.johnsnowlabs.nlp.annotators.resolution

SentenceEntityResolverApproach

class SentenceEntityResolverApproach extends AnnotatorApproach[SentenceEntityResolverModel] with SentenceResolverParams with HasCaseSensitiveProperties with HandleExceptionParams with CheckLicense

Contains all the parameters and methods to train a SentenceEntityResolverModel. The trained model takes SENTENCE_EMBEDDINGS annotations as input (for example, from BertSentenceEmbeddings) and returns the normalized entity for a particular trained ontology or curated dataset (e.g. ICD-10, RxNorm, SNOMED).

To use pretrained models please use SentenceEntityResolverModel and see the Models Hub for available models.

Example

Training a SNOMED resolution model using BERT sentence embeddings

Define the pre-processing pipeline for the training data. The data needs to consist of a column with the normalized training text and a column with the corresponding labels.

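import org.apache.spark.ml.Pipeline
import com.johnsnowlabs.nlp.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.BertSentenceEmbeddings

// Turn the normalized description column into DOCUMENT annotations.
val documentAssembler = new DocumentAssembler()
  .setInputCol("normalized_text")
  .setOutputCol("document")

// Embed each description; the input column must match the assembler's output column.
val bertEmbeddings = BertSentenceEmbeddings.pretrained("sent_biobert_pubmed_base_cased")
  .setInputCols("document")
  .setOutputCol("bert_embeddings")

val snomedTrainingPipeline = new Pipeline().setStages(Array(
  documentAssembler,
  bertEmbeddings
))
val snomedTrainingModel = snomedTrainingPipeline.fit(data)
val snomedData = snomedTrainingModel.transform(data).cache()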
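
The `data` DataFrame used in the pipeline above is not defined in this example. As a minimal sketch, assuming a plain local SparkSession and placeholder rows (the descriptions and the `CODE_1` / `CODE_2` labels are hypothetical, not real SNOMED entries), it could be built with one normalized description and one label per row:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a plain local session for illustration only; the annotators
// themselves need a licensed Spark NLP session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("resolver-training-data")
  .getOrCreate()
import spark.implicits._

// Placeholder rows: one normalized description and its code per row.
// Several descriptions may share the same code.
val data = Seq(
  ("first concept description", "CODE_1"),
  ("second concept description", "CODE_2"),
  ("synonym of the first concept", "CODE_1")
).toDF("normalized_text", "label")

data.printSchema()
```

The column names match the values passed to setInputCol, setNormalizedCol and setLabelCol in this example.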

Then the Resolver can be trained with

val bertExtractor = new SentenceEntityResolverApproach()
  .setNeighbours(25)
  .setThreshold(1000)
  .setInputCols("bert_embeddings")
  .setNormalizedCol("normalized_text")
  .setLabelCol("label")
  .setOutputCol("snomed_code")
  .setDistanceFunction("EUCLIDEAN")
  .setCaseSensitive(false)

val snomedModel = bertExtractor.fit(snomedData)

See also

SentenceEntityResolverModel

Linear Supertypes
CheckLicense, HandleExceptionParams, HasCaseSensitiveProperties, ParamsAndFeaturesWritable, HasFeatures, SentenceResolverParams, AnnotatorApproach[SentenceEntityResolverModel], CanBeLazy, DefaultParamsWritable, MLWritable, HasOutputAnnotatorType, HasOutputAnnotationCol, HasInputAnnotationCols, Estimator[SentenceEntityResolverModel], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Parameters

  1. val auxLabelCol: Param[String]

    Optional column with one extra label per document. This extra label is emitted in an additional output column (Default: "aux_label").

  2. val datasetInfo: Param[String]

    Descriptive information about the dataset being used.

    Definition Classes
    SentenceResolverParams
  3. val doExceptionHandling: BooleanParam

    If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted and processing continues with the next item. This comes with a performance penalty.

    Definition Classes
    HandleExceptionParams
  4. val dropCodesList: StringArrayParam

    List of codes in a pretrained model to leave out when continuing training with new data.

  5. val labelCol: Param[String]

    column name for the value we are trying to resolve (Default: "code")

  6. val normalizedCol: Param[String]

    column name for the original, normalized description

  7. val overrideExistingCodes: BooleanParam

    Whether to override the existing codes with new data when continuing training from a pretrained model. The default value is false (keep all the codes).

  8. val pretrainedModelPath: Param[String]

    Path to an already trained SentenceEntityResolverModel.

    This pretrained model is used as the starting point for training the new one. The path can be a local file path, a distributed file path (HDFS, DBFS), or a cloud storage path (S3).

  9. val returnAllKEmbeddings: BooleanParam

    Whether to return the embeddings of all K candidates of the resolution. The embeddings are placed in the metadata. Expect increased RAM usage (Default: false)

  10. val returnCosineDistances: BooleanParam

    Whether to calculate and return cosine distances between a sentence and the k closest candidates. Can improve accuracy but increases computation (Default: true)

  11. val returnResolvedTextEmbeddings: BooleanParam

    Whether to include embeddings for the resolved text (Default: false)

  12. val useAuxLabel: BooleanParam

    Whether to use Aux Label or not (Default: false)
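
The parameters pretrainedModelPath, overrideExistingCodes and dropCodesList work together when training continues from an existing model. The following is a minimal configuration sketch using the setters documented on this page; the model path, the code in the drop list and the `newData` DataFrame are hypothetical placeholders.

```scala
import com.johnsnowlabs.nlp.annotators.resolution.SentenceEntityResolverApproach

// Hypothetical location of a previously trained SentenceEntityResolverModel.
val pretrainedPath = "/path/to/existing_resolver_model"

val continuedResolver = new SentenceEntityResolverApproach()
  .setInputCols("bert_embeddings")
  .setNormalizedCol("normalized_text")
  .setLabelCol("label")
  .setOutputCol("snomed_code")
  .setPretrainedModelPath(pretrainedPath)   // start from the existing model
  .setOverrideExistingCodes(true)           // new data overrides codes already in the model
  .setDropCodesList(Array("CODE_TO_DROP"))  // hypothetical code to leave out of the pretrained model

// Assumption: newData has the same columns as the original training data
// (sentence embeddings, label, normalized text).
// val updatedModel = continuedResolver.fit(newData)
```

With overrideExistingCodes left at its default of false, all codes of the pretrained model are kept.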

Annotator types

Required input and expected output annotator types

  1. val inputAnnotatorTypes: Array[String]

    Input annotator types: SENTENCE_EMBEDDINGS

    Definition Classes
    SentenceEntityResolverApproach → HasInputAnnotationCols
  2. val outputAnnotatorType: AnnotatorType

    Output annotator types: ENTITY

    Definition Classes
    SentenceEntityResolverApproach → HasOutputAnnotatorType

Members

  1. type AnnotatorType = String
    Definition Classes
    HasOutputAnnotatorType
  1. val auxLabelMap: StructFeature[Map[String, String]]
  2. def beforeTraining(spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  3. val caseSensitive: BooleanParam
    Definition Classes
    HasCaseSensitiveProperties
  4. def checkValidEnvironment(spark: Option[SparkSession], scopes: Seq[String]): Unit
    Definition Classes
    CheckLicense
  5. def checkValidScope(scope: String): Unit
    Definition Classes
    CheckLicense
  6. def checkValidScopeAndEnvironment(scope: String, spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  7. def checkValidScopesAndEnvironment(scopes: Seq[String], spark: Option[SparkSession], checkLp: Boolean): Unit
    Definition Classes
    CheckLicense
  8. final def clear(param: Param[_]): SentenceEntityResolverApproach.this.type
    Definition Classes
    Params
  9. val confidenceFunction: Param[String]
    Definition Classes
    SentenceResolverParams
  10. final def copy(extra: ParamMap): Estimator[SentenceEntityResolverModel]
    Definition Classes
    AnnotatorApproach → Estimator → PipelineStage → Params
  11. val description: String
    Definition Classes
    SentenceEntityResolverApproach → AnnotatorApproach
  12. val distanceFunction: Param[String]

    what distance function to use for KNN: 'EUCLIDEAN' or 'COSINE'

    Definition Classes
    SentenceResolverParams
  13. lazy val embeddingsColumnName: String
  14. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  15. def explainParams(): String
    Definition Classes
    Params
  16. def extractAuxLabelMap(dataset: Dataset[_]): Map[String, String]

    Extracts a Map[String,String] where key=label and value=auxLabel from a dataset. If either of the two columns does not exist, an empty map is returned.

    dataset

    from which we extract the column

    returns

    a Map[String,String]

  17. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  18. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  19. val features: ArrayBuffer[Feature[_, _, _]]
    Definition Classes
    HasFeatures
  20. final def fit(dataset: Dataset[_]): SentenceEntityResolverModel
    Definition Classes
    AnnotatorApproach → Estimator
  21. def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[SentenceEntityResolverModel]
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  22. def fit(dataset: Dataset[_], paramMap: ParamMap): SentenceEntityResolverModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  23. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): SentenceEntityResolverModel
    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  24. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  25. def getAuxLabelMap(): Map[String, String]

    Map[String,String] where key=label and value=auxLabel from a dataset.

  26. def getCaseSensitive: Boolean
    Definition Classes
    HasCaseSensitiveProperties
  27. def getConfidenceFunction: String
    Definition Classes
    SentenceResolverParams
  28. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  29. def getDistanceFunction: String
    Definition Classes
    SentenceResolverParams
  30. def getInputCols: Array[String]
    Definition Classes
    HasInputAnnotationCols
  31. def getLazyAnnotator: Boolean
    Definition Classes
    CanBeLazy
  32. def getMissAsEmpty: Boolean
    Definition Classes
    SentenceResolverParams
  33. def getNeighbours: Int
    Definition Classes
    SentenceResolverParams
  34. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  35. final def getOutputCol: String
    Definition Classes
    HasOutputAnnotationCol
  36. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  37. def getThreshold: Double
    Definition Classes
    SentenceResolverParams
  38. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  39. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  40. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  41. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  42. lazy val labelColumnName: String
  43. val lazyAnnotator: BooleanParam
    Definition Classes
    CanBeLazy
  44. val missAsEmpty: BooleanParam

    whether or not to return an empty annotation on unmatched chunks

    Definition Classes
    SentenceResolverParams
  45. val neighbours: IntParam

    number of neighbours to consider in the KNN query to calculate WMD

    Definition Classes
    SentenceResolverParams
  46. lazy val normalizedColumnName: String
  47. def onTrained(model: SentenceEntityResolverModel, spark: SparkSession): Unit
    Definition Classes
    AnnotatorApproach
  48. val optionalInputAnnotatorTypes: Array[String]
    Definition Classes
    HasInputAnnotationCols
  49. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  50. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  51. final def set[T](param: Param[T], value: T): SentenceEntityResolverApproach.this.type
    Definition Classes
    Params
  52. def setAuxLabelMap(m: Map[String, String]): SentenceEntityResolverApproach.this.type

    Map[String,String] where key=label and value=auxLabel from a dataset.

  53. def setCaseSensitive(value: Boolean): SentenceEntityResolverApproach.this.type
    Definition Classes
    HasCaseSensitiveProperties
  54. def setConfidenceFunction(v: String): SentenceEntityResolverApproach.this.type
    Definition Classes
    SentenceResolverParams
  55. def setDistanceFunction(value: String): SentenceEntityResolverApproach.this.type
    Definition Classes
    SentenceResolverParams
  56. final def setInputCols(value: String*): SentenceEntityResolverApproach.this.type
    Definition Classes
    HasInputAnnotationCols
  57. def setInputCols(value: Array[String]): SentenceEntityResolverApproach.this.type
    Definition Classes
    HasInputAnnotationCols
  58. def setLazyAnnotator(value: Boolean): SentenceEntityResolverApproach.this.type
    Definition Classes
    CanBeLazy
  59. def setMissAsEmpty(v: Boolean): SentenceEntityResolverApproach.this.type
    Definition Classes
    SentenceResolverParams
  60. def setNeighbours(k: Int): SentenceEntityResolverApproach.this.type
    Definition Classes
    SentenceResolverParams
  61. final def setOutputCol(value: String): SentenceEntityResolverApproach.this.type
    Definition Classes
    HasOutputAnnotationCol
  62. def setReturnResolvedTextEmbeddings(value: Boolean): SentenceEntityResolverApproach.this.type

    Whether to include embeddings for the resolved text (Default: false)

  63. def setThreshold(dist: Double): SentenceEntityResolverApproach.this.type
    Definition Classes
    SentenceResolverParams
  64. val threshold: DoubleParam

    threshold value for the aggregated distance

    Definition Classes
    SentenceResolverParams
  65. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  66. def train(dataset: Dataset[_], recursivePipeline: Option[PipelineModel]): SentenceEntityResolverModel

    Returns the SentenceEntityResolverModel Transformer, that can be used to transform input datasets

    The dataset provided to the fit method should have one sentence per row and contain the following columns: SentenceEmbeddings, ResolverLabel, ResolverNormalized

    The cardinality of the dataset should not exceed 100,000 data points, since searching in such a large KD-tree becomes impractical.

    This method is called inside the AnnotatorApproach's fit method

    dataset

    a Dataset containing SentenceEmbeddings, ResolverLabel, ResolverNormalized

    returns

    a trained SentenceEntityResolverModel

    Definition Classes
    SentenceEntityResolverApproach → AnnotatorApproach
  67. final def transformSchema(schema: StructType): StructType
    Definition Classes
    AnnotatorApproach → PipelineStage
  68. val uid: String
    Definition Classes
    SentenceEntityResolverApproach → Identifiable
  69. def write: MLWriter
    Definition Classes
    ParamsAndFeaturesWritable → DefaultParamsWritable → MLWritable
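
To illustrate how distanceFunction, neighbours and threshold interact conceptually, here is a dependency-free sketch of a nearest-neighbour lookup. It is not the library's implementation: it uses a brute-force scan rather than a KD-tree, all names (`KnnSketch`, `nearest`, the `CODE_*` labels) are hypothetical, and treating the threshold as a simple maximum-distance cutoff is an assumption made for illustration.

```scala
object KnnSketch {
  // Euclidean (L2) distance between two vectors of equal length.
  def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Cosine distance, defined here as 1 - cosine similarity.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    1.0 - dot / (normA * normB)
  }

  // Brute-force lookup: returns up to k (label, distance) pairs whose
  // distance does not exceed the threshold, closest first.
  def nearest(
      query: Array[Double],
      candidates: Seq[(String, Array[Double])],
      k: Int,
      threshold: Double,
      distance: (Array[Double], Array[Double]) => Double
  ): Seq[(String, Double)] =
    candidates
      .map { case (label, vec) => (label, distance(query, vec)) }
      .filter { case (_, d) => d <= threshold }
      .sortBy { case (_, d) => d }
      .take(k)

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "CODE_A" -> Array(1.0, 0.0),
      "CODE_B" -> Array(0.0, 1.0),
      "CODE_C" -> Array(3.0, 4.0)
    )
    val query = Array(1.0, 0.0)

    // Euclidean ranks by absolute position: prints "CODE_A, CODE_B"
    println(nearest(query, candidates, 2, 1000.0, euclidean).map(_._1).mkString(", "))
    // Cosine ranks by direction only: prints "CODE_A, CODE_C"
    println(nearest(query, candidates, 2, 1000.0, cosine).map(_._1).mkString(", "))
  }
}
```

The two calls return different second candidates for the same query, which is why the choice of distance function can change the resolved code.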

Parameter setters

  1. def setAuxLabelCol(c: String): SentenceEntityResolverApproach.this.type

    Optional column with one extra label per document. This extra label is emitted in an additional output column.

  2. def setDatasetInfo(value: String): SentenceEntityResolverApproach.this.type

    set descriptive information about the dataset being used

    Definition Classes
    SentenceResolverParams
  3. def setDoExceptionHandling(value: Boolean): SentenceEntityResolverApproach.this.type

    If true, exceptions are handled. If data that causes an exception is passed to the model, an error annotation containing the exception message is emitted and processing continues with the next item. This comes with a performance penalty.

    Definition Classes
    HandleExceptionParams
  4. def setDropCodesList(v: Array[String]): SentenceEntityResolverApproach.this.type

    Sets a list of codes in a pretrained model that will be left out when continuing training with new data.

  5. def setLabelCol(value: String): SentenceEntityResolverApproach.this.type

    column name for the value we are trying to resolve

  6. def setNormalizedCol(value: String): SentenceEntityResolverApproach.this.type

    column name for the original, normalized description

  7. def setOverrideExistingCodes(v: Boolean): SentenceEntityResolverApproach.this.type

    Whether to override the existing codes with new data when continuing training from a pretrained model. The default value is false (keep all the codes).

  8. def setPretrainedModelPath(path: String): SentenceEntityResolverApproach.this.type

    Set the location of an already trained SentenceEntityResolverModel, which is used as a starting point for training the new model.

  9. def setReturnAllKEmbeddings(b: Boolean): SentenceEntityResolverApproach.this.type

    Whether to return the embeddings of all K candidates of the resolution. The embeddings are placed in the metadata. Expect increased RAM usage.

  10. def setReturnCosineDistances(value: Boolean): SentenceEntityResolverApproach.this.type

    Whether to calculate and return cosine distances between a sentence and the k closest candidates. Can improve accuracy but increases computation.

  11. def setUseAuxLabel(b: Boolean): SentenceEntityResolverApproach.this.type

    Whether to use Aux Label or not
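
The setters above can be chained, since each returns the approach itself. As a hedged configuration sketch, the following enables the auxiliary label, cosine distances and exception handling; the column names and the dataset description string are assumptions chosen for illustration.

```scala
import com.johnsnowlabs.nlp.annotators.resolution.SentenceEntityResolverApproach

val resolverWithAux = new SentenceEntityResolverApproach()
  .setInputCols("bert_embeddings")
  .setNormalizedCol("normalized_text")
  .setLabelCol("label")
  .setOutputCol("snomed_code")
  .setUseAuxLabel(true)               // enable the extra label
  .setAuxLabelCol("aux_label")        // assumed column holding one extra label per document
  .setReturnCosineDistances(true)     // also report cosine distances to the k closest candidates
  .setDoExceptionHandling(true)       // emit error annotations instead of failing
  .setDatasetInfo("illustrative dataset description")  // free-text description, hypothetical value
```

The training DataFrame must then contain the auxiliary label column in addition to the embeddings, label and normalized text columns.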

Parameter getters

  1. def getAuxLabelCol(): Option[String]

    Optional column with one extra label per document. This extra label is emitted in an additional output column.

  2. def getDatasetInfo: String

    get descriptive information about the dataset being used

    Definition Classes
    SentenceResolverParams
  3. def getLabelCol: String

    column name for the value we are trying to resolve

  4. def getNormalizedCol: String

    column name for the original, normalized description

  5. def getReturnAllKEmbeddings(): Boolean

    Whether to return the embeddings of all K candidates of the resolution. The embeddings are placed in the metadata. Expect increased RAM usage.

  6. def getReturnCosineDistances: Boolean

    Whether to calculate and return cosine distances between a sentence and the k closest candidates. Can improve accuracy but increases computation.

  7. def getUseAuxLabel(): Boolean

    Whether to use Aux Label or not
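
The getters can be used to inspect a configured approach before calling fit. A short sketch, assuming the column names shown here; note that, per the signatures above, getAuxLabelCol() returns an Option[String] while the other getters return plain values.

```scala
import com.johnsnowlabs.nlp.annotators.resolution.SentenceEntityResolverApproach

val resolver = new SentenceEntityResolverApproach()
  .setNormalizedCol("normalized_text")
  .setLabelCol("label")
  .setUseAuxLabel(true)
  .setAuxLabelCol("aux_label")

// Read the configuration back.
val labelColumn: String = resolver.getLabelCol
val normalizedColumn: String = resolver.getNormalizedCol
val useAux: Boolean = resolver.getUseAuxLabel()
val auxColumn: Option[String] = resolver.getAuxLabelCol()

println(s"label=$labelColumn normalized=$normalizedColumn useAux=$useAux aux=$auxColumn")
```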