package re

Type Members

case class BertREConfig(sentenceStartTokenId: Int = 102, sentenceEndTokenId: Int = 103, entity1StartTokenId: Int = 10, entity1EndTokenId: Int = 11, entity2StartTokenId: Int = 12, entity2EndTokenId: Int = 13, entity1StartTag: String = "e1b", entity1EndTag: String = "e1e", entity2StartTag: String = "e2b", entity2EndTag: String = "e2e") extends Product with Serializable
case class DLRelationInstance(relationType: String, entity1: String, entity2: String, entity1_begin: Int, entity1_end: Int, entity2_begin: Int, entity2_end: Int, chunk1: String, chunk2: String, chunk1_conf: String, chunk2_conf: String, syntactic_distance: String, context: Sentence) extends Product with Serializable
class GenericREModel extends RelationExtractionModel with HasStorageRef with ParamsAndFeaturesWritable with CheckLicense
Instantiated RelationExtractionModel for extracting relationships between any entities. This class is not intended to be used directly; please use RelationExtractionModel instead. Pairs of entities should be specified using setRelationPairs. Please see the Models Hub for available models.

See also
RelationExtractionModel to use the model
class PosologyREModel extends GenericREModel
Instantiated RelationExtractionModel for extracting relationships between different recognized drug entities. This class is not intended to be used directly; please use RelationExtractionModel instead. Possible values are "DRUG-DOSAGE", "DRUG-ADE", "DRUG-FORM", "DRUG-FREQUENCY", "DRUG-ROUTE", "DRUG-REASON", "DRUG-STRENGTH", "DRUG-DURATION". Please see the Models Hub for available models.

See also
RelationExtractionModel to use the model
class REDataEncoder extends Serializable

class RENerChunksFilter extends AnnotatorModel[RENerChunksFilter] with HasSimpleAnnotate[RENerChunksFilter] with CheckLicense

Filters entities' dependency relations.

The annotator filters the desired relation pairs (defined by the parameter relationPairs) and stores them in the output column. Filtering the possible relations can be useful for additional analysis in a specific use case (e.g., checking adverse drug reactions and drug relations), and the result can be the input for further analysis using a pretrained RelationExtractionDLModel.

For example, the ner_clinical NER model can identify PROBLEM, TEST, and TREATMENT entities. By using this annotator, one can filter (select) the relations between PROBLEM and TREATMENT entities only, removing any relation between the other entities, to further analyze the associations between clinical problems and treatments.
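The pair filtering described above can be sketched in plain Scala without Spark. This is only an illustration of the idea, not the annotator's implementation: the Chunk type, the candidatePairs helper and the sample chunks are hypothetical, and the sketch assumes that pairs are ordered and that labels are compared case-insensitively, which may differ from what RENerChunksFilter actually does.

```scala
// Hypothetical stand-in for an NER chunk; not a Spark NLP type.
final case class Chunk(begin: Int, text: String, entity: String)

object RelationPairFilterSketch {
  // Keep ordered pairs whose lower-cased "entity1-entity2" label is in the allowed set.
  def candidatePairs(chunks: Seq[Chunk], allowed: Set[String]): Seq[(Chunk, Chunk)] =
    for {
      a <- chunks
      b <- chunks
      if a != b
      if allowed.contains(s"${a.entity}-${b.entity}".toLowerCase)
    } yield (a, b)

  def main(args: Array[String]): Unit = {
    // Made-up chunks using the ner_clinical label set mentioned in the text.
    val chunks = Seq(
      Chunk(0, "fever", "PROBLEM"),
      Chunk(10, "blood test", "TEST"),
      Chunk(25, "paracetamol", "TREATMENT")
    )
    candidatePairs(chunks, Set("problem-treatment"))
      .foreach { case (a, b) => println(s"${a.text} -> ${b.text}") }
  }
}
```

With these sample chunks only the PROBLEM and TREATMENT pair survives; every pair involving the TEST chunk is dropped.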

Example

Define pipeline stages to extract entities

val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentencer = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentences")

val tokenizer = new Tokenizer()
  .setInputCols("sentences")
  .setOutputCol("tokens")

val words_embedder = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens")
  .setOutputCol("embeddings")

val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens")
  .setOutputCol("pos_tags")

val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
  .setInputCols("sentences", "pos_tags", "tokens")
  .setOutputCol("dependencies")

val clinical_ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical","en","clinical/models")
  .setInputCols("sentences", "tokens", "embeddings")
  .setOutputCol("ner_tags")

val ner_chunker = new NerConverter()
  .setInputCols("sentences", "tokens", "ner_tags")
  .setOutputCol("ner_chunks")

Define the relation pairs and the filter

val relationPairs = Array("direction-external_body_part_or_region",
                      "external_body_part_or_region-direction",
                      "direction-internal_organ_or_component",
                      "internal_organ_or_component-direction")

val re_ner_chunk_filter = new RENerChunksFilter()
    .setInputCols("ner_chunks", "dependencies")
    .setOutputCol("re_ner_chunks")
    .setMaxSyntacticDistance(4)
    .setRelationPairs(Array("internal_organ_or_component-direction"))

val trained_pipeline = new Pipeline().setStages(Array(
  documenter,
  sentencer,
  tokenizer,
  words_embedder,
  pos_tagger,
  clinical_ner_tagger,
  ner_chunker,
  dependency_parser,
  re_ner_chunk_filter
))

val data = Seq("MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia").toDF("text")
val result = trained_pipeline.fit(data).transform(data)

Show results

result.selectExpr("explode(re_ner_chunks) as re_chunks")
  .selectExpr("re_chunks.begin", "re_chunks.result", "re_chunks.metadata.entity", "re_chunks.metadata.paired_to")
  .show(6, truncate=false)
+-----+-------------+---------------------------+---------+
|begin|result       |entity                     |paired_to|
+-----+-------------+---------------------------+---------+
|35   |upper        |Direction                  |41       |
|41   |brain stem   |Internal_organ_or_component|35       |
|35   |upper        |Direction                  |59       |
|59   |cerebellum   |Internal_organ_or_component|35       |
|35   |upper        |Direction                  |81       |
|81   |basil ganglia|Internal_organ_or_component|35       |
+-----+-------------+---------------------------+---------+
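The table above lists each candidate pair as two rows that point at each other through paired_to. A minimal sketch of regrouping such rows into pairs follows. The PairedChunk type and toPairs helper are hypothetical, and the sketch assumes the rows arrive two at a time in the order shown, which the source does not guarantee.

```scala
// Hypothetical row type mirroring the columns selected in the example above.
final case class PairedChunk(begin: Int, result: String, entity: String, pairedTo: Int)

object PairedChunksSketch {
  // Assumes rows arrive two at a time and each row's pairedTo equals its partner's begin.
  def toPairs(rows: Seq[PairedChunk]): Seq[(PairedChunk, PairedChunk)] =
    rows.grouped(2).collect {
      case Seq(a, b) if a.pairedTo == b.begin && b.pairedTo == a.begin => (a, b)
    }.toList

  // Values copied from the result table above.
  val rows: Seq[PairedChunk] = Seq(
    PairedChunk(35, "upper", "Direction", 41),
    PairedChunk(41, "brain stem", "Internal_organ_or_component", 35),
    PairedChunk(35, "upper", "Direction", 59),
    PairedChunk(59, "cerebellum", "Internal_organ_or_component", 35),
    PairedChunk(35, "upper", "Direction", 81),
    PairedChunk(81, "basil ganglia", "Internal_organ_or_component", 35)
  )

  def main(args: Array[String]): Unit =
    toPairs(rows).foreach { case (a, b) => println(s"${a.result} <-> ${b.result}") }
}
```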

See also: RelationExtractionDLModel for BERT based extraction

trait ReadRelationExtractionDLModelTensorflowModel extends ReadTensorflowModel
trait ReadZeroShotRelationExtractionModel extends ReadTensorflowModel
trait ReadablePretrainedRelationExtractionDLModel extends ParamsAndFeaturesReadable[RelationExtractionDLModel] with HasPretrained[RelationExtractionDLModel]
trait ReadablePretrainedZeroShotRelationExtractionModel extends ParamsAndFeaturesReadable[ZeroShotRelationExtractionModel] with HasPretrained[ZeroShotRelationExtractionModel]
trait RelationEncoding extends AnyRef

class RelationExtractionApproach extends GenericClassifierApproach with HandleExceptionParams

Trains a TensorFlow model for relation extraction.

For pretrained models, see the documentation of RelationExtractionModel.

To train a custom relation extraction model, you first need to create a TensorFlow graph using either the TfGraphBuilder annotator or the tf_graph module. Then, set the path to the TensorFlow graph using the method setModelFile.

If the parameter relationDirectionCol is set, the model will be trained using the direction information (see the parameter description for details). Otherwise, the relations between entities will have no direction. After training a model (using the .fit() method), the resulting object is of class RelationExtractionModel.

Example

Defining pipeline stages to extract entities first

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("tokens")

val embedder = WordEmbeddingsModel
  .pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("document", "tokens"))
  .setOutputCol("embeddings")

val posTagger = PerceptronModel
  .pretrained("pos_clinical", "en", "clinical/models")
  .setInputCols(Array("document", "tokens"))
  .setOutputCol("posTags")

val nerTagger = MedicalNerModel
  .pretrained("ner_events_clinical", "en", "clinical/models")
  .setInputCols(Array("document", "tokens", "embeddings"))
  .setOutputCol("ner_tags")

val nerConverter = new NerConverter()
  .setInputCols(Array("document", "tokens", "ner_tags"))
  .setOutputCol("nerChunks")

val depencyParser = DependencyParserModel
  .pretrained("dependency_conllu", "en")
  .setInputCols(Array("document", "posTags", "tokens"))
  .setOutputCol("dependencies")

Then define RelationExtractionApproach and training parameters

val re = new RelationExtractionApproach()
  .setInputCols(Array("embeddings", "posTags", "train_ner_chunks", "dependencies"))
  .setOutputCol("relations_t")
  .setLabelColumn("target_rel")
  .setEpochsNumber(300)
  .setBatchSize(200)
  .setlearningRate(0.001f)
  .setModelFile("path/to/graph_file.pb")
  .setFixImbalance(true)
  .setValidationSplit(0.05f)
  .setFromEntity("from_begin", "from_end", "from_label")
  .setToEntity("to_begin", "to_end", "to_label")

val finisher = new Finisher()
  .setInputCols(Array("relations_t"))
  .setOutputCols(Array("relations"))
  .setCleanAnnotations(false)
  .setValueSplitSymbol(",")
  .setAnnotationSplitSymbol(",")
  .setOutputAsArray(false)

Define complete pipeline and start training

val pipeline = new Pipeline()
  .setStages(Array(
    documentAssembler,
    tokenizer,
    embedder,
    posTagger,
    nerTagger,
    nerConverter,
    depencyParser,
    re,
    finisher))

val model = pipeline.fit(trainData)
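The example above does not define trainData. As a rough sketch, the column names passed to setLabelColumn, setFromEntity and setToEntity suggest one row per labelled entity pair. The RelationTrainingRow type, the sample sentence and the "relieves" label below are hypothetical, and treating the offsets as character positions with an inclusive end is an assumption, not something the source states.

```scala
// Hypothetical row type; field names mirror the column names used in the example above.
final case class RelationTrainingRow(
  text: String,
  from_begin: Int, from_end: Int, from_label: String,
  to_begin: Int, to_end: Int, to_label: String,
  target_rel: String
)

object TrainingRowsSketch {
  // Assumption: character offsets with an inclusive end position.
  def span(text: String, begin: Int, end: Int): String = text.substring(begin, end + 1)

  // One made-up training example; "relieves" is an illustrative label only.
  val rows: Seq[RelationTrainingRow] = Seq(
    RelationTrainingRow("Paracetamol relieved the headache.", 0, 10, "TREATMENT", 25, 32, "PROBLEM", "relieves")
  )

  def main(args: Array[String]): Unit =
    rows.foreach { r =>
      val from = span(r.text, r.from_begin, r.from_end)
      val to = span(r.text, r.to_begin, r.to_end)
      println(s"$from -[${r.target_rel}]-> $to")
    }
}
```

In a Spark session, a sequence of such case class instances can be turned into a DataFrame with toDF() after importing spark.implicits._.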

See also: RelationExtractionModel for pretrained models and how to use it

class RelationExtractionDLModel extends AnnotatorModel[RelationExtractionDLModel] with WriteTensorflowModel with HasStorageRef with HasCaseSensitiveProperties with HasSimpleAnnotate[RelationExtractionDLModel] with RelationEncoding with HasEngine with HandleExceptionParams with HasSafeAnnotate[RelationExtractionDLModel] with CheckLicense

Extracts and classifies instances of relations between named entities. In contrast with RelationExtractionModel, RelationExtractionDLModel is based on BERT. For available pretrained models, please see the Models Hub.

Example

Relation Extraction between body parts

This is a continuation of the RENerChunksFilter example. See that class for how to extract the relation chunks.

Define the extraction model

val re_ner_chunk_filter = new RENerChunksFilter()
 .setInputCols("ner_chunks", "dependencies")
 .setOutputCol("re_ner_chunks")
 .setMaxSyntacticDistance(4)
 .setRelationPairs(Array("internal_organ_or_component-direction"))

val re_model = RelationExtractionDLModel.pretrained("redl_bodypart_direction_biobert", "en", "clinical/models")
  .setPredictionThreshold(0.5f)
  .setInputCols("re_ner_chunks", "sentences")
  .setOutputCol("relations")

val trained_pipeline = new Pipeline().setStages(Array(
  documenter,
  sentencer,
  tokenizer,
  words_embedder,
  pos_tagger,
  clinical_ner_tagger,
  ner_chunker,
  dependency_parser,
  re_ner_chunk_filter,
  re_model
))

val data = Seq("MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia").toDF("text")
val result = trained_pipeline.fit(data).transform(data)

Show results

result.selectExpr("explode(relations) as relations")
 .select(
   "relations.metadata.chunk1",
   "relations.metadata.entity1",
   "relations.metadata.chunk2",
   "relations.metadata.entity2",
   "relations.result"
 )
 .where("result != 0")
 .show(truncate=false)
+------+---------+-------------+---------------------------+------+
|chunk1|entity1  |chunk2       |entity2                    |result|
+------+---------+-------------+---------------------------+------+
|upper |Direction|brain stem   |Internal_organ_or_component|1     |
|left  |Direction|cerebellum   |Internal_organ_or_component|1     |
|right |Direction|basil ganglia|Internal_organ_or_component|1     |
+------+---------+-------------+---------------------------+------+

See also: RelationExtractionModel for ML based extraction
RENerChunksFilter on how to create inputs

class RelationExtractionModel extends GenericClassifierModel with ParamsAndFeaturesWritable with HandleExceptionParams with HasSafeAnnotate[GenericClassifierModel]

Extracts and classifies instances of relations between named entities. For this, relation pairs need to be defined with setRelationPairs to specify the entities between which relations should be extracted.

For pretrained models please see the Models Hub for available models.

Example

Relation Extraction between body parts

Define pipeline stages to extract entities

val documenter = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val sentencer = new SentenceDetector()
  .setInputCols("document")
  .setOutputCol("sentences")

val tokenizer = new Tokenizer()
  .setInputCols("sentences")
  .setOutputCol("tokens")

val words_embedder = WordEmbeddingsModel.pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens")
  .setOutputCol("embeddings")

val pos_tagger = PerceptronModel.pretrained("pos_clinical", "en", "clinical/models")
  .setInputCols("sentences", "tokens")
  .setOutputCol("pos_tags")

val dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")
  .setInputCols("sentences", "pos_tags", "tokens")
  .setOutputCol("dependencies")

val clinical_ner_tagger = MedicalNerModel.pretrained("jsl_ner_wip_greedy_clinical","en","clinical/models")
  .setInputCols("sentences", "tokens", "embeddings")
  .setOutputCol("ner_tags")

val ner_chunker = new NerConverter()
  .setInputCols("sentences", "tokens", "ner_tags")
  .setOutputCol("ner_chunks")

Define the relations that are to be extracted

val relationPairs = Array("direction-external_body_part_or_region",
                      "external_body_part_or_region-direction",
                      "direction-internal_organ_or_component",
                      "internal_organ_or_component-direction")

val re_model = RelationExtractionModel.pretrained("re_bodypart_directions", "en", "clinical/models")
  .setInputCols("embeddings", "pos_tags", "ner_chunks", "dependencies")
  .setOutputCol("relations")
  .setRelationPairs(relationPairs)
  .setMaxSyntacticDistance(4)
  .setPredictionThreshold(0.9f)

val pipeline = new Pipeline().setStages(Array(
  documenter,
  sentencer,
  tokenizer,
  words_embedder,
  pos_tagger,
  clinical_ner_tagger,
  ner_chunker,
  dependency_parser,
  re_model
))

val data = Seq("MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia").toDF("text")
val result = pipeline.fit(data).transform(data)

Show results

result.selectExpr("explode(relations) as relations")
 .select(
   "relations.metadata.chunk1",
   "relations.metadata.entity1",
   "relations.metadata.chunk2",
   "relations.metadata.entity2",
   "relations.result"
 )
 .where("result != 0")
 .show(truncate=false)
+------+---------+-------------+---------------------------+------+
|chunk1|entity1  |chunk2       |entity2                    |result|
+------+---------+-------------+---------------------------+------+
|upper |Direction|brain stem   |Internal_organ_or_component|1     |
|left  |Direction|cerebellum   |Internal_organ_or_component|1     |
|right |Direction|basil ganglia|Internal_organ_or_component|1     |
+------+---------+-------------+---------------------------+------+
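The example above limits candidate pairs with setMaxSyntacticDistance(4). The source does not define the measure. One common reading is the number of dependency edges on the path between two tokens, and the sketch below computes that under this assumption. The heads encoding, the helper names and the tiny parse are all hypothetical and are not taken from the library.

```scala
object SyntacticDistanceSketch {
  // heads(i) is the index of token i's head; the root points to -1.
  // Assumes a well-formed tree (no cycles). Returns the path from token up to the root.
  def pathToRoot(heads: Vector[Int], token: Int): List[Int] = {
    @annotation.tailrec
    def loop(current: Int, acc: List[Int]): List[Int] =
      if (current < 0) acc.reverse else loop(heads(current), current :: acc)
    loop(token, Nil)
  }

  // Number of dependency edges between a and b through their lowest common ancestor;
  // -1 if the two tokens share no ancestor.
  def distance(heads: Vector[Int], a: Int, b: Int): Int = {
    val stepsFromB = pathToRoot(heads, b).zipWithIndex.toMap
    pathToRoot(heads, a).zipWithIndex.collectFirst {
      case (node, stepsFromA) if stepsFromB.contains(node) => stepsFromA + stepsFromB(node)
    }.getOrElse(-1)
  }

  def main(args: Array[String]): Unit = {
    // Made-up parse: 0 "demonstrated" (root), 1 "infarction", 2 "stem", 3 "upper", 4 "brain".
    val heads = Vector(-1, 0, 1, 2, 2)
    println(distance(heads, 3, 4)) // "upper" and "brain" both attach to "stem"
    println(distance(heads, 3, 0)) // "upper" up to the root
  }
}
```

Under this reading, a maximum distance of 4 would discard pairs whose tokens are more than four edges apart in the parse.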

See also: RelationExtractionApproach to train your own model.
RelationExtractionDLModel for BERT based extraction

case class RelationInstance(relationType: String, entity1: String, entity2: String, entity1_begin: Int, entity1_end: Int, entity2_begin: Int, entity2_end: Int, chunk1: String, chunk2: String, chunk1_conf: String, chunk2_conf: String, vector: Array[Float], description: String, sentence: Int = 0) extends Product with Serializable

class ZeroShotRelationExtractionModel extends MedicalBertForSequenceClassification with RelationEncoding with HasEngine

ZeroShotRelationExtractionModel implements zero-shot binary relation extraction by utilizing BERT transformer models trained on the NLI (Natural Language Inference) task. The model inputs consist of documents/sentences and paired NER chunks, usually obtained by RENerChunksFilter. The definitions of the relations to extract are given by a dictionary structure that specifies a set of statements regarding the relationship of named entities. These statements are automatically appended to each document in the dataset, and the NLI model is used to determine whether a particular relationship between entities holds.
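To make the statement mechanism concrete, the sketch below fills placeholders such as {TREATMENT} and {PROBLEM} with the text of a chunk pair to produce the statements an NLI model would score. It is a plain-Scala illustration under assumptions: the fill and hypotheses helpers are hypothetical, and the way the library itself substitutes chunks or skips templates is not documented here.

```scala
object HypothesisTemplateSketch {
  // Replace each {LABEL} placeholder with the chunk text for that entity label.
  def fill(template: String, chunksByLabel: Map[String, String]): String =
    chunksByLabel.foldLeft(template) { case (acc, (label, chunk)) =>
      acc.replace(s"{$label}", chunk)
    }

  // Build statements per relation; templates with unfilled placeholders are skipped.
  def hypotheses(
      categories: Map[String, Seq[String]],
      chunksByLabel: Map[String, String]): Map[String, Seq[String]] =
    categories
      .map { case (name, templates) =>
        name -> templates.map(fill(_, chunksByLabel)).filterNot(_.contains("{"))
      }
      .filter { case (_, statements) => statements.nonEmpty }

  def main(args: Array[String]): Unit = {
    val categories = Map(
      "CURE" -> Seq("{TREATMENT} cures {PROBLEM}."),
      "IMPROVE" -> Seq("{TREATMENT} improves {PROBLEM}.", "{TREATMENT} cures {PROBLEM}."),
      "REVEAL" -> Seq("{TEST} reveals {PROBLEM}.")
    )
    // Made-up chunk pair for illustration.
    val pair = Map("TREATMENT" -> "Paracetamol", "PROBLEM" -> "headache")
    hypotheses(categories, pair).foreach { case (name, statements) =>
      println(s"$name: ${statements.mkString(" | ")}")
    }
  }
}
```

For a TREATMENT and PROBLEM pair, the REVEAL template is dropped because its {TEST} placeholder cannot be filled.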

Pretrained models can be loaded with the pretrained method of the companion object:

val zeroShotRE = ZeroShotRelationExtractionModel.pretrained()
  .setInputCols("token", "document")
  .setOutputCol("label")

For available pretrained models please see the Models Hub.

Example

val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("tokens")

val sentencer = new SentenceDetector()
  .setInputCols(Array("document"))
  .setOutputCol("sentences")

val embeddings = WordEmbeddingsModel
  .pretrained("embeddings_clinical", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens"))
  .setOutputCol("embeddings")

val posTagger = PerceptronModel
  .pretrained("pos_clinical", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens"))
  .setOutputCol("posTags")

val nerTagger = MedicalNerModel
  .pretrained("ner_clinical", "en", "clinical/models")
  .setInputCols(Array("sentences", "tokens", "embeddings"))
  .setOutputCol("nerTags")

val nerConverter = new NerConverter()
  .setInputCols(Array("sentences", "tokens", "nerTags"))
  .setOutputCol("nerChunks")

val dependencyParser = DependencyParserModel
  .pretrained("dependency_conllu", "en")
  .setInputCols(Array("document", "posTags", "tokens"))
  .setOutputCol("dependencies")

val reNerFilter = new RENerChunksFilter()
  .setRelationPairs(Array("problem-test","problem-treatment"))
  .setMaxSyntacticDistance(4)
  .setDocLevelRelations(false)
  .setInputCols(Array("nerChunks", "dependencies"))
  .setOutputCol("RENerChunks")

val re = ZeroShotRelationExtractionModel
  .load("/tmp/spark_sbert_zero_shot")
  .setRelationalCategories(
    Map(
      "CURE" -> Array("{TREATMENT} cures {PROBLEM}."),
      "IMPROVE" -> Array("{TREATMENT} improves {PROBLEM}.", "{TREATMENT} cures {PROBLEM}."),
      "REVEAL" -> Array("{TEST} reveals {PROBLEM}.")
      ))
  .setPredictionThreshold(0.9f)
  .setMultiLabel(false)
  .setInputCols(Array("sentences", "RENerChunks"))
  .setOutputCol("relations")

val pipeline = new Pipeline()
  .setStages(Array(
    documentAssembler,
    sentencer,
    tokenizer,
    embeddings,
    posTagger,
    nerTagger,
    nerConverter,
    dependencyParser,
    reNerFilter,
    re))

val model = pipeline.fit(Seq("").toDS.toDF("text"))
val results = model.transform(
  Seq("Paracetamol can alleviate headache or sickness. An MRI test can be used to find cancer.").toDS.toDF("text"))

results
  .selectExpr("EXPLODE(relations) as relation")
  .selectExpr("relation.result", "relation.metadata.confidence")
  .show(truncate = false)

+-------+----------+
|result |confidence|
+-------+----------+
|REVEAL |0.9760039 |
|IMPROVE|0.98819494|
|IMPROVE|0.9929625 |
+-------+----------+
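The example sets setPredictionThreshold(0.9f) and setMultiLabel(false), and each output row carries one label with its confidence. One plausible reading of how the two settings interact is sketched below. The select helper and the scores are hypothetical, and the source does not confirm that the library applies the threshold or breaks ties this way.

```scala
object LabelSelectionSketch {
  // Keep relation labels scoring at or above the threshold, best first;
  // when multiLabel is false, keep only the top one.
  def select(
      scores: Map[String, Float],
      threshold: Float,
      multiLabel: Boolean): Seq[(String, Float)] = {
    val passing = scores.toSeq.filter(_._2 >= threshold).sortWith(_._2 > _._2)
    if (multiLabel) passing else passing.take(1)
  }

  def main(args: Array[String]): Unit = {
    // Made-up entailment scores for one chunk pair.
    val scores = Map("IMPROVE" -> 0.99f, "CURE" -> 0.95f, "REVEAL" -> 0.10f)
    println(select(scores, 0.9f, multiLabel = false).map(_._1))
    println(select(scores, 0.9f, multiLabel = true).map(_._1))
  }
}
```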

See also: http://jmlr.org/papers/v21/20-074.html for details about using NLI models for zero shot categorization
RENerChunksFilter on how to generate paired named entity chunks for relation extraction

Value Members

object REFeatureGenerator
object RENerChunksFilter extends RENerChunksFilter with ParamsAndFeaturesReadable[RENerChunksFilter]
object RelationDirection
object RelationExtractionApproach extends RelationExtractionApproach
object RelationExtractionDLModel extends ReadablePretrainedRelationExtractionDLModel with ReadRelationExtractionDLModelTensorflowModel with Serializable
object RelationExtractionModel extends ReadsGenericClassifierGraph[RelationExtractionModel] with ReadablePretrainedGenericClassifier[RelationExtractionModel] with Serializable
object ZeroShotRelationExtractionModel extends ReadablePretrainedZeroShotRelationExtractionModel with ReadZeroShotRelationExtractionModel with Serializable
This is the companion object of MedicalBertForSequenceClassification. Please refer to that class for the documentation.
