com.johnsnowlabs.legal.graph

relation_extraction

package relation_extraction

Type Members

  1. class RelationExtractionDLModel extends nlp.annotators.re.RelationExtractionDLModel

    Extracts and classifies instances of relations between named entities. In contrast with RelationExtractionModel, RelationExtractionDLModel is based on BERT. For available pretrained models, please see the Models Hub.

    Example

    Relation Extraction between body parts

    This is a continuation of the RENerChunksFilter example; see that class for how to extract the relation chunks. Define the chunk filter and the extraction model, then assemble the pipeline:

    val re_ner_chunk_filter = new RENerChunksFilter()
     .setInputCols("ner_chunks", "dependencies")
     .setOutputCol("re_ner_chunks")
     .setMaxSyntacticDistance(4)
     .setRelationPairs(Array("internal_organ_or_component-direction"))
    
    val re_model = RelationExtractionDLModel.pretrained("redl_bodypart_direction_biobert", "en", "clinical/models")
      .setPredictionThreshold(0.5f)
      .setInputCols("re_ner_chunks", "sentences")
      .setOutputCol("relations")
    
    val trained_pipeline = new Pipeline().setStages(Array(
      documenter,
      sentencer,
      tokenizer,
      words_embedder,
      pos_tagger,
      clinical_ner_tagger,
      ner_chunker,
      dependency_parser,
      re_ner_chunk_filter,
      re_model
    ))
    
    val data = Seq("MRI demonstrated infarction in the upper brain stem , left cerebellum and  right basil ganglia").toDF("text")
    val result = trained_pipeline.fit(data).transform(data)

    Show results

    result.selectExpr("explode(relations) as relations")
     .select(
       "relations.metadata.chunk1",
       "relations.metadata.entity1",
       "relations.metadata.chunk2",
       "relations.metadata.entity2",
       "relations.result"
     )
     .where("result != 0")
     .show(truncate=false)
    +------+---------+-------------+---------------------------+------+
    |chunk1|entity1  |chunk2       |entity2                    |result|
    +------+---------+-------------+---------------------------+------+
    |upper |Direction|brain stem   |Internal_organ_or_component|1     |
    |left  |Direction|cerebellum   |Internal_organ_or_component|1     |
    |right |Direction|basil ganglia|Internal_organ_or_component|1     |
    +------+---------+-------------+---------------------------+------+

    See also

    RelationExtractionModel for ML based extraction

    RENerChunksFilter on how to create inputs

  2. class ZeroShotRelationExtractionModel extends nlp.annotators.re.ZeroShotRelationExtractionModel

    ZeroShotRelationExtractionModel implements zero-shot binary relation extraction using BERT transformer models trained on the NLI (Natural Language Inference) task. The model inputs consist of documents/sentences and paired NER chunks, usually obtained from RENerChunksFilter. The relations to extract are defined by a dictionary that maps each relation to a set of statements about the relationship between named entities. These statements are automatically appended to each document in the dataset, and the NLI model is used to determine whether a particular relationship holds between the entities.
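
    The statement mechanism described above can be pictured with a small standalone sketch. It is not the library's implementation: the object HypothesisTemplates and its fill helper are hypothetical names, and the sketch assumes that placeholders such as {TREATMENT} are replaced by the text of the chunk carrying that entity label before the statement is scored against the sentence.

```scala
import scala.util.matching.Regex

// Illustrative sketch only; HypothesisTemplates and fill are hypothetical names,
// not part of the library.
object HypothesisTemplates {
  private val placeholder: Regex = "\\{([A-Z_]+)\\}".r

  // Replace every {LABEL} placeholder with the chunk text for that entity label.
  // Returns None when the chunk pair lacks a label the template needs.
  def fill(template: String, chunks: Map[String, String]): Option[String] = {
    val labels = placeholder.findAllMatchIn(template).map(_.group(1)).toSet
    if (labels.subsetOf(chunks.keySet))
      Some(placeholder.replaceAllIn(
        template, m => Regex.quoteReplacement(chunks(m.group(1)))))
    else None
  }

  def main(args: Array[String]): Unit = {
    val categories = Map(
      "CURE"   -> Array("{TREATMENT} cures {PROBLEM}."),
      "REVEAL" -> Array("{TEST} reveals {PROBLEM}."))
    val premise = "Paracetamol can alleviate headache or sickness."
    // One chunk pair, keyed by entity label (assumed upper-cased here).
    val pair = Map("TREATMENT" -> "Paracetamol", "PROBLEM" -> "headache")

    // Each filled statement would be paired with the sentence for NLI scoring.
    for ((relation, templates) <- categories; t <- templates; hypothesis <- fill(t, pair))
      println(s"$relation: premise='$premise' hypothesis='$hypothesis'")
  }
}
```

    In this sketch the REVEAL template is skipped for the given pair because no TEST chunk is available, so only the CURE statement is produced.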

    Pretrained models can be loaded with the pretrained method of the companion object:

    val zeroShotRE = ZeroShotRelationExtractionModel.pretrained()
      .setInputCols("sentences", "RENerChunks")
      .setOutputCol("relations")

    For available pretrained models please see the Models Hub.

    Example

    val documentAssembler = new DocumentAssembler()
      .setInputCol("text")
      .setOutputCol("document")
    
    val tokenizer = new Tokenizer()
      .setInputCols(Array("document"))
      .setOutputCol("tokens")
    
    val sentencer = new SentenceDetector()
      .setInputCols(Array("document"))
      .setOutputCol("sentences")
    
    val embeddings = WordEmbeddingsModel
      .pretrained("embeddings_clinical", "en", "clinical/models")
      .setInputCols(Array("sentences", "tokens"))
      .setOutputCol("embeddings")
    
    val posTagger = PerceptronModel
      .pretrained("pos_clinical", "en", "clinical/models")
      .setInputCols(Array("sentences", "tokens"))
      .setOutputCol("posTags")
    
    val nerTagger = MedicalNerModel
      .pretrained("ner_clinical", "en", "clinical/models")
      .setInputCols(Array("sentences", "tokens", "embeddings"))
      .setOutputCol("nerTags")
    
    val nerConverter = new NerConverter()
      .setInputCols(Array("sentences", "tokens", "nerTags"))
      .setOutputCol("nerChunks")
    
    val dependencyParser = DependencyParserModel
      .pretrained("dependency_conllu", "en")
      .setInputCols(Array("document", "posTags", "tokens"))
      .setOutputCol("dependencies")
    
    val reNerFilter = new RENerChunksFilter()
      .setRelationPairs(Array("problem-test","problem-treatment"))
      .setMaxSyntacticDistance(4)
      .setDocLevelRelations(false)
      .setInputCols(Array("nerChunks", "dependencies"))
      .setOutputCol("RENerChunks")
    
    val re = ZeroShotRelationExtractionModel
      .load("/tmp/spark_sbert_zero_shot")
      .setRelationalCategories(
        Map(
          "CURE" -> Array("{TREATMENT} cures {PROBLEM}."),
          "IMPROVE" -> Array("{TREATMENT} improves {PROBLEM}.", "{TREATMENT} cures {PROBLEM}."),
          "REVEAL" -> Array("{TEST} reveals {PROBLEM}.")
          ))
      .setPredictionThreshold(0.9f)
      .setMultiLabel(false)
      .setInputCols(Array("sentences", "RENerChunks"))
      .setOutputCol("relations")
    
    val pipeline = new Pipeline()
      .setStages(Array(
        documentAssembler,
        sentencer,
        tokenizer,
        embeddings,
        posTagger,
        nerTagger,
        nerConverter,
        dependencyParser,
        reNerFilter,
        re))
    
    val model = pipeline.fit(Seq("").toDS.toDF("text"))
    val results = model.transform(
      Seq("Paracetamol can alleviate headache or sickness. An MRI test can be used to find cancer.").toDS.toDF("text"))
    
    results
      .selectExpr("EXPLODE(relations) as relation")
      .selectExpr("relation.result", "relation.metadata.confidence")
      .show(truncate = false)
    
    +-------+----------+
    |result |confidence|
    +-------+----------+
    |REVEAL |0.9760039 |
    |IMPROVE|0.98819494|
    |IMPROVE|0.9929625 |
    +-------+----------+

    See also

    http://jmlr.org/papers/v21/20-074.html for details about using NLI models for zero shot categorization

    RENerChunksFilter on how to generate paired named entity chunks for relation extraction
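
    As a rough illustration of how setPredictionThreshold and setMultiLabel might interact in the example above, the following standalone sketch filters scored relations for one chunk pair. It rests on two assumptions that the documentation does not confirm: that the threshold is a minimum confidence, and that disabling multi-label keeps only the highest-scoring relation. The names RelationSelection, Scored and select are hypothetical, and the scores are made up for the illustration.

```scala
// Illustrative sketch only; not the library's selection logic.
object RelationSelection {
  final case class Scored(relation: String, confidence: Float)

  // Assumption: a relation is kept when its confidence reaches the threshold;
  // with multiLabel = false only the best-scoring relation survives.
  def select(scores: Seq[Scored], threshold: Float, multiLabel: Boolean): Seq[Scored] = {
    val passing = scores.filter(_.confidence >= threshold).sortBy(s => -s.confidence)
    if (multiLabel) passing else passing.take(1)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical scores for a single (TREATMENT, PROBLEM) chunk pair.
    val scores = Seq(Scored("CURE", 0.42f), Scored("IMPROVE", 0.98f), Scored("REVEAL", 0.03f))
    println(select(scores, threshold = 0.9f, multiLabel = false))
  }
}
```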

Value Members

  1. object RelationExtractionDLModel extends ReadablePretrainedRelationExtractionDLModel with ReadRelationExtractionDLModelTensorflowModel with Serializable
  2. object ZeroShotRelationExtractionModel extends ReadablePretrainedZeroShotRelationExtractionModel with ReadZeroShotRelationExtractionModel with Serializable

    This is the companion object of ZeroShotRelationExtractionModel. Please refer to that class for the documentation.
